Using clojure.spec to declaratively parse query parameters

12 September 2016

This post will introduce you to clojure.spec, a new Clojure core library that will be part of the Clojure 1.9 release. After a short introduction, it will go into detail on how we use it for some non-trivial applications like specifying, validating, and handling HTTP URL query parameters.

What is `clojure.spec`?

clojure.spec is a new core library for validation of dynamic data structures. Clojure, being a dynamic language, is usually quite lightweight on explicit domain models – some people use records to model their domain, while others favor simple maps without any explicit structure.

While this accelerates development – especially in the early phase of implementation – it's often necessary to validate data more thoroughly as the code base grows. While Clojure itself offers predicates for simple cases like string?, integer?, vector?, etc., it's usually harder to validate more complex structures. Until now, most of our codebases included only the occasional (assert (valid-foo? some-value)), while real validation was limited to those parts of the system that read data from or write data to the outside world.

There are various other libraries for validation of complex data, most notably plumatic/schema, but we found that they usually left something to be desired, especially when used as a foundation for more complex features. Most aren't flexible and/or "hackable" enough.

So what's special about clojure.spec? The most notable difference is that it uses a unique feature of Clojure: namespaced keywords.

The most common use case for keywords in Clojure is as a key to a value in a map: (get user :name) (with (def user {:name "Rich Hickey"})) returns the string "Rich Hickey". Much less used is the optional namespace part of a keyword. Think of it as a specially handled prefix.
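As a small illustration of the two parts of a keyword, here is a minimal REPL-style sketch that uses only clojure.core:

```clojure
(def user {:name "Rich Hickey"})

(get user :name)              ;=> "Rich Hickey"

;; A keyword consists of an optional namespace part and a name part.
(namespace :person/name)      ;=> "person"
(name :person/name)           ;=> "name"
(namespace :name)             ;=> nil, there is no namespace part

;; Same name, different namespace: two distinct keywords.
(= :person/name :street/name) ;=> false
```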

(One thing I won't go into here is the 'auto-prefix' feature, which Meikel Brandmeyer described in detail already.)

So how does clojure.spec take advantage of this feature? It allows you to define a validation function for any keyword! Of course, having global scope, these definitions usually don't make much sense for non-namespaced keywords. Who knows if :name refers to the name of a user or the name of a street? The solution in Clojure is to make them unambiguous via a prefix: :street/name and :person/name. This is the same approach Datomic uses for its attributes.

Simple use cases

Basic validation looks like the following:

(s/def :person/name string?)
(s/def :album/year  int?)

(s/valid? :person/name "Rich Hickey")
;;=> true
(s/valid? :album/year "1977")
;;=> false, "1977" is a string, not an integer
(s/valid? :album/year  1977)
;;=> true

More complex validations are also supported: A vector of album release years can be validated like this (note that we are re-using the existing spec for :album/year):

(s/valid? (s/coll-of :album/year) [12345 53563]) ;=> true

Maps take a special place, which makes clojure.spec quite unique: They are defined as a set of keys, with all keys being specs for themselves:

(s/def :person/name string?)
(s/def :person/age  (s/and integer? pos?))
(s/def :address/street string?)

(s/def :model/user (s/keys :req [:person/name
                                 :person/age]
                           :opt [:address/street]))

This defines a new spec called :model/user with two required and one optional key. It's used like this:

(s/valid? :model/user
          {:person/name "Rich Hickey"
           :person/age 23                    ; just kidding
           :address/street "Paren Boulevard" ; a guess
           })
;=> true

(s/valid? :model/user
          {:person/name "Rich Hickey"
           :person/age 23})
;=> true

(s/valid? :model/user
          {:person/name "Rich Hickey"
           :person/age -23})
;=> false, :person/age is negative

These are just the most basic features of clojure.spec. Other features include automatic generation of data that passes a given spec (very useful for testing), coercion of data from one type to another, and generation of reusable error-messages and error-data.
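To give a rough idea of two of these features, the following sketch shows conforming and error data. It assumes a released Clojure (1.9 or later), where the library lives in the clojure.spec.alpha namespace; the :example/id spec is made up for illustration. Data generation is left out because it additionally needs the test.check library on the classpath.

```clojure
;; Assumption: a released Clojure (1.9+), where the library lives in
;; clojure.spec.alpha (the pre-release builds used clojure.spec).
(require '[clojure.spec.alpha :as s])

(s/def :person/age (s/and integer? pos?))

;; Illustrative spec, not from the post: an id is a number or a string.
(s/def :example/id (s/or :numeric integer? :textual string?))

;; conform returns the (possibly restructured) value ...
(s/conform :example/id 42)               ;=> [:numeric 42]
(s/conform :example/id "abc")            ;=> [:textual "abc"]

;; ... or a special "invalid" marker value.
(s/invalid? (s/conform :person/age -23)) ;=> true

;; explain-data describes validation failures as plain data.
(def problems (::s/problems (s/explain-data :person/age -23)))

(:val (first problems))                  ;=> -23
(:via (first problems))                  ;=> [:person/age]
```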

For a detailed guide into most of these features, take a look at the Spec Guide.

Thinking further

One of our major tasks is providing HTTP API endpoints to our customers. Data comes in, gets parsed, validated, processed and a response is sent to the client.

We use Liberator for our endpoints, which allows us to declare them like this:

(defresource submit-data
  :allowed-media-types ["application/json"]
  :allowed-methods #{:post}
  :processable? (fn [ctx]
                  ;; 1) Extract client-data
                  ;; 2) Validate
                  ;; 3) Generate error message if invalid
                  ))

Our APIs are relatively data-heavy: Some endpoints have quite a few query parameters, others receive complex multipart/form-data or JSON bodies.

When parsing and validating these, our endpoints often contain complicated ad-hoc logic in their :processable? decision, which is neither reusable nor easy to understand. Liberator itself doesn't provide any utilities for this, nor do decision functions compose well.

We also wanted to generate API documentation from our code, without having to maintain separate files for documentation that need to be kept up-to-date with the logic used to implement :processable?.

What we needed was for an endpoint to specify its parameter names, valid input values, and transformation functions in a declarative manner. As it turns out, clojure.spec is a very good fit for this.

Imagine we want to implement an endpoint that will take two query parameters: limit and offset. Both integer, both greater than or equal to zero. Without something like clojure.spec, the :processable? decision would look like this:

(fn [ctx]
  (let [request (:request ctx)
        {:strs [offset limit]} (:query-params request)
        offset* (string->long offset)
        limit*  (string->long limit)]
    (cond
      (nil? offset) [false "Missing parameter 'offset'"]
      (nil? limit)  [false "Missing parameter 'limit'"]

      (nil? offset*) [false "Couldn't parse value for parameter 'offset'"]
      (nil? limit*)  [false "Couldn't parse value for parameter 'limit'"]

      (neg? offset*) [false "Parameter 'offset' can't be negative"]
      (neg? limit*)  [false "Parameter 'limit' can't be negative"]

      :else
      [true {::offset offset*
             ::limit limit*}])))

This will parse and validate just two parameters – and it doesn't do it very well: instead of collecting all error messages, it simply returns the first one. That means if we don't pass any parameters, we still get only the error message for offset, and so on.

Now imagine doing this every other day for multiple parameters, with some being optional, while handling default values and arbitrary transformations. It gets messy pretty fast, and don't even think about generating API documentation from this imperative blob.

With our goal of being declarative (and thus being able to generate API documentation), we evaluated several options. Of course one can easily write a function that takes a map of parameter names and transformation/validation functions, but this would still couple validation, transformation, and (in the foreseeable future) documentation. Additionally, these maps would have to contain the data needed to generate useful error messages, so this approach would become unwieldy just as quickly.

Enter `clojure.spec`

When Rich Hickey announced clojure.spec, many people took a dislike to one aspect of the library's design: All defined specs are stored in a global registry and are accessible from anywhere in the program. This could easily cause name clashes, as hinted at above for :name. The beauty is that clojure.spec forbids non-namespaced keywords in its internal database, thus avoiding this issue completely.

Applied to our query parameter thoughts, a similar database seemed like a perfect fit.

This database would contain all information mentioned above: internal and external parameter names, transformation and validation functions, additional documentation, and other things.

This still isn't particularly special – the real usefulness comes when this approach is combined with clojure.spec.

Our approach of a global database of parameter information is using the same identifiers (namespace-prefixed keywords) as clojure.spec. It allows us to define an endpoint similar to the one above like this:

(require '[clojure.spec :as s])

(s/def ::limit  integer?)
(s/def ::offset integer?)

(defparam ::limit  "limit"  string->integer)
(defparam ::offset "offset" string->integer)

(defresource submit-data
  :allowed-media-types ["application/json"]
  :allowed-methods #{:post}
  :query-params {:req [::limit
                       ::offset]})
  ;; `::limit` and `::offset` are available via `ctx` in later
  ;; Liberator handlers/decisions, just as they would with our custom
  ;; implementation earlier

This code defines two new specs: One for ::limit and one for ::offset, both checking if the value is an integer.

defparam specifies the name of the query parameter for this identifier, as well as a transformation function that is applied before validation. Here we declare that the value of the parameter should be converted to an integer (query parameter values are always strings in Liberator/Ring).

A custom :processable? function automatically used in our own defresource wrapper interprets the :query-params map and handles all transformation, validation, etc. for us.
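The post doesn't show how defparam and the :query-params handling are implemented, so the following is only a minimal sketch of the general idea, not our actual code. The names param-db, register-param!, check-param, and process-query-params are hypothetical, string->integer is a simple stand-in, and the sketch assumes a released Clojure (1.9 or later) with the clojure.spec.alpha namespace. Unlike the imperative version above, it collects all errors instead of stopping at the first one.

```clojure
;; Assumption: a released Clojure (1.9+), hence clojure.spec.alpha.
(require '[clojure.spec.alpha :as s])

;; Hypothetical stand-in for the parameter database:
;; spec keyword -> {:name "external name", :transform fn}
(defonce param-db (atom {}))

(defn register-param!
  "Function version of what a `defparam` macro might boil down to."
  [k param-name transform-fn]
  (swap! param-db assoc k {:name param-name :transform transform-fn}))

(defn string->integer
  "Parses a string into a long, returning nil if that fails."
  [s]
  (try
    (Long/parseLong s)
    (catch NumberFormatException _ nil)))

(defn check-param
  "Returns {:value [k v]}, {:error {...}}, or nil for an absent
  optional parameter."
  [query-params required? k]
  (let [{param-name :name transform :transform} (get @param-db k)
        raw (get query-params param-name)]
    (cond
      (and (nil? raw) required?)
      {:error {:param param-name :problem :missing}}

      (nil? raw)
      nil

      :else
      (let [v (transform raw)]
        (if (s/valid? k v)
          {:value [k v]}
          {:error {:param   param-name
                   :problem :invalid
                   :explain (s/explain-data k v)}})))))

(defn process-query-params
  "Returns [true params] or [false errors], collecting every error."
  [{:keys [req opt]} query-params]
  (let [results (concat (map #(check-param query-params true %) req)
                        (map #(check-param query-params false %) opt))
        errors  (keep :error results)]
    (if (seq errors)
      [false (vec errors)]
      [true (into {} (keep :value results))])))

(s/def ::limit  integer?)
(s/def ::offset integer?)
(register-param! ::limit  "limit"  string->integer)
(register-param! ::offset "offset" string->integer)

(process-query-params {:req [::limit ::offset]}
                      {"limit" "10" "offset" "0"})
;=> [true {::limit 10, ::offset 0}]

(process-query-params {:req [::limit ::offset]} {"limit" "ten"})
;=> [false [...]] with one error for "limit" and one for "offset"
```

The return values have the same [boolean data] shape as in the hand-written decision above, so a wrapper can hand them on to Liberator unchanged.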

The structure of :query-params is loosely based on what clojure.spec/keys accepts, namely :req specifying a vector of required query parameters, and :opt doing the same for optional parameters.

In case of validation failures, clojure.spec/explain-data is used to generate error data which is then used to generate useful error messages for API clients. The data-driven approach also allows us to deliver these errors as JSON or HTML, for example, conveniently based on the Accept header sent by the client.

Additional Validation via `:spec`

The :query-params map allows us to validate query parameters individually, but sometimes there are inter-dependencies and you need to validate the parameters map as a whole. To that end, :query-params supports a :spec key.

The usage is quite simple:

;; `::location` is a map with both latitude and longitude and must fit
;; some predicate
(s/def ::location (s/and (s/keys :req [::latitude
                                       ::longitude])
                         ;; any other predicate
                         valid-location?))

(defresource submit-data
  :allowed-media-types ["application/json"]
  :allowed-methods #{:post}
  :query-params {:spec ::location
                 :req [::latitude
                       ::longitude]})

This will first validate and extract ::latitude and ::longitude and afterwards conform them with the ::location spec. This will also generate useful error messages, again via clojure.spec/explain-data.
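To make the whole-map check more concrete, here is a small self-contained sketch. The coordinate ranges and the body of valid-location? (rejecting the point 0/0) are invented for illustration, and the clojure.spec.alpha namespace of released Clojure versions is assumed.

```clojure
;; Assumption: a released Clojure (1.9+), hence clojure.spec.alpha.
(require '[clojure.spec.alpha :as s])

(s/def ::latitude  (s/and number? #(<= -90 % 90)))
(s/def ::longitude (s/and number? #(<= -180 % 180)))

(defn valid-location?
  "Hypothetical cross-field rule: reject the point 0/0."
  [m]
  (not (and (zero? (::latitude m))
            (zero? (::longitude m)))))

;; s/and checks its parts in order, so valid-location? only ever sees
;; maps that already passed the s/keys check.
(s/def ::location (s/and (s/keys :req [::latitude ::longitude])
                         valid-location?))

(s/valid? ::location {::latitude 50.9 ::longitude 6.9}) ;=> true
(s/valid? ::location {::latitude 0 ::longitude 0})      ;=> false
(s/valid? ::location {::latitude 50.9})                 ;=> false
```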

Handling Dynamic Parameters

A special use case is specifying the parameter map dynamically. For example, Liberator doesn't have different decisions for different HTTP methods like GET and POST. To cover such cases, :query-params also accepts a function taking a ctx argument, analogous to other Liberator decisions/handlers.

An endpoint that only accepts query parameters for GET can look like the following:

(defresource paginated-get
  :allowed-media-types ["application/json"]
  :allowed-methods #{:get :post}
  :query-params (fn [ctx]
                  (when (= :get (get-in ctx [:request :request-method]))
                    {:req [::limit
                           ::offset]})))

Of course the drawback of this approach is that we can't easily generate API documentation as we're getting into imperative territory again. Thankfully, this is only used in one place in our code base. We're still evaluating how we can make this more declarative.

This also shows a limitation of Liberator: There is no differentiation between different request-methods. The same :processable? function is used for GET, POST, etc. This forces users to write procedural code if they want different implementations. Another contender for HTTP endpoint handling is juxt/yada. Yada solves the different request methods issue rather beautifully by allowing different sets of parameters for different request-methods. Unfortunately, it's using plumatic/schema which we don't want to pull into our project.

Transformation

In addition to the third parameter of defparam (transform-fn), it's also possible to use clojure.spec/conformer to transform the input before or after validation.

One could argue that we could remove transform-fn and replace it with this approach, but the drawback is that it would couple the definition of a spec with how it gets read from a request, whereas the beauty of clojure.spec lies in its reusability. We don't want to couple our domain model's specs with how it is represented externally.
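For comparison, this is roughly what the conformer-based alternative could look like. It is a sketch with made-up names (parse-long-or-invalid, :example/limit-param), assuming the clojure.spec.alpha namespace of released Clojure versions. Note how the spec now only accepts strings, which is exactly the coupling to the external representation described above.

```clojure
;; Assumption: a released Clojure (1.9+), hence clojure.spec.alpha.
(require '[clojure.spec.alpha :as s])

(defn parse-long-or-invalid
  "Parses a string into a long; returns spec's invalid marker on failure."
  [^String s]
  (try
    (Long/parseLong s)
    (catch NumberFormatException _ ::s/invalid)))

;; s/and passes the conformed value along, so integer? sees the parsed
;; number rather than the original string.
(s/def :example/limit-param
  (s/and string?
         (s/conformer parse-long-or-invalid)
         integer?))

(s/conform :example/limit-param "10")               ;=> 10
(s/invalid? (s/conform :example/limit-param "ten")) ;=> true
(s/invalid? (s/conform :example/limit-param 10))    ;=> true, not a string
```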

Compound Parameters

The last feature I want to talk about is compound parameters. Conceptually, a compound parameter groups a set of parameters together so that either all or none of them have to be specified. Additionally, these parameters aren't assoc'd into ctx directly but as a separate map under the compound parameter's name.

Our pagination example above will look like this with compound parameters:

(s/def ::limit  integer?)
(s/def ::offset integer?)

(defparam ::limit  "limit"  string->integer)
(defparam ::offset "offset" string->integer)

;;; This `s/def` is optional - if our code finds a spec for a compound
;;; param it will use it for validation in addition to validating all
;;; "basic" parameters
(s/def ::pagination (s/keys :req [::limit ::offset]))
(def-compound-param ::pagination #{::limit ::offset})

(defresource submit-data
  :allowed-media-types ["application/json"]
  :allowed-methods #{:post}
  :query-params {:req [::pagination]})
  ;; `::limit` and `::offset` are available under `(::pagination ctx)`

While this seems very verbose at first, it allows us to reuse many more parameters much more easily. For example, one common parameter type in our code is ::location, which consists of three parameters (lat, lon, name). Without compound parameters, every endpoint would have to repeat these in its :query-params map - and we would have to edit every endpoint if we decided to add something to or remove something from the compound parameter.
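The "all or none" rule and the nesting under the compound parameter's name could be sketched as follows. group-compound is a hypothetical helper written for this illustration using only clojure.core; it is not the implementation behind def-compound-param.

```clojure
(defn group-compound
  "Moves the member parameters into a nested map under compound-key.
  Returns [true params] if all or none of the members are present,
  [false error-data] if only some of them are."
  [compound-key member-keys params]
  (let [present (filter #(contains? params %) member-keys)
        missing (remove #(contains? params %) member-keys)]
    (cond
      ;; none given: nothing to group
      (empty? present)
      [true params]

      ;; all given: nest them under the compound key
      (empty? missing)
      [true (-> (apply dissoc params member-keys)
                (assoc compound-key (select-keys params member-keys)))]

      ;; only some given: report which ones are missing
      :else
      [false {:compound compound-key :missing (set missing)}])))

(group-compound ::pagination #{::limit ::offset}
                {::limit 10 ::offset 0})
;=> [true {::pagination {::limit 10, ::offset 0}}]

(group-compound ::pagination #{::limit ::offset} {::limit 10})
;=> [false {:compound ::pagination, :missing #{::offset}}]
```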

Wrap Up

The combination of clojure.spec and our own code extending Liberator goes a long way toward improving and simplifying our code. Overall, we're quite happy with how everything turned out.

Our implementation with a custom global database also allows some very nice things like statically checking if all parameters in an endpoint are also declared via defparam, thus not failing at runtime.

This is even more important when doing it the other way around: When we convert our internal data structures to an external representation that is then sent to clients. This will be described in detail in another blog post.

However, there are some potential pitfalls:

Introducing a global database for query parameters comes with all the issues of global state: Defining parameters causes side effects, and the system you're interactively working on might not be the same as the system that gets loaded from the source code. This is solved by reloading and restarting the entire system, as described by Stuart Sierra in My Clojure Workflow, Reloaded, for example.

Evaluation order matters: Being fond of compile-time assertions, we implemented defparam so that it won't allow us to declare a parameter for a spec that doesn't exist yet. This differs from how clojure.spec itself handles references to specs that aren't defined yet (they're resolved at runtime), but we want to catch errors as early as possible in development. In practice, this means you have to call clojure.spec/def before calling defparam. This might sound a bit strict, but in the end it wasn't much of an issue.
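The existence check itself can be sketched with get-spec, which returns nil for keys that have no registered spec. The function below is a hypothetical simplification (assert-spec-defined! is not part of our code base, and the clojure.spec.alpha namespace of released Clojure versions is assumed); a macro like defparam could run the same check while it is being expanded.

```clojure
;; Assumption: a released Clojure (1.9+), hence clojure.spec.alpha.
(require '[clojure.spec.alpha :as s])

(defn assert-spec-defined!
  "Throws if no spec is registered under k, otherwise returns k."
  [k]
  (when-not (s/get-spec k)
    (throw (ex-info (str "No spec registered for " k) {:key k})))
  k)

(s/def :example/limit integer?)

(assert-spec-defined! :example/limit) ;=> :example/limit

;; (assert-spec-defined! :example/unknown) throws an ExceptionInfo,
;; because nothing has been registered under that key.
```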

Future Ideas

Automatic Reflection via `clojure.spec`

One thing we were missing quite early when moving our codebase to clojure.spec and implementing the features presented here was reflection on arbitrary specs. It would have helped us a great deal in some situations (compound params, :spec validation) if clojure.spec could give us a list of required (:req) and optional (:opt) entries for a clojure.spec/keys spec. We wouldn't need def-compound-param (and possibly :spec) anymore.

Now this is a two-edged sword: While reflection would be very useful in some situations, there is no way to make it completely generic. What would the hypothetical function reflect-spec return for a complex composed spec like the following?

(s/or :case1 (s/keys :req [::foo])
      :case2 (s/map-of string? (s/or :i integer?
                                     :f float?)))

It is possible to use clojure.spec/form to get the original expression used to define a spec, but we decided against using this approach. While it would work quite fine for simple specs such as (clojure.spec/keys :req [::foo]), it quickly breaks down for others:

user> (s/form (s/map-of string? int?))
(clojure.spec/every
 (clojure.spec/tuple string? int?)
 :into
 {}
 :clojure.spec/kind-form
 clojure.core/map?
 :kind
 #function[clojure.core/map?--6182]
 :clojure.spec/kfn
 #function[user/eval69383/fn--69384]
 :clojure.spec/conform-all
 true)
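For the simple case, reading the entries back out of the form could look roughly like this. keys-spec-entries is a hypothetical helper, not something clojure.spec provides, and the sketch assumes the clojure.spec.alpha namespace of released Clojure versions. It deliberately gives up on anything that isn't a plain keys spec, which is the limitation discussed above.

```clojure
;; Assumption: a released Clojure (1.9+), hence clojure.spec.alpha.
(require '[clojure.spec.alpha :as s])

(defn keys-spec-entries
  "For a spec whose form is a plain (s/keys ...) call, returns a map
  with its :req and :opt vectors. Returns nil for any other spec."
  [spec]
  (let [form (s/form spec)]
    (when (and (seq? form)
               (= `s/keys (first form)))
      (select-keys (apply hash-map (rest form)) [:req :opt]))))

(s/def :example/pagination
  (s/keys :req [:example/limit :example/offset]))

(keys-spec-entries :example/pagination)
;=> {:req [:example/limit :example/offset]}

;; Anything more complex is not understood by this helper:
(keys-spec-entries (s/or :a integer? :b string?))
;=> nil
```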

Attaching Arbitrary Data to a Spec

Another thing that would simplify some parts of our implementation would be the possibility to attach arbitrary data to a spec, like in the following example:

(s/def ::offset nat-int?
  {::doc "A non-negative offset"
   ::query-param/name "offset"})

(s/extra-data ::offset)
;=> {::doc "A non-negative offset", ::query-param/name "offset"}

This would allow us to remove our own databases and instead store everything via clojure.spec. Name-clashes could also be avoided by simply forbidding non-prefixed keys in the extra-data map.

We think that this would be a very useful feature that could open new possibilities for clojure.spec.