Using clojure.spec to declaratively parse query parameters
This post will introduce you to clojure.spec, a new Clojure core
library that will be part of the Clojure 1.9 release. After a short
introduction, it goes into detail on how we use it for a non-trivial
application: specifying, validating, and handling HTTP URL query
parameters.
What is clojure.spec?
clojure.spec is a new core library for validation of dynamic data
structures. Clojure, being a dynamic language, is usually quite
lightweight on explicit domain models: some people use records to
model their domain, while others favor simple maps without any
explicit structure.
While this accelerates development, especially in the early phase
of implementation, it's often necessary to validate data more
thoroughly as the code base grows. While Clojure itself offers
predicates for simple cases like string?, integer?, vector?, etc.,
it's usually harder to validate more complex structures.
Until now, most of our codebases included only the occasional
(assert (valid-foo? some-value)), while real validation was limited
to those parts of the system that read data from or write data to
the outside world.
There are various other libraries for validation of complex data, most notably plumatic/schema, but we found that they usually left something to be desired, especially when used as a foundation for more complex features. Most aren't flexible and/or "hackable" enough.
So what's special about clojure.spec? The most notable difference is
that it uses a unique feature of Clojure: namespaced keywords.
The most common use case for keywords in Clojure is as a key to a
value in a map: given (def user {:name "Rich Hickey"}), the
expression (get user :name) returns the string "Rich Hickey". Much
less used is the optional namespace part of a keyword. Think of it
as a specially handled prefix.
(One thing I won't go into here is the 'auto-prefix' feature, which Meikel Brandmeyer described in detail already.)
So how does clojure.spec take advantage of this feature? It allows
you to define a validation function for any keyword! Of course,
having global scope, these definitions usually don't make much sense
for non-namespaced keywords. Who knows if :name refers to the name
of a user or the name of a street? The solution in Clojure is to
make them unambiguous via a prefix: :street/name and :person/name.
This is the same approach Datomic uses for its attributes.
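To make the prefix idea concrete, here is a small sketch using only clojure.core functions. The var names (street-name, person-name, record) are made up for illustration.

```clojure
;; A namespaced keyword is an ordinary keyword with a prefix before the slash.
(def street-name :street/name)
(def person-name :person/name)

(namespace person-name) ;=> "person"
(name person-name)      ;=> "name"
(namespace :name)       ;=> nil, a plain keyword has no namespace part

;; Same name part, different namespace: these are different keywords.
(= street-name person-name) ;=> false

;; Both can therefore live in one map without clashing.
(def record {:person/name "Rich Hickey"
             :street/name "Paren Boulevard"})

(:person/name record) ;=> "Rich Hickey"
(:street/name record) ;=> "Paren Boulevard"
```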
Simple use cases
Basic validation looks like the following:
(s/def :person/name string?)
(s/def :album/year int?)
(s/valid? :person/name "Rich Hickey")
;;=> true
(s/valid? :album/year "1977")
;;=> false, the value is a string
(s/valid? :album/year 1977)
;;=> true
More complex validations are also supported. A vector of album
release years can be validated like this (note that we are re-using
the existing spec for :album/year):
(s/valid? (s/coll-of :album/year) [12345 53563]) ;=> true
Maps take a special place, which makes clojure.spec quite unique:
they are defined as a set of keys, with each key being a spec in its
own right:
(s/def :person/name string?)
(s/def :person/age (s/and integer? pos?))
(s/def :address/street string?)
(s/def :model/user (s/keys :req [:person/name
                                 :person/age]
                           :opt [:address/street]))
This defines a new spec called :model/user with two required keys
and one optional key. It's used like this:
(s/valid? :model/user
          {:person/name "Rich Hickey"
           :person/age 23                    ; just kidding
           :address/street "Paren Boulevard" ; a guess
           })
;=> true
(s/valid? :model/user
          {:person/name "Rich Hickey"
           :person/age 23})
;=> true
(s/valid? :model/user
          {:person/name "Rich Hickey"
           :person/age -23})
;=> false, :person/age is negative
These are just the most basic features of clojure.spec. Other
features include automatic generation of data that passes a given
spec (very useful for testing), coercion of data from one type to
another, and generation of reusable error messages and error data.
For a detailed guide to most of these features, take a look at the Spec Guide.
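As a brief illustration of the error-data and conform features, the sketch below reuses the :model/user spec from above. It assumes the clojure.spec.alpha namespace under which the library was eventually released; with the pre-release name used in this post only the require line and the qualified keywords differ. Data generation is left out because it needs the separate test.check library.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :person/name string?)
(s/def :person/age (s/and integer? pos?))
(s/def :model/user (s/keys :req [:person/name :person/age]))

(def bad-user {:person/name "Rich Hickey"
               :person/age -23})

;; explain-data returns nil for valid input, otherwise a map whose
;; :clojure.spec.alpha/problems entry lists every failed check as data.
(def problems
  (:clojure.spec.alpha/problems (s/explain-data :model/user bad-user)))

(count problems)        ;=> 1
(:in (first problems))  ;=> [:person/age]
(:val (first problems)) ;=> -23

;; conform returns the (possibly transformed) value, or the invalid marker.
(s/conform :person/age 23)  ;=> 23
(s/conform :person/age -23) ;=> :clojure.spec.alpha/invalid
```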
Thinking further
One of our major tasks is providing HTTP API endpoints to our customers. Data comes in, gets parsed, validated, processed and a response is sent to the client.
We use Liberator for our endpoints, which allows us to declare them like this:
(defresource submit-data
  :allowed-media-types ["application/json"]
  :allowed-methods #{:post}
  :processable? (fn [ctx]
                  ;; 1) Extract client-data
                  ;; 2) Validate
                  ;; 3) Generate error message if invalid
                  ))
Our APIs are relatively data-heavy: some endpoints have quite a few
query parameters, others receive complex multipart/form-data or JSON
bodies.
When parsing and validating these, our endpoints often contain
complicated ad-hoc logic in their :processable? decision, which is
neither reusable nor easy to understand. Liberator itself doesn't
provide any utilities for this, nor do decision functions compose
well.
We also wanted to generate API documentation from our code, without
having to maintain separate documentation files that need to be kept
up-to-date with the logic used to implement :processable?.
What we needed was for an endpoint to specify its parameter names,
valid input values, and transformation functions in a declarative
manner. As it turns out, clojure.spec is a very good fit for this.
Imagine we want to implement an endpoint that takes two query
parameters: limit and offset. Both are integers, and both must be
greater than or equal to zero. Without something like clojure.spec,
the :processable? decision would look like this:
;; `string->long` is a helper that is expected to return nil for
;; missing or unparseable input
(fn [ctx]
  (let [request (:request ctx)
        {:strs [offset limit]} (:query-params request)
        offset* (string->long offset)
        limit* (string->long limit)]
    (cond
      (nil? offset) [false "Missing parameter 'offset'"]
      (nil? limit) [false "Missing parameter 'limit'"]
      (nil? offset*) [false "Couldn't parse value for parameter 'offset'"]
      (nil? limit*) [false "Couldn't parse value for parameter 'limit'"]
      (neg? offset*) [false "Parameter 'offset' can't be negative"]
      (neg? limit*) [false "Parameter 'limit' can't be negative"]
      :else [true {::offset offset*
                   ::limit limit*}])))
This will parse and validate just two parameters, and it doesn't do
it very well: it won't collect all error messages, it simply returns
the first one. That means if we don't pass any parameters, we still
get only the error message for offset, and so on.
Now imagine doing this every other day for multiple parameters, with some being optional, while handling default values and arbitrary transformations. It gets messy pretty fast, and don't even think about generating API documentation from this imperative blob.
With our goal of being declarative (and thus being able to generate API documentation), we evaluated several options. Of course one can easily write a function that takes a map of parameter names and transformation/validation functions, but this would still couple validation, transformation, and (in the foreseeable future) documentation. Additionally, these maps would have to contain data to generate useful error messages, so this approach would become unwieldy just as quickly.
Enter clojure.spec
When Rich Hickey announced clojure.spec, many people took a dislike
to the nature of the library: all defined specs are stored in a
global variable and are accessible from anywhere in the program.
This could easily cause name clashes, as hinted at above for :name.
The beauty comes from clojure.spec forbidding non-namespaced
keywords in its internal database, thus avoiding this issue
completely.
Applied to our query parameter thoughts, a similar database seemed like a perfect fit.
This database would contain all information mentioned above: internal and external parameter names, transformation and validation functions, additional documentation, and other things.
This still isn't particularly special; the real usefulness comes
when this approach is combined with clojure.spec.
Our global database of parameter information uses the same
identifiers (namespace-prefixed keywords) as clojure.spec. It allows
us to define an endpoint similar to the one above like this:
(require '[clojure.spec :as s])
(s/def ::limit integer?)
(s/def ::offset integer?)
(defparam ::limit "limit" string->integer)
(defparam ::offset "offset" string->integer)
(defresource submit-data
  :allowed-media-types ["application/json"]
  :allowed-methods #{:post}
  :query-params {:req [::limit
                       ::offset]})
;; `::limit` and `::offset` are available via `ctx` in later
;; Liberator handlers/decisions, just as they would with our custom
;; implementation earlier
This code defines two new specs: one for ::limit and one for
::offset, both checking if the value is an integer.
defparam specifies the name of the query parameter for this
identifier, as well as a transformation function that is applied
before validation. Here we declare that the value of the parameter
should be converted to an integer (query parameter values are
always strings in Liberator/Ring).
A custom :processable? function, used automatically by our own
defresource wrapper, interprets the :query-params map and handles
all transformation, validation, etc. for us.
The structure of :query-params is loosely based on what
clojure.spec/keys accepts, namely :req specifying a vector of
required query parameters, and :opt doing the same for optional
parameters.
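To give a rough idea of the moving parts, the following is a minimal sketch and not the implementation behind defparam and defresource. It assumes the released clojure.spec.alpha namespace, and the names param-registry, register-param!, string->integer, and parse-query-params are hypothetical stand-ins: a plain function replaces the defparam macro, and a single function does what the custom :processable? decision is described as doing.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical registry: spec keyword -> {:name "..." :transform fn}
(defonce param-registry (atom {}))

(defn register-param!
  "Function stand-in for `defparam` (an assumption, not the real macro)."
  [k param-name transform-fn]
  (swap! param-registry assoc k {:name param-name :transform transform-fn})
  k)

(defn string->integer
  "Parses a string into a long; returns nil if that fails."
  [^String s]
  (try (Long/parseLong s)
       (catch NumberFormatException _ nil)))

(defn parse-query-params
  "Looks up, transforms, and validates every parameter listed under
  :req and :opt. Returns [true values] or [false errors]; unlike the
  imperative version, it collects all errors instead of the first one."
  [{:keys [req opt]} query-params]
  (let [results
        (for [[k required?] (concat (map vector req (repeat true))
                                    (map vector opt (repeat false)))
              :let [{param-name :name transform :transform} (get @param-registry k)
                    raw (get query-params param-name)]
              ;; optional parameters that are absent are simply skipped
              :when (or required? (some? raw))]
          (if (nil? raw)
            {:error (str "Missing parameter '" param-name "'")}
            (let [v (transform raw)]
              (if (s/valid? k v)
                {:key k :value v}
                {:error (str "Invalid value for parameter '" param-name "'")}))))
        errors (keep :error results)]
    (if (seq errors)
      [false (vec errors)]
      [true (into {} (map (juxt :key :value)) results)])))

(s/def ::limit integer?)
(s/def ::offset integer?)
(register-param! ::limit "limit" string->integer)
(register-param! ::offset "offset" string->integer)

(parse-query-params {:req [::limit ::offset]} {"limit" "10" "offset" "0"})
;=> [true {::limit 10, ::offset 0}]
(parse-query-params {:req [::limit ::offset]} {"limit" "ten"})
;=> [false ["Invalid value for parameter 'limit'" "Missing parameter 'offset'"]]
```

The return shape mirrors what a Liberator decision may return: a boolean plus a map (or error data) to merge into the context.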
In case of validation failures, clojure.spec/explain-data is used to
generate error data which is then used to generate useful error
messages for API clients. The data-driven approach also allows us to
deliver these errors as JSON or HTML, for example, conveniently
based on the Accept header sent by the client.
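One way such error data could be derived is sketched below, assuming clojure.spec.alpha. The helper names problem->error and validation-errors and the shape of the resulting maps are hypothetical; the point is only that the result is plain data that a JSON or HTML renderer can consume.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::limit (s/and integer? #(>= % 0)))

(defn problem->error
  "Turns one explain-data problem into a plain, renderable map."
  [param-name {:keys [val pred]}]
  {:parameter param-name
   :value val
   :message (str "Value " (pr-str val) " for parameter '" param-name
                 "' does not satisfy " (pr-str pred))})

(defn validation-errors
  "Returns a seq of error maps; empty when the value is valid."
  [spec param-name value]
  (map #(problem->error param-name %)
       (:clojure.spec.alpha/problems (s/explain-data spec value))))

(validation-errors ::limit "limit" -5) ; one error map for the failed predicate
(validation-errors ::limit "limit" 5)  ;=> ()
```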
Additional Validation via :spec
The :query-params map allows us to validate query parameters
individually, but sometimes there are inter-dependencies and you
need to validate the parameters map as a whole. To that end,
:query-params supports a :spec key.
The usage is quite simple:
;; `::location` is a map with both latitude and longitude and must fit
;; some predicate
(s/def ::location (s/and (s/keys :req [::latitude
                                       ::longitude])
                         ;; any other predicate
                         valid-location?))
(defresource submit-data
  :allowed-media-types ["application/json"]
  :allowed-methods #{:post}
  :query-params {:spec ::location
                 :req [::latitude
                       ::longitude]})
This will first validate and extract ::latitude and ::longitude and
afterwards conform them with the ::location spec. This will also
generate useful error messages, again via clojure.spec/explain-data.
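The whole-map check can be tried in isolation. In this sketch (assuming clojure.spec.alpha), the coordinate ranges and the body of valid-location? are invented for illustration; the post does not say what the real predicate checks.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::latitude (s/and number? #(<= -90 % 90)))
(s/def ::longitude (s/and number? #(<= -180 % 180)))

;; Hypothetical predicate over the whole map: reject the 0/0 coordinate,
;; which often indicates missing data.
(defn valid-location? [m]
  (not (and (zero? (::latitude m))
            (zero? (::longitude m)))))

;; s/and checks its specs in order, so valid-location? only ever sees
;; maps that already passed the s/keys check.
(s/def ::location (s/and (s/keys :req [::latitude ::longitude])
                         valid-location?))

(s/valid? ::location {::latitude 52.5 ::longitude 13.4}) ;=> true
(s/valid? ::location {::latitude 0 ::longitude 0})       ;=> false
(s/valid? ::location {::latitude 52.5})                  ;=> false
```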
Handling Dynamic Parameters
A special use case is dynamically specifying the parameter map. For
example, Liberator doesn't have different decisions for different
HTTP methods like GET and POST. For such cases, :query-params also
supports specifying a function taking a ctx argument, analogous to
other Liberator decisions/handlers.
An endpoint that only accepts query parameters for GET can look like
the following:
(defresource paginated-get
  :allowed-media-types ["application/json"]
  :allowed-methods #{:get :post}
  :query-params (fn [ctx]
                  (when (= :get (get-in ctx [:request :request-method]))
                    {:req [::limit
                           ::offset]})))
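Supporting both forms only requires a small dispatch before the parameter map is interpreted. The helper below, resolve-query-params, is a hypothetical name used for illustration and is not part of Liberator or of the code described in this post.

```clojure
(defn resolve-query-params
  "Accepts either a parameter map or a function of ctx returning one
  (or nil when no query parameters apply)."
  [query-params ctx]
  (if (fn? query-params)
    (query-params ctx)
    query-params))

(def dynamic-params
  (fn [ctx]
    (when (= :get (get-in ctx [:request :request-method]))
      {:req [::limit ::offset]})))

(resolve-query-params dynamic-params {:request {:request-method :get}})
;=> {:req [::limit ::offset]}
(resolve-query-params dynamic-params {:request {:request-method :post}})
;=> nil
(resolve-query-params {:opt [::limit]} {})
;=> {:opt [::limit]}
```

Note that fn? is false for maps even though maps can be called like functions, so a literal parameter map is never invoked by accident.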
Of course the drawback of this approach is that we can't easily generate API documentation as we're getting into imperative territory again. Thankfully, this is only used in one place in our code base. We're still evaluating how we can make this more declarative.
This also shows a limitation of Liberator: there is no
differentiation between different request methods. The same
:processable? function is used for GET, POST, etc. This forces users
to write procedural code if they want different implementations.
Another contender for HTTP endpoint handling is juxt/yada. Yada
solves the different request methods issue rather beautifully by
allowing different sets of parameters for different request methods.
Unfortunately, it's using plumatic/schema, which we don't want to
pull into our project.
Transformation
In addition to the third parameter of defparam (transform-fn), it's
also possible to use clojure.spec/conformer to transform the input
before or after validation.
One could argue that we could remove transform-fn and replace it
with this approach, but the drawback is that it would couple the
definition of a spec with how it gets read from a request, whereas
the beauty of clojure.spec lies in its reusability. We don't want to
couple our domain model's specs with how it is represented
externally.
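For comparison, this is roughly what the conformer variant looks like, assuming clojure.spec.alpha. The spec name ::limit-param and the helper parse-long-or-invalid are hypothetical. The spec now only accepts strings, which is exactly the coupling to the external representation described above.

```clojure
(require '[clojure.spec.alpha :as s])

(defn parse-long-or-invalid
  "Conformer function: returns the parsed long, or the invalid marker
  when the input is not a parseable string."
  [x]
  (if (string? x)
    (try (Long/parseLong ^String x)
         (catch NumberFormatException _ :clojure.spec.alpha/invalid))
    :clojure.spec.alpha/invalid))

;; s/and passes the conformed value on, so the second predicate sees
;; the parsed number rather than the original string.
(s/def ::limit-param (s/and (s/conformer parse-long-or-invalid)
                            #(>= % 0)))

(s/conform ::limit-param "10")  ;=> 10
(s/conform ::limit-param "ten") ;=> :clojure.spec.alpha/invalid
(s/conform ::limit-param "-1")  ;=> :clojure.spec.alpha/invalid
(s/valid? ::limit-param 10)     ;=> false, an already-parsed number is rejected
```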
Compound Parameters
The last feature I want to talk about is compound parameters.
Conceptually, a compound parameter groups a set of parameters
together so that either all or none of them have to be specified.
Additionally, these parameters aren't assoc'd into ctx directly but
as a separate map under the compound parameter's name.
Our pagination example above will look like this with compound parameters:
(s/def ::limit integer?)
(s/def ::offset integer?)
(defparam ::limit "limit" string->integer)
(defparam ::offset "offset" string->integer)
;;; This `s/def` is optional - if our code finds a spec for a compound
;;; param it will use it for validation in addition to validating all
;;; "basic" parameters
(s/def ::pagination (s/keys :req [::limit ::offset]))
(def-compound-param ::pagination #{::limit ::offset})
(defresource submit-data
  :allowed-media-types ["application/json"]
  :allowed-methods #{:post}
  :query-params {:req [::pagination]})
;; `::limit` and `::offset` are available under `(::pagination ctx)`
While this seems very verbose at first, it allows us to reuse many
more parameters much more easily. For example, one common parameter
type in our code is ::location, which consists of three parameters
(lat, lon, name). Without compound parameters, every endpoint would
have to repeat these in its :query-params map, and we would have to
edit every endpoint if we decided to add something to or remove
something from the compound parameter.
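The grouping step itself can be sketched independently of Liberator. The function group-compound below is a hypothetical illustration of the "all or none" rule and of nesting the members under the compound name; it is not the code behind def-compound-param, and it treats a completely absent group as acceptable.

```clojure
(defn group-compound
  "Moves the member keys of a compound parameter into a nested map
  under compound-key. Returns [true m] on success, or [false error]
  when only some of the members are present."
  [compound-key member-keys parsed]
  (let [present (filter #(contains? parsed %) member-keys)]
    (cond
      ;; none of the members given: nothing to group
      (empty? present)
      [true parsed]

      ;; all members given: nest them under the compound key
      (= (count present) (count member-keys))
      [true (-> (apply dissoc parsed member-keys)
                (assoc compound-key (select-keys parsed member-keys)))]

      ;; only some members given: report the missing ones
      :else
      [false {:missing (set (remove (set present) member-keys))}])))

(group-compound ::pagination #{::limit ::offset} {::limit 10 ::offset 0})
;=> [true {::pagination {::limit 10, ::offset 0}}]
(group-compound ::pagination #{::limit ::offset} {::limit 10})
;=> [false {:missing #{::offset}}]
(group-compound ::pagination #{::limit ::offset} {})
;=> [true {}]
```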
Wrap Up
The combination of clojure.spec and our own code extending Liberator
goes a long way toward improving and simplifying our code. Overall,
we're quite happy with how everything turned out.
Our implementation with a custom global database also allows some
very nice things like statically checking if all parameters in an
endpoint are also declared via defparam, thus not failing at
runtime.
This is even more important when doing it the other way around: When we convert our internal data structures to an external representation that is then sent to clients. This will be described in detail in another blog post.
However, there are some potential pitfalls:
- Introducing a global database for query parameters comes with all the issues of global state: defining parameters causes side effects, and the system you're interactively working on might not be the same as the system that gets loaded from the source code.
  This is solved by reloading and restarting the entire system, as described by Stuart Sierra in My Clojure Workflow, Reloaded, for example.
- Evaluation order matters: being fond of compile-time assertions, we implemented defparam so it won't allow us to declare a not-yet-existing spec as a parameter. This differs from how clojure.spec handles references to specs that don't exist yet (they're resolved at runtime), but we want to catch errors as early as possible in development.
  This means that you have to call clojure.spec/def before calling defparam. This might sound a bit strict, but in the end it wasn't much of an issue.
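The existence check can be illustrated with s/get-spec, which returns nil for unregistered names. This sketch assumes clojure.spec.alpha and uses a plain function with the hypothetical name register-checked-param!, so the check happens when the call is evaluated; a macro like defparam could run the same check at macro-expansion time instead.

```clojure
(require '[clojure.spec.alpha :as s])

(defonce checked-params (atom {}))

(defn register-checked-param!
  "Registers a query parameter, but only if a spec with the same name
  has already been defined via s/def."
  [k param-name transform-fn]
  (when-not (s/get-spec k)
    (throw (ex-info (str "No spec registered for " k) {:param k})))
  (swap! checked-params assoc k {:name param-name :transform transform-fn})
  k)

;; Order matters: the spec has to exist before the parameter is declared.
(s/def ::limit integer?)
(register-checked-param! ::limit "limit" identity) ;=> ::limit

;; Declaring a parameter without a spec fails immediately:
;; (register-checked-param! ::no-such-spec "x" identity) throws ExceptionInfo
```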
Future Ideas
Automatic Reflection via clojure.spec
One thing we were missing quite early when moving our codebase to
clojure.spec and implementing the features presented here was
reflection on arbitrary specs. It would have helped us a great deal
in some situations (compound params, :spec validation) if
clojure.spec could give us a list of required (:req) and optional
(:opt) entries for a clojure.spec/keys spec. We wouldn't need
def-compound-param (and possibly :spec) anymore.
Now this is a two-edged sword: while reflection would be very useful
in some situations, there is no way to make it completely generic.
What would the hypothetical function reflect-spec return for a
complex composed spec like the following?
(s/or :case1 (s/keys :req [::foo])
      :case2 (s/map-of string? (s/or :i integer?
                                     :f float?)))
It is possible to use clojure.spec/form to get the original
expression used to define a spec, but we decided against using this
approach. While it would work quite well for simple specs such as
(clojure.spec/keys :req [::foo]), it quickly breaks down for others:
user> (s/form (s/map-of string? int?))
(clojure.spec/every
(clojure.spec/tuple string? int?)
:into
{}
:clojure.spec/kind-form
clojure.core/map?
:kind
#function[clojure.core/map?--6182]
:clojure.spec/kfn
#function[user/eval69383/fn--69384]
:clojure.spec/conform-all
true)
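For the simple case, reading :req and :opt out of the form is short. The sketch below assumes clojure.spec.alpha; keys-spec-entries is a hypothetical helper and only handles specs defined directly with s/keys. Anything composed, such as the s/or example above, yields nil, which is the limitation that made us decide against this approach.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::foo integer?)
(s/def ::bar string?)
(s/def ::simple (s/keys :req [::foo] :opt [::bar]))
(s/def ::composed (s/or :case1 (s/keys :req [::foo])
                        :case2 (s/map-of string? integer?)))

(defn keys-spec-entries
  "Returns a map with the :req and :opt vectors of a spec that was
  defined directly via s/keys, and nil for every other kind of spec."
  [spec-name]
  (let [form (s/form spec-name)]
    (when (and (seq? form)
               (= 'clojure.spec.alpha/keys (first form)))
      (select-keys (apply hash-map (rest form)) [:req :opt]))))

(keys-spec-entries ::simple)   ;=> {:req [::foo], :opt [::bar]}
(keys-spec-entries ::composed) ;=> nil
(keys-spec-entries ::foo)      ;=> nil
```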
Attaching Arbitrary Data to a Spec
Another thing that would simplify some parts of our implementation would be the possibility to attach arbitrary data to a spec, like in the following example:
(s/def ::offset nat-int?
  {::doc "A non-negative offset"
   ::query-param/name "offset"})
(s/extra-data ::offset)
;=> {::doc "A non-negative offset", ::query-param/name "offset"}
This would allow us to remove our own databases and instead store
everything via clojure.spec. Name clashes could also be avoided by
simply forbidding non-prefixed keys in the extra-data map.
We think that this would be a very useful feature that could open
new possibilities for clojure.spec.