Using PostgreSQL Enums in Clojure

17 March 2017

Our biggest Open Source project at bevuta is Pepa, a web-based document management & archival system. Pepa is written in Clojure and ClojureScript. It uses PostgreSQL to store all data, including uploaded ﬁles and rendered images.

PostgreSQL `ENUM`

PostgreSQL was chosen for its Open Source nature and previous experience with it. Support in Clojure (via clojure.java.jdbc) is excellent and lightweight.

One very common type of data to store are enumerated values. Examples are presence information (available, busy, away), weekdays (monday, tuesday, …), or status information (success, failure, pending).

PostgreSQL supports this via the ENUM data type:

CREATE TYPE PROCESSING_STATUS AS ENUM ('pending', 'failed', 'processed');

This piece of DDL creates a new type called PROCESSING_STATUS which can assume the values pending, failed, or processed.

While ENUM is a useful data type, usage from Clojure isn't trivial:

CREATE TABLE files (
  name TEXT,
  status PROCESSING_STATUS
);

This defines a new table called files with two columns: name (a string), and status (our previously defined enumeration). We can insert data via raw SQL easily:

INSERT INTO files VALUES ('my-file.txt', 'pending');
-- INSERT 0 1

Trying to insert invalid values will fail, as expected:

INSERT INTO files VALUES ('my-other-file.txt', 'invalid_status')
-- ERROR:  invalid input value for enum processing_status: "invalid_status"

Unfortunately, inserting rows from Clojure doesn't work like that:

(insert! pg-db :files {:name "my-file.txt", :status "pending"})
;; PSQLException ERROR: column "status" is of type processing_status but expression is of type character varying

This is caused by the rather strict type handling of JDBC: We try to insert a string into a ﬁeld which is an enum. Raw SQL will automatically cast it, JDBC won't.

One common workaround is just dropping the use of ENUM and go with string values but this doesn't offer the same consistency guarantees and is thus a frequent cause of errors.

Another dubiuos workaround is passing stringtype=unspecified in the JDBC connection URL. This causes strings to always bind to to the 'unknown' type, allowing PostgreSQL to cast it into whatever it wants.

Yet another possibility is creating a PGobject with the correct type and pass it to insert!:

(let [status (doto (PGobject.)
               (.setType "processing_status")
               (.setValue "pending"))]
  (insert! pg-db :files {:name "my-file.txt", :status status}))

While this is type-safe and correct, it's pretty verbose. There's a better way.

Extending `clojure.java.jdbc`

clojure.java.jdbc provides a protocol to handle type conversion: ISQLValue. This protocol contains one function sql-value which is used to convert any value to a PGobject.

In Clojure, we usually use keywords or namespace-prefixed keywords to handle enumerations. Using the latter, our processing_status pending could look like :processing-status/pending.

Getting from the keyword to an equivalent PGobject is pretty easy:

(defn kw->pgenum [kw]
  (let [type (-> (namespace kw)
                 (s/replace "-" "_"))
        value (name kw)]
    (doto (PGobject.)
      (.setType type)
      (.setValue value))))

Now we just need to implement ISQLValue on clojure.lang.Keyword:

(extend-type clojure.lang.Keyword
  jdbc/ISQLValue
  (sql-value [kw]
    (kw->pgenum kw)))

This small piece of code allows us write our insert! like:

(insert! pg-db :files {:name "my-file.txt", :status :processing-status/pending})

Now we just have to implement the other direction. Querying a PROCESSING_STATUS still gives us a string and not our namespace-prefixed keyword. That means a queried value will be of a totally different type than the value we inserted. Unthinkable!

Fortunately, clojure.java.jdbc provides a protocol to handle this case too: IResultSetReadColumn.

It too contains a single function, result-set-read-column, which will get called once for every column of every row fetched from the database. Arguments are the value of the column, an object containing information about the current row, and the index of the current column.

With that, and a list of defined ENUM types, we can write the following:

(def +schema-enums+
  "A set of all PostgreSQL enums in schema.sql. Used to convert
  enum-values back into Clojure keywords."
  #{"processing_status"})

(extend-type String
  jdbc/IResultSetReadColumn
  (result-set-read-column [val rsmeta idx]
    (let [type (.getColumnTypeName rsmeta idx)]
      (if (contains? +schema-enums+ type)
        (keyword (s/replace type "_" "-") val)
        val))))

This piece of code will check the name of the type of the column, and if it is in our set of enumerations, convert the value to a namespace prefixed keyword. With this in place, fetching a processing_status from our table will work as expected:

(query (:db user/system) ["SELECT * FROM files"])
;=> ({:status :processing-status/pending, :name "my-file.txt"})

PostgreSQL Enums in Clojure - Closing Thoughts

Once again, Clojure has shown its ﬂexibility when working with different kinds of data. A few lines of Clojure code allowed us to extend the capabilities of clojure.java.jdbc to seamlessly handle a new data type.

Of course, this isn't limited to ENUM: PostgreSQL has many nice data types, including types for spatial data (POINT, LINE, POLYGON, …), monetary units (MONEY), and network addresses (INET, MACADDR). One can easily imagine how the method presented here can be adapted to convert those to native Clojure types, too.

One drawback with this implementation is that we have to manage the set of ENUM types by hand. Ideally it would initially fetch all known enumeration types from the database and store them in an atom.

Using PostgreSQL Enums in Clojure

PostgreSQL ENUM

Extending clojure.java.jdbc

PostgreSQL Enums in Clojure - Closing Thoughts

PostgreSQL `ENUM`

Extending `clojure.java.jdbc`