November 22, 2008


I put up some new code on the aptly titled section of my website:

Included are Jaro-Winkler and Levenshtein string similarity distance algorithms. Levenshtein is a general algorithm based on insertions/deletions/substitutions, while Jaro-Winkler is a more tweaked implementation specifically suited to short strings such as names. One area where the latter comes in handy is denormalizing manually entered records where for example salespersons' names may not be consistently entered. I found that Jaro-Winkler works best if you add the distance of the last name and the first name separately while giving the last name greater weight.

Also included are implementations of sparse vectors, and radix trees (which I blogged about before).

November 19, 2008

Hardware support for garbage collection.

A fascinating email exchange between David Moon and Cliff Click Jr. currently at Azul Systems about the Azul Java servers' architecture. Things that caught my attention:

One of the biggest impact changes we made was a hardware read-barrier for GC - a simple instruction that tests invariants of freshly loaded pointers and takes a fast-trap if the test fails.

GC read-barrier enables a fully parallel & concurrent GC; we can sustain 40G/sec allocation on a 400G heap indefinitely, with max-pause times on the order of 10-20msec. This uber-GC is partially made possible because of the read barrier (and partially possible because we 'own' the OS and can play major page-mapping tricks).

Yes, wide tag per pointer. No problem (yet) with running out of classes. Big Java Apps these days seem to have about 2^15 classes.

The new insight I took from this is that effective hardware support for garbage collection does not have to be complicated. The other two quotes provide further evidence for opinions I espouse: cons all you need (10% of the heap per second!!), and object orientation is an inadequate paradigm for writing software that is now being stretched to absurdity (30,000 classes!!).

November 9, 2008

Compile-time intra-application URI link checking

Here is a neat hack I came up with to do compile-time intra-application link checking for a web application that I wrote. As you might expect the mechanism is based on eval-when facility of CL, but also uses the *compile-file-pathname* and *load-pathname* variables to provide the names of the files where the offending links reside.

The ASDF definition of the application looks like:

(asdf:defsystem :cct
:serial t
:components ((:file "resource-definition")

;; other files

;; link checker (goes last)
(:file "uri-reference-checker"))

Where resource-definition.lisp defines the page-definition and link-reference macros:

(in-package :cct)

(eval-when (:compile-toplevel :load-toplevel)
(defparameter *defined-uri-list* ())
(defparameter *referenced-uri-list* ()))

(defmacro/ps resolve-resource (resource-identifier)
(pushnew (cons resource-identifier (or *compile-file-pathname* *load-pathname*)) *referenced-uri-list*)
(symbol-to-uri resource-identifier))

(set-dispatch-macro-character #\# #\/
(lambda (stream subchar arg)
(declare (ignore subchar arg))
(let* ((base-uri
(with-output-to-string (collector)
(loop until (member (peek-char nil stream nil #\Space t) '(#\Space #\Newline #\Tab #\? #\) #\{)) do
(princ (read-char stream) collector))))
(page (read-from-string base-uri)))
`(concat-url (resolve-resource ,page) ,@(uri-template:read-uri-template stream)))))

(defmacro concat-url (&rest fragments)
`(format nil "~@{~A~}" ,@fragments))

(defpsmacro concat-url (&rest fragments)
`(+ ,@fragments))

(defmacro define-page (page-name (&key parameters (default-request-type :both)) &body body)
(flet ((process-parameter (p) (if (atom p) p (list (first p) :parameter-type (list 'quote (second p))))))
(eval-when (:compile-toplevel :load-toplevel :execute)
(pushnew '(,page-name) *defined-uri-list*))
(define-easy-handler (,page-name :uri ,(symbol-to-uri page-name) :default-request-type ,default-request-type)
,(mapcar #'process-parameter parameters)
(if (and ,@(mapcar (lambda (x) (if (atom x) x (car x))) parameters))
(progn ,@body)
(redirect "/cct"))))))

The link-reference mechanism is implemented as a macro character which builds on the uri-template facility. You certainly don't need to do it this way, but I find URI templates to be quite convenient. The only thing I don't like is the special-casing of the termination symbols (whitespace, closing paren). If you know of a better way, please let me know.

Also note the defpsmacro - this is a Parenscript macro definition, which lets the same link-checking mechanism (and uri-template, which is Parenscript-compatible) work with Parenscript code, which is transformed into JavaScript and can then construct (compile-time checked) URIs dynamically in the browser.

uri-reference-checker.lisp runs after all the page definitions have been made and consists of:

(in-package :cct)

(eval-when (:compile-toplevel :load-toplevel)
(dolist (unreferenced-uri (set-difference *referenced-uri-list* *defined-uri-list* :key #'car))
(warn "Reference warning: referencing unknown URI resource ~a in file ~a" (car unreferenced-uri) (cdr unreferenced-uri))))

You could also provide warnings for defined URIs that have no references.

The actual application code then looks something like:

(loop for date in dates do
(htm (:li (:a :href #/bank-rec-report?date={date} (str date)))))

This is a pattern that I think can be applied to most web applications.