December 3, 2009

Common Lisp bindings now part of ZeroMQ 2.0

Today Vitaly Mayatskikh announced the inclusion of his Common Lisp ZeroMQ bindings into the ZeroMQ 2.0 source tree. ZeroMQ 2.0 is a high-performance messaging system, so this is great news if you are trying to build distributed systems in Common Lisp.

Previously, the available Free Software options for Common Lisp messaging were CL-XMPP (XMPP client only) and CL-RABBIT (LispWorks only).

Note that ZeroMQ 2.0, unlike version 1, no longer implements AMQP, for performance reasons.

November 19, 2009

How to program high-performance servers in Lisp

On November 17 I presented my ideas about the design of high-performance servers in Lisp to the Montreal Scheme/Lisp User Group.

The ideas in the presentation come from my examination of the Antiweb and TPD2 web servers (see John Fremlin's presentation about TPD2 to the Shibuya Lispers here). One shortcoming of both servers is a rather naive approach to concurrency and synchronization. Building on these examples and reflections, I decided to design a new web server, HTTP DOHC (DOHC stands for Dual Overhead Camshaft, a car engine with two camshafts in the cylinder head; the name was chosen because of the server's coupled dual demultiplexing system).

I plan to develop HTTP DOHC further into a full-featured web server, with an interface similar to Hunchentoot 1.0.

I also want to mention that I now have personal time free for commercial projects. Write to vsedach@gmail.com if you have something interesting.

November Montreal Scheme/Lisp User Group presentation

On November 17, I gave a presentation to the Montreal Scheme/Lisp User Group about developing high-performance network servers in Lisp. I looked at the lessons to be learned from existing Common Lisp web servers, notably Antiweb and TPD2, tried out some of those ideas (along with some new ones about threading and concurrency) in my own under-development web server HTTP DOHC, and put my findings into the presentation.

If you're interested in the topic, you should also watch John Fremlin's TPD2 presentation to the Shibuya Lisp user's group.

I plan on making HTTP DOHC a full-featured Common Lisp web server with an interface that's somewhat compatible with Hunchentoot 1.0.

In other news, I now have time available for contract work. Get in touch at vsedach@gmail.com.

Templates and code generation

Readers of Russian Lisp Planet are already familiar with archimag's cl-closure-template system. I won't repeat all the details, but I will make a few comments about this approach to templating.

The main thing to note is that cl-closure-template is not a template system, but a system for generating template systems. It is roughly the same difference as between any particular parser and the parser generator yacc. If you look at the diagram of template systems on Wikipedia, cl-closure-template by comparison takes a template as a parameter and produces a program that takes data and produces a document.

This approach of generating code is truly Lispy, and has many advantages over ordinary template systems. For example, you can combine it with Common Lisp's reader macros and get full-fledged new syntax (as is done in CL-INTERPOL and uri-template). Or you can run the generated code through a translator like Parenscript and get the ability to use the same templates from different programming languages, as cl-closure-template and uri-template do. It also opens up the possibility of using partial evaluation techniques to generate optimized code.
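
To make the "system that generates template systems" idea concrete, here is a minimal sketch with hypothetical names (this is not the cl-closure-template API): a macro that takes a template, written as literal strings and field names, and compiles it into an ordinary Lisp function of the data.

(defmacro deftemplate (name (&rest fields) &body template)
  ;; Compile the template into a function of its fields at macroexpansion time.
  `(defun ,name (&key ,@fields)
     (with-output-to-string (out)
       ,@(loop for part in template
               collect (if (stringp part)
                           `(write-string ,part out) ; literal text
                           `(princ ,part out))))))   ; field reference

(deftemplate greeting (name city)
  "Hello, " name " from " city "!")

;; (greeting :name "John" :city "Austin") => "Hello, John from Austin!"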

November 18, 2009

Templates and code generation

Earlier this month Google released Closure Templates, a new templating library intended for generating HTML. Who cares? I personally dislike HTML templating, and avoid it whenever possible in favor of s-expression based generation tools like CL-WHO. At first look, Closure Templates didn't seem to be anything new or useful.

The one thing that makes Closure Templates somewhat interesting is that the library works in both Java and JavaScript, so you get client and server-side templates in one place. I did a similar thing with uri-template by having the URI template expand into code that executed in both Common Lisp and JavaScript via Parenscript (here I have to state that despite being an author of a templating library, I still dislike templating libraries and even find my own creation annoying at times).

About a week after Closure Templates was released, Andrey Moskvitin (archimag) wrote an implementation in Common Lisp (interesting note: it took him only five days and 1/15 the lines of code of the original). The resulting system, cl-closure-template, similarly generates JavaScript via Parenscript.

I still didn't understand the motivation. Yesterday, Andrey was kind enough to explain it. Those pampered by web development in Common Lisp using s-expression HTML generation tools simply don't encounter the problem of trying to fit HTML templating onto the paradigm of incremental page updating via AJAX. So if you want to build an AJAX web application and use HTML templating, give cl-closure-template a try.

[Blog metanote: if you're reading this blog through its feed, you will shortly see this post show up in Russian. a CONS is an object which cares is now syndicated on Russian Lisp Planet, and I'm going to be writing Lisp-related posts in Russian as well as English. All such posts will be tagged 'lisp-ru'. Filter out that tag if you don't want the Russian version of the posts to show up in your feed reader.]

November 5, 2009

Recruiting Puzzles

One of the most interesting parts of Peter Seibel's Coders at Work (see my review on Slashdot) was Peter Norvig's discussion of the Google programmer hiring process.

The impact of a bad hire on a programming team can be extremely negative - vastly increased code maintenance costs and reduced morale and productivity. "Hire slowly" is now becoming an ingrained mantra in software companies. Despite this, effective processes for evaluating candidate programmers still seem to be little known.

Interview puzzle questions (made (in)famous by Microsoft) as a tool in the hiring process came up in Norvig's interview. Not surprisingly he is against the idea. For a while puzzle questions were a fad in programmer interviews, based on the (unfounded and unverified) assumption that they were a predictor of coding prowess.

To examine why puzzle questions are a dumb idea, you need to look at the objectives you are trying to fulfill when hiring programmers:

  1. Hire someone who is good at programming.

  2. Hire someone who is good at programming as part of your team.


There is no need to resort to unrelated tasks with a low degree of correlation to see whether someone will help fulfill these objectives. A candidate can be assessed directly and efficiently by inviting them for three hours of pair programming.

Why are you still wasting time asking how many golf balls can fit in an airplane?

October 20, 2009

Web shortcomings and opportunities

There is a very interesting post itemizing some of the perceived shortcomings of the web over at the SpaceCollective.

Some of those problems have been addressed by researchers in the past - notable examples are Freeman and Gelernter's Lifestreams, and Nelson's transclusion (I assume this is what the author of the SpaceCollective article means by the term collage; in any case you can start using a limited form of transclusion today).

Some of those problems are being addressed on an ad-hoc basis by commercial companies today. If you take some time to think about the problems, there is still a wide open field of opportunity to provide commercial offerings around the ideas there.

October 18, 2009

The history of programming language syntax

http://news.bbc.co.uk/2/hi/technology/8306631.stm

If only all the other "programming language designers" had the conscience to apologize for the pain their thoughtlessness has caused.

October 15, 2009

Computers are not getting faster

When we discuss the inclusion of a feature in Lua, we ask ourselves, "OK, but will it run in a microwave oven?"
--Luiz Henrique de Figueiredo

I recently started working with John Fremlin's TPD2 HTTP server, one of the fastest web servers currently available. TPD2 uses a lot of interesting (at least from a Lisp point of view) techniques to achieve high performance. This reminded me of something I noticed in Masterminds of Programming: both Chuck Moore and the creators of Lua emphasized the tendency of computers to scale down.

While laptops and servers are getting faster processors with larger caches and more cores (which doesn't really help because we don't even know how to think about writing software for them), it's easy to overlook the fact that we're trying to imbue ever smaller objects with computational intelligence. Moore provided the extreme example of a chip whose cores have 64 words of RAM and 64 words of ROM each. A less extreme example comes in the form of smartphones.

At the same time, use of existing systems is increasing to the point where some people are measuring how many servers they buy by the number of tons of waste their packaging generates. If each of your servers could handle three times as many requests per second, you could fit the workload on a computer three times as small, or have a datacenter a third of the size. The latter is a pragmatic consideration applicable at any time, but the former enables entirely new patterns of use and interaction.

October 13, 2009

The impossibility of global state

(This post is a write-up of some ideas about concurrency that I've briefly discussed before).

By far the bulk of concurrency practice is aimed at dealing with systems based around shared, mutable state. Participants are asked to imagine a world with a global state whose changes are instantaneously observable by everyone. This world is a physical impossibility.

The global state illusion is built on elaborate consensus mechanisms. The concept only works because our perception of the flow of time is fixed vis-a-vis the rest of the world, and those consensus mechanisms happen to execute quickly enough over large networks that we don't get annoyed.

The common refrain to criticisms of the global state illusion is "but think of the bankers!" The consistent bank account with atomic deposits and withdrawals is an example found in almost all textbooks on concurrency. It is a neat and self-contained pedagogical tool for discussing the mechanisms needed to construct a global state illusion, but unfortunately its pervasive use seems to have done immense damage to any attempts to actually reason about concurrent systems.

Networked credit banking existed in 4th century BC Egypt. Banks still accept checks. Your bank account still has overdraft fees. Bank accounts work by reifying transactions as first-class objects. That this idea is considered novel is a testament to the pervasive confusion surrounding concurrency today.
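
Here is a minimal sketch of what "reifying transactions as first-class objects" can look like (illustrative names, not any particular library): an account is an append-only log of transaction records, and the balance is derived from the log instead of being stored as a piece of mutable global state.

(defstruct transaction kind amount timestamp)

(defun post-transaction (log kind amount)
  ;; Appending never destroys history; disputes, reversals and overdrafts are
  ;; handled later by examining the reified transactions.
  (cons (make-transaction :kind kind :amount amount
                          :timestamp (get-universal-time))
        log))

(defun balance (log)
  (reduce #'+ log
          :key (lambda (tx)
                 (ecase (transaction-kind tx)
                   (:deposit (transaction-amount tx))
                   (:withdrawal (- (transaction-amount tx)))))
          :initial-value 0))

;; (balance (post-transaction (post-transaction '() :deposit 100) :withdrawal 30)) => 70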

What is the global state illusion really needed for? Distributed shared simulations. Most commonly seen in the guise of videogames. There, all the participants really do need to see a consistent view of the world with changes occurring in quantized time steps.

The key insight behind the global state illusion is that it is not time-scale independent. Whether the illusion can be maintained depends on how the participants perceive the flow of time relative to how quickly the mechanisms behind the illusion work. Banks could have implemented a pen-and-paper two-phase commit protocol for deposits and withdrawals before the advent of telecommunications. It would not have been very useful. This is the same exact issue affecting cache-coherency protocols in multi-core processors, at the micrometer scale.

The way forward in concurrency practice is the abandonment of the global state illusion in favor of constructing systems based on how concurrency works in the real world, taking cues and patterns from organizational and biological systems. This will entail the use of techniques from functional programming and linear logic.

October 10, 2009

Append-only databases

Two years ago, Pat Helland wrote Accountants Don't Use Erasers, which resonated widely in the blogosphere. The central premise, append-only databases, offered a glimpse of the promises of purely functional programming. The idea itself is probably as old as transactional databases; however, its practical realization has been significantly delayed by the nature of storage hardware.

Append-only filesystems have been around for some time for hard-drives, used mainly for archival storage (the Plan9 Venti filesystem being the most widely known example). Append-only database implementations have been held back by the high seek latency of mechanical drives. This is changing as high-capacity flash drives are becoming available.
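
For a feel of the basic idea, here is a rough sketch (my illustration, nothing like RethinkDB's actual design): every write appends a record to the end of a log, and an in-memory index maps each key to the file position of its latest version. Old versions are never overwritten, which is what makes hot backups and crash recovery comparatively easy.

(defvar *index* (make-hash-table :test 'equal)
  "Maps each key to the file position of its most recent record.")

(defun db-put (log-path key value)
  (with-open-file (log log-path :direction :output
                                :if-exists :append
                                :if-does-not-exist :create)
    (setf (gethash key *index*) (file-position log))
    (print (list key value) log)))

(defun db-get (log-path key)
  (let ((pos (gethash key *index*)))
    (when pos
      (with-open-file (log log-path :direction :input)
        (file-position log pos)
        (second (read log))))))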

Over the summer, Slava Akhmechet (of Weblocks fame), along with Michael Glukhovsky and Leif Walsh, started RethinkDB, a project to build a new append-only MySQL backend. Aside from the performance improvement, RethinkDB manages to solve some glaring MySQL DB management problems: hot backups (without special tools) and fast failure recovery.

The possibilities for flash storage to enable significantly greater database performance were previously noted in several papers (see Lee et alia and Graefe; even the late Jim Gray weighed in on flash drives).

September 30, 2009

A better way to do screencasts

My friend Paddy Mullen recently released TerminalCast, a sort of online ttyrec player with synced audio (it is in fact based on a full in-browser JavaScript reimplementation of rxvt). This lets you do non-blurry screencast tutorials with full copy-and-paste support. I'm planning to make a Lisp web tutorial sometime in the near future.

Paddy will be giving a talk about TerminalCast at the next LispNYC meeting on October 13.

September 21, 2009

New Parenscript release

A new version of Parenscript has just been released (see the official release announcement for some more details). This release fixes a number of issues (among them poor compilation speed due to a terrible implementation choice on the part of yours truly, for which I have recently been called a lot of bad names in the Russian Lisp blog world).

September 2, 2009

Eager Future

I just put up the common-lisp.net project page for Eager Future, a library for concurrent programming with composable, eager futures (see this LtU discussion of what that means).

Eager Future is a rewrite of Marijn Haverbeke's PCall library (I've blogged about PCall and extending it with support for composition before). Unlike PCall, Eager Future executes futures eagerly, which lets you use the futures composition mechanism for soft real-time programming.
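
To show what "eager" means, here is a minimal sketch built on bordeaux-threads (the pcall/pexec/yield names echo PCall and Eager Future, but this is my own illustration, not either library's implementation): the computation starts on its own thread the moment the future is created, rather than when its value is first requested.

(defstruct future thread result)

(defun pcall (thunk)
  "Start computing THUNK immediately on its own thread; return a future."
  (let ((f (make-future)))
    (setf (future-thread f)
          (bordeaux-threads:make-thread
           (lambda () (setf (future-result f) (funcall thunk)))))
    f))

(defmacro pexec (&body body)
  `(pcall (lambda () ,@body)))

(defun yield (future)
  "Block until the future's computation finishes, then return its value."
  (bordeaux-threads:join-thread (future-thread future))
  (future-result future))

;; (yield (pexec (expensive-computation))) ; hypothetical function; the work
;;                                         ; started before YIELD was called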

Coders at Work

Slashdot just ran my review of Peter Seibel's Coders at Work. The book comes out next week. Bottom line: pre-order your copy now!

August 24, 2009

The right way to use ORM: don't

A couple of days ago I came across Ted Neward's essay on the problems with object-relational mapping schemes, The Vietnam of Computer Science. A really good examination of the problems with ORM, the essay wraps up with a list of six possible approaches to the problem of working with relational databases.

The database and the object-oriented application querying the database are clearly two different systems with two different domain models (the model of the programming language the application is written in, and SQL, respectively). From the perspective of domain-driven design, they represent two bounded contexts that are typically bridged using the repository pattern.

Barring the trivial ways out of the problem - using an OO database, hacking around the ORM, or "abandonment" (which seems to be a nonsensical proposition, given that the application language will need to have some kind of model different from SQL), there are three interesting alternatives left from the list: manual ORM, embedded relational DSLs, or building on your language's object model with relational-friendly classes.

I don't think embedded relational DSLs such as LINQ help solve the object-relational model mismatch in any but the most trivial ways. They are really just macros providing syntactic sugar for database access; all the domain mismatch problems are left unsolved.

The manual ORM and object model extension methods have proven fruitful in my experience.

When I came on board the project, Matchcraft's web directory product was already using sql2java to generate abstract Java classes (which were then subclassed and given domain logic) from a SQL schema hand-constructed from the domain model. This approach worked well - the database schema and the object model were clean, and sql2java's code generation provided efficient, cruft-free boilerplate. There was no ORM layer to drain attention and fatten the software stack.

The only problem was that some behaviors were implemented with hand-written SQL queries right inside the derived class. This mix of domains was reflected in the culprit behaviors being problematic to modify and unit-test. By refactoring the system to use the repository pattern, I was able to make those problems go away, while still preserving all the benefits of the "sql2java and some handwritten queries" approach that yielded both a clean database and a clean object model.

On another project (this one to build a flight reservation system for small tour and air charter operators, written in Common Lisp), I went the other way with code generation. I wrote a set of macros that took domain object definitions and generated a SQL schema and CLOS objects describing those domain objects. The object instances were just lists of fields as returned by CLSQL (with type conversions done automatically based on information in the "meta objects"). The behaviors were based on a mix of multimethods specialized on the "meta objects" as well as ones executing hand-written SQL queries directly.
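
Here is a rough sketch, with hypothetical names, of the kind of macro described above: one domain object definition produces both a CLOS class and the SQL DDL for the corresponding table. The real system also generated CLSQL queries, type conversions, and the "meta objects" used by the multimethods; this only shows the shape of the idea.

(defmacro define-domain-object (name &rest fields)
  ;; Each field is (slot-name sql-type), e.g. (first-name "VARCHAR(40)").
  (flet ((sql-name (symbol) (substitute #\_ #\- (string-downcase symbol))))
    `(progn
       (defclass ,name ()
         ,(loop for (slot nil) in fields
                collect `(,slot :initarg ,(intern (symbol-name slot) :keyword)
                                :accessor ,slot)))
       (defparameter ,(intern (format nil "+~A-SCHEMA+" name))
         ,(format nil "CREATE TABLE ~A (~{~A~^, ~});"
                  (sql-name name)
                  (loop for (slot type) in fields
                        collect (format nil "~A ~A" (sql-name slot) type)))))))

(define-domain-object passenger
  (id "INTEGER PRIMARY KEY")
  (first-name "VARCHAR(40)")
  (address-city "VARCHAR(40)"))
;; Defines the class PASSENGER and +PASSENGER-SCHEMA+ =>
;; "CREATE TABLE passenger (id INTEGER PRIMARY KEY, first_name VARCHAR(40), address_city VARCHAR(40));"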

The above approach combined both aspects of the manual ORM and object model extension methods. While CLOS does come with an extensible meta-object protocol (and CLSQL already comes with a CLOS ORM based on the MOP), I wanted to prioritize a clean database schema above ease of integration with Lisp (as should any project where a SQL database is planned to be queried by more than one system). While Common Lisp's macros and multimethods made this approach extremely fast and easy, there is no reason it can't be used with other languages.

August 17, 2009

New homepage for uri-template

uri-template now has its own common-lisp.net project page and mailing list:

http://common-lisp.net/project/uri-template/

The current version of the library is 0.6, released on 2009-08-17.

August 12, 2009

History of the American computer revolution

I recently came across Steve Blank's Secret History of Silicon Valley series of blog posts. The essays do a wonderful job of dispelling the myth of Silicon Valley as "a bunch of orchards and then Hewlett & Packard came along." As with the major east-coast universities and research institutes that were instrumental in creating the computer revolution, it was the US military that turned out to have provided the resources for the start of Silicon Valley as we know it today (although in the form of the Air Force and not DARPA).

The east-coast/DARPA contribution to the computing revolution is described in similar investigative detail in M. Mitchell Waldrop's The Dream Machine: J.C.R. Licklider and the Revolution That Made Computing Personal. Blank's essays and The Dream Machine do a very thorough job of detailing the rise of today's computing industry; I consider the latter essential reading and am now happy to add the former to the same category.

Howard Rheingold's Tools for Thought is an interesting, light overview of the broader history of modern computing machines from Charles Babbage to Ted Nelson (the first edition was published in 1985, back when hypertext was still hypertext and there was no web).

July 29, 2009

The reusability fallacy and domain-driven design

How can you make software reusable? The prevailing design fashion is to anticipate changes in the way that software will be used, then try to provide mechanisms for accommodating these imaginary uses. This approach would work, if you could predict the future.

The root cause of the drive to build reusable software is that the circumstances under which software is used change, ergo software needs to be changed to accommodate those circumstances. Seen this way, the concept of "building for reusability" is nonsensical. The solution is to go to the source of the problem and make software easier to change. Stop wasting time trying to design for reusability, start investing time in continuous refactoring.

One way reusability is achieved is through abstraction. All successful software libraries (graphics, algorithms, etc.) set out to solve problems in a particular domain. This is not "designing for reusability," this is designing software to address a particular domain broad enough to be useful for a wide variety of applications.

Platonic software design would consist of composing these domain libraries together using engineering techniques of abstraction, the outside ones building on the inside ones, like the layers of an onion. Two problems preventing this from being reality are leaky abstractions, and domain mismatch, making techniques such as Open Implementation necessary.

The only code reusability technique that is somewhat successful is the strategy pattern, but that is because the strategy pattern is just first-class functions and polymorphic types, which have a simple formal model behind them.
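
A minimal sketch of that last point: the "strategy" collapses to a first-class function argument, and the same routine is reused unchanged under any strategy.

(defun sort-descending (sequence)
  (sort (copy-seq sequence) #'>))       ; one hard-coded "strategy"

(defun sort-with-strategy (sequence strategy)
  (sort (copy-seq sequence) strategy))  ; the strategy is just a function argument

;; (sort-with-strategy '(3 1 2) #'<)              => (1 2 3)
;; (sort-with-strategy '("b" "a") #'string-lessp) => ("a" "b")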

June 28, 2009

Live SQL database schema updates

I previously wrote about managing live DB schema changes using protocols and context-oriented programming. About a month ago, I came across Jonathan Oxer's presentation about dealing with live schema changes to MySQL databases. The technique Oxer presents is the kind of obvious-in-hindsight thing that is beautiful in its simplicity: when a query fails with an unknown table/column error, run the update scripts and retry the query.
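
In Common Lisp terms the retry might look something like this (a sketch with hypothetical function arguments; a real version would use the condition classes of whatever database library is in play and inspect the error before retrying):

(defun call-with-schema-retry (run-query apply-pending-updates)
  (handler-case (funcall run-query)
    (error (e)
      (declare (ignorable e))
      ;; In a real system, check that E is an "unknown table/column" error
      ;; before applying updates and retrying.
      (funcall apply-pending-updates)
      (funcall run-query))))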

Readers familiar with Geoff Wozniak's work will notice the similarity to his ideas about working with undefined functions (which, incidentally, was previously discussed on this blog in the context of using metaprogramming techniques to implement DB abstraction layers).

This is something I'm definitely going to try in my next SQL database based project.

June 8, 2009

Why your language needs macros

A couple of days ago someone posted yet another "why do I need macros?" thread on Hacker News. The usual arguments for macros and the usual unconvincing examples were posted. Notably absent were the arguments against macros (all of which seem to be of the variety "programmers are too dumb to understand someone else's macros" - if you were at the Great Macro Debate at ILC 09 you would have heard it stated more eloquently).

I've been thinking about macros in terms of domain-driven design lately, kicking around my idea of "domain onions" (I'll write more about this later), so I decided to post my current thoughts about why every programming language that aspires to be general-purpose needs macros.

Gelernter

Two years ago I posted a review of Nicholas Carriero and David Gelernter's How to write parallel programs. Today I came across an Edge roundtable discussion with David Gelernter (via Patrick Logan).

The entire discussion is fascinating, and I recommend reading up more on Gelernter's work if you are not familiar with it before watching the interview.

For me, there were three new insights from the roundtable:

  • Message passing is the assembly language of distributed systems.

  • Databases are a way to communicate through time. Networks are a way to communicate through space. Both are symmetrical. The point of Linda was to unify them.

  • The most powerful computers today are botnets. Botnets are also on the leading edge of distributed systems techniques.

In the second video, Gelernter discusses Lifestreams. It's amazing how closely FriendFeed, Twitter, and the new Facebook are all following the roadmap established by Lifestreams over a decade ago.

June 4, 2009

Debugging with H-toot

Andreas Fuchs shows how to get Hunchentoot 1.0 to go into the debugger on handler errors, like previous versions of Hunchentoot used to do when *catch-errors-p* was nil.

This should have come as part of the 1.0 release IMO.

May 21, 2009

May 4, 2009

On the value of metaprogramming

Several weeks ago I wrote about a way to implement Smalltalk-style predicate function-to-SQL translation in Common Lisp, so I was amused to come across Dejavu for Python (via Jonathan Ellis), which implements the same technique by inspecting CPython bytecode.

It's amazing what kind of dumb hoops people will jump through when they don't have a system that permits real metaprogramming.

April 22, 2009

Review: Let Over Lambda

So after a lot of not reading Doug Hoyte's Let Over Lambda, I finally did manage to read it all the way through.

My overall impression first: Hoyte styles the book as a successor to PG's On Lisp; I think Let Over Lambda falls short of that goal, although it does contain enough interesting material to make the book worthwhile.

The best parts of the book are the chapter on read macros and the subsection on sorting networks. Great, practical examples and illustrations of good programming style.

The worst parts of the book are the chapter on implementing a Forth interpreter in Lisp and the "caddar accessor generator" (it's ok for me to say this because the name of this blog is ironic).

The chapter on anaphoric macros has finally made me change my mind about those things. It's ok, use them when you need them. All the stuff about "sub-lexical scoping" (ie - various interesting ways of using macros to capture free variables) didn't really make a deep impression on me - maybe I'm just too dull to see any good uses for that.

As pointed out in other reviews, the book could have used a lot more proofreading (esp of the code) and editing. Hoyte chose to self-publish the book, which I think was a mistake (and just today Tim Ferriss blogged about some other reasons why it's not a good idea).

To cap the review, don't read Let Over Lambda until you've read On Lisp. It's a fun book about some interesting Common Lisp programming techniques, but it could have been shorter.

April 15, 2009

Problem-solving is hard, let's go write XML configuration files

Previously I blogged about why the premise behind software frameworks turns out to be a logical fallacy. This blog post will continue my attack on the framework cult by examining the reasons why people continue to believe in the myth of increased productivity through frameworks.

There is one word that can summarize my argument: comfort.

Consider carpentry tools. How do you use them? What do you use them to accomplish? Now consider a toy construction set such as an Erector set. The toy construction set offers you a path you can follow to arrive at some cool artifact, even if you don't know exactly what you want to do. The construction set can offer this security because it limits what you can accomplish, and how you can accomplish it.

According to Buddha, ignorance is the root of all evil. According to Christian tradition, people are inherently evil. It is then no surprise that most of the time most people do not know what they want to do. Finding out "what" is hard, requiring a lot of time and learning. In the realm of software development, this is practiced by methodologies such as domain-driven design. It is a lot easier to write XML configuration files instead.

There is a misguided comfort that comes from knowing that you did a day's worth of honest work. It doesn't matter if what you are doing is leading you down the wrong path, because with a framework you are at least accomplishing something, even if that something will not bring business value. As a project manager who decides to use a particular framework, you are in effect acting as a proxy sales agent for the party promoting that framework - selling your customer a solution that may not be in line with your customer's goals.

Many logical fallacies go into the typical software development decision-making process. Frameworks in particular are notoriously prone to the bandwagon effect (how many "enterprise" web-development frameworks for Java have come and gone, and with what ridiculous frequency?). Any person held responsible for the consequences of a decision will be prone to post-purchase rationalization (remember that before people try to sell a methodology to their organization, they have to have been sold on it themselves), so the bandwagon effect is frequently used as an argument for adopting a particular framework. In addition, nebulous terms such as "enterprise" (which, because of its strong association with frameworks, has almost universally come to mean "verbose junk" in the software development world) somehow end up in place of well thought-out arguments and empirical evidence.

Obviously, there is some circumstantial evidence that frameworks do enhance productivity - people become experts at particular frameworks and can efficiently develop applications. This is not because they are using a particular framework, or because they are using a framework at all. This is because they have become experts at it.

April 9, 2009

Closure-oriented metaprogramming via dynamically-scoped functions

Today I came across this post by Avi Bryant on the ll1 mailing list (almost 7 years old now, found via Patrick Collison's blog), explaining how Smalltalk's message-based dispatch permits a type of metaprogramming with closures, as an alternative to macros.

Of course if you've read Pascal Costanza's Dynamically Scoped Functions as the Essence of AOP (and if you haven't, click the link and do it now; it's one of my favorite CS papers), you will realize that there is no need for message-based dispatch or any kind of object-oriented programming to do that. All we need are dynamically-scoped functions.

Here is how I approached the problem:

(defpackage "BAR"
(:use "COMMON-LISP")
(:shadow #:=))

(in-package "BAR")

(defmacro dflet1 ((fname &rest def) &body body)
(let ((old-f-def (gensym)))
`(let ((,old-f-def (symbol-function ',fname)))
(unwind-protect (progn (setf (symbol-function ',fname) (lambda ,@def))
,@body)
(setf (symbol-function ',fname) ,old-f-def)))))

(defmacro dflet* ((&rest decls) &body body)
(if decls
`(dflet1 ,(car decls)
(dflet* ,(cdr decls)
,@body))
`(progn ,@body)))

(defun first-name (x) (gnarly-accessor1 x))
(defun address-city (x) (gnarly-accessor2 x))
(defun = (&rest args) (apply 'common-lisp:= args))
(defmacro & (a b) `(block-and (lambda () ,a) (lambda () ,b)))
(defun block-and (a b) (when (funcall a) (funcall b)))

(defun some-predicate (x)
(& (= (first-name x) "John") (= (address-city x) "Austin")))

(defun make-parse-tree-from-predicate (predicate-thunk)
(dflet* ((first-name (x) '#:|firstName|)
(address-city (x) '#:|addressCity|)
(= (a b) `(= ,a ,b))
(block-and (a b) `(& ,(funcall a) ,(funcall b))))
(funcall predicate-thunk nil)))


Then (make-parse-tree-from-predicate #'some-predicate) yields (& (= #:|firstName| "John") (= #:|addressCity| "Austin")), which we can manipulate and then pass to a SQL query printer.

Here I implemented dynamically-scoped functions using unwind-protect, which is not as powerful (or, possibly, efficient) as the implementation presented in Costanza's paper, but is simpler (I also used the same trick to implement dynamically-scoped variables in Parenscript).

The ability of the same Lisp code to mean different things in different contexts is what Doug Hoyte calls duality of syntax in his excellent book Let Over Lambda (which I have almost finished reading; I promise to write a review soon). Lisp offers this property both at run-time (via late-binding and closures) and at macro-expansion time (via homoiconicity and the macro-expansion process itself).

Another technique from Let Over Lambda illustrated in the above code is the recursive macro. This one is a personal favorite of mine; I find that the iterative simplification that recursive macros express provides very clean and maintainable code.

This code also provides examples of the two problems that the closure-oriented metaprogramming approach encounters in Common Lisp:

The first is the fact that we had to shadow = in our package. Common Lisp forbids the redefinition of the functions, macros and special forms defined in the standard, so we have to go out of our way if we want to achieve that effect. Barry Margolin provided a rationale for this in a comp.lang.lisp post.

The second is the fact that Common Lisp has so many special forms and macros - AND itself happens to be one of them, which is why the example above goes through the & macro and the block-and function. Smalltalk avoids this problem by doing virtually everything via message passing and closures. In Common Lisp we don't have that straitjacket, but we also don't have the luxury of assuming that everything is an object or a closure.

Another Common Lisp feature that might break this example is function inlining (then again, I did just write about the benefits of late-binding...).
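
One way to guard against that, assuming the definitions above stay as written, is to declaim the replaceable functions notinline, so the compiler must go through the symbol-function indirection that dflet1 swaps out:

(declaim (notinline first-name address-city = block-and))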

April 8, 2009

Masterminds of Programming

Today my copy of Masterminds of Programming arrived in the mail from O'Reilly; a small reward for giving a lightning talk at ILC.

The book is a series of interviews with programming language designers. Along with some expected atrocities like Stroustrup on C++ and Gosling on Java, and bizarre ones such as a 50-page interview with Jacobson, Rumbaugh and Booch on UML, there are interviews with Adin Falkoff on APL, Moore on Forth, Wall on Perl, and a few others. The functional camp is well-represented with SPJ, Hudak, Wadler, and Hughes interviewed about Haskell, and Milner giving an interview about ML.

It is telling that right at the preface the book starts off with an urban legend: "children can learn foreign languages much more easily than adults."

Some of the interviews are very revealing. The discussions present an entertaining window on the cavalier attitudes and biases of many programming language designers, which helps explain some of the dysfunction of the software world today. I don't think this is what the editors intended, but it makes for hilarious reading material.

Some of the interviews can be frustrating to read (every third question in Falkoff's APL interview seems to boil down to "lolz funny syntax"); thankfully this is balanced out by absolutely delightful ones such as Moore on Forth (IMO, the highlight of the book), and the Objective-C interview with Brad Cox and Tom Love. Overall the quality of the interviews varies widely, but not surprisingly mostly seems to correspond to the quality of the language being discussed.

Ultimately Masterminds of Programming is worthwhile not for its insights into programming language design (most of which unsurprisingly boil down to "oops I made a bunch of mistakes because I didn't start with a good model/think/know any better/know enough math"), but into programming and computing history in general.

To finish this review where it started off, here is another unintentionally amusing bit of insight from the preface:

Imagine that you are studying a foreign language and you don't know the name of an object. You can describe it with the words that you know, hoping someone will understand what you mean. Isn't this what we do every day with software?

It comes as no surprise that there is not a single entry for either "macro" or "metaprogramming" in the book's index (although Wadler does make a passing mention of Lisp macros in the Haskell interview).

March 26, 2009

P-Cos Blew My Mind

This post contains my impressions about the rest of ILC 2009. I'll start off with something that really impressed me.

One of the things I have been thinking about lately is database schema evolution. The flip side of database schema changes are the corresponding changes in the client code they entail. Combining the two with live system upgrades can potentially give rise to inconsistencies. Even ignoring databases, loading code into a multithreaded Lisp image can result in undesired behaviour since the loading is non-atomic.

This problem was discussed on comp.lang.lisp several years ago, with the general consensus being that some sort of atomic loading mechanism was needed to address the problem.

However atomic loading alone does not eliminate inconsistencies for systems providing services via asynchronous protocols, such as web applications. Any users in the middle of a workflow consisting of a chain of requests to HTTP resources will be affected if any of those HTTP resources are changed.

Pascal Costanza came to the rescue by presenting (during a lightning talk) a solution utilizing context-oriented programming. Each set of updates was integrated into the system as a new layer. Atomicity is provided by the fact that layers can be (are?) activated atomically, and can be made completely independent of each other. By associating layers with web application sessions, Pascal was able to preserve consistency for those users in the middle of a session during the time of update.

In a fascinating discussion on protocol versioning on Jane Street's OCaml blog a while back someone advocated essentially the same approach to the problem by utilizing two servers running different versions of the application. Pascal's context-oriented approach not only eliminates the need for the extra hardware (one server for each version of the protocol), but also provides an elegant way to structure the code for multi-version protocols, and to apply other, orthogonal updates that would affect users of all protocols provided by the system, which would otherwise require backporting patches.

If database access is implemented as an abstraction layer that acts as a protocol, the same technique can be applied to schema changes as well.

Amid all of the above discussion it is easy to overlook a really neat fact, but Nick Levine's lightning talk drove it home: load and fasls make a really great patch distribution mechanism. You can view the code from his talk here.

Shriram Krishnamurthi gave a very nice talk about hooking kids on Scheme by letting disadvantaged inner-city middle-schoolers put lambdas in their lambdas so they can make videogames on their cellphones. Purely functional programming a la How to Design Programs turns out to be a really great way to teach algebra. Shriram also dislikes the OLPC project and thinks Etoys is weak.

Gerald Sussman gave a talk that essentially argued against formal methods for systems design, and for extensible systems with flexible failure modes.

One of the key reasons why MIT shifted its introductory CS curriculum away from the SICP-style bottom-up abstraction-building approach is that the behaviour of modern systems components is too complicated for them to be practically well-specified or modelled in a reductionist way. A lot of the time engineers have to resort to an empirical approach to discovering the actual semantics of a sub-component.

To manage this complexity, Sussman argued for open systems that permit a great degree of extensibility and continue to function under various failure modes. An example he used was an autopilot system: it should continue to work under unexpected conditions such as control surface failures. This echoes some of the ideas of Richard Gabriel about mob software. Although Gabriel was at the conference, the term itself never came up, but Richard did object to macros at the macros debate by mentioning that they would be a hindrance to constructing systems worked on by thousands of programmers.

Fail-fast behaviour of a whole system under unexpected circumstances is something that formal specifications cannot avoid. A great quote from Sussman was: "proofs considered harmful." What Sussman was arguing for, in his words, was high-quality abstract specifications, not low-quality well-specified ones.

Sussman presented several ideas about taking inspiration from biological systems, among them biological degeneracy, which can best be summed up as "doing different things to get the same result," and the fact that a relatively small number of genes are responsible for common physiological structures in organisms.

The bio-mimetic argument was echoed again in a lightning talk by someone whose name I unfortunately cannot recall or look up (as the full list of lightning talks has not yet been posted anywhere). One of the things mentioned was parallelism in multi-cellular systems - of course I had to jump in and point out that the best lesson we can take away from parallelism in the real world is a rejection of the concept of global state (more on that in an upcoming post).

Olin Shivers gave essentially the same talk about his system for expressing scoping and iteration in terms of graphs as he did at Danfest, and this time wasn't any more exciting. Someone actually fell asleep and started snoring loudly a few rows down from me. The discussion at the end was quite good though, one of the things Shivers emphasized that echoed Sussman's keynote was the fact that unspecified behaviour actually gave you more opportunity for expressive power.

I gave a lightning talk based on a blog post I made previously, arguing that the criterion for whether language X is a Lisp should be how well you can write a metacircular evaluator for that language. No one booed or threw things at me, and some very smart people made insightful comments, so I think it went ok. The question of static typing came up, and after the talk I had a big argument with a couple of people trying to explain what type systems are and what they do. Typed calculi need to be taught as part of an undergrad CS curriculum - most people have a very skewed understanding of what type systems are and this lets language "designers" get away with doing a lot of stupid shit.

March 24, 2009

ILC 09 Pt. 0

Flew into Boston on Monday, arrived at the Stata Center just after the conference lunch. Highlights so far:

Michael Greenberg's lightning talk about TRAC, both for being probably the most well-prepared 5 minute talk ever, and for explaining the key essence (self-modifying code) of pure macro languages. Later during the macros debate Greenberg made the very insightful point that conceptually the Lisp language and Lisp macros are two distinct systems and need to be reasoned about as such.

Joe Marshall's daughter's insight about the subject of his talk: "[continuations are] just autosave for your program!"

Pascal Costanza's superhuman knowledge of Lisp history, wit, and strong opinions.

Some things that have been done to death and should be irrelevant but still managed to be somewhat entertaining: the "future of Lisp" panel and the "are macros bad" debate.

March 15, 2009

Parenscript birthday present

A while ago I mentioned I was going to announce some exciting news about web development tools. Shortly thereafter my priorities changed, and I shelved the project with the intention of picking it up again in the future. Then I remembered about LispNYC's participation in the Google Summer of Code program - if I can't do the project now, why not try to get funding from Google to have a student work on it?

Here is what all this is about:

The PSDE project aims to develop an Emacs-based JavaScript programming and debugging tool for AJAX-based RIAs. The implementation will provide an unobtrusive client-side script that can be loaded alongside any JavaScript web application, and an Emacs interface (REPL) that will enable debugging of remote and mobile clients, including the possibility of no-reload drop-in for live user sessions.

Along with an interactive prompt/REPL, PSDE will provide a profiler, tracing and logging facilities, a DOM inspector, a completion and documentation system for DOM symbols, and other useful web programming tools, both by using the reflective capabilities of JavaScript, and by providing hooks (a la Open Implementation) into the Parenscript compiler to provide metaprogram information that cannot be gleaned using runtime techniques or analysis of JavaScript code.

In addition to providing a valuable tool to the web development community, the OI extensions to the Parenscript compiler included as part of the PSDE project will be useful in their own right to those wishing to create other web development tools based on Parenscript, or web developers using Parenscript and needing advanced JavaScript generation capabilities.


The project has its roots in the "browser Comet REPL" demo I gave during my talk about Parenscript at LispNYC in September 2007, but the idea of making a "browser SLIME" didn't occur to me until this past November, when Daniel Gackle told me: "Why don't you make a browser SLIME?"

If you are a student that might be interested in this project and qualify to participate in Google SoC, or know of such an individual, get in touch: vsedach@gmail.com. I'll be happy to answer any questions and help you with preparing an application. Google will start accepting student applications March 23rd, with the cut-off date being April 3rd.

UPDATE: LispNYC has not been selected as a mentoring organization for SOC 2009. I would like to fund this project myself, but currently cannot afford it.

March 14, 2009

Parenscript's 4th birthday

Today is a day of many celebrations. As most of you are aware March 14 is π Day. It also happens to be Einstein's birthday.

Four years ago, Manuel Odendahl announced the release of Parenscript to the public. Reflecting on the original post in the context of what Parenscript is today, I see that the power and appropriateness of Manuel's design have more than proven themselves, but also how much exciting work is still ahead.

March 13, 2009

I think I finally understand what Alan Kay is saying about Lisp

Joe Marshall recently wrote a series of very entertaining blog posts telling the story of his first experiences with Lisp:



I had a similar experience with Lisp - I didn't go to MIT or take 6.001, but reading the first couple of chapters of SICP changed my entire outlook on computer programming. By the time that happened (the summer of 2002, right before I started university), I was already somewhat proficient with C, although I wasn't particularly interested in computer programming as an activity in and of itself.

The funny thing is I was exposed to Scheme my first year of high school through Mike Zamansky's MCS1 at Stuy. That experience wasn't particularly memorable, even less so was one of the other programming languages used in the course (Python, I think?). On the other hand the course also taught a whole bunch of really neat stuff in StarLogo, which as they say in Canada was the cat's ass.

An even more amusing fact is that after going to see the professor before the start of the first term and taking the challenge exam that would give me credit for the first computer science course, I decided that since the university computer science courses didn't look anything like SICP, I would probably be better off investing my time in a math degree. The algebra and calculus examples in SICP were my first exposure to elegant computational mathematics, and a key reason for choosing math and not something else.

What does this have to do with Alan Kay?

Yesterday before falling asleep I was plagued by the question: why don't I enjoy using Haskell as much as Lisp? Haskell is purely functional, it features a really neat and unobtrusive type system, which when taken together with the great pattern-directed invocation programming style provides a better alternative to object-oriented programming. It's a really great programming language.

Could the reason be the ugly syntax? No, that's easy to fix. The type system doesn't bother me. There's a template system for both Haskell and Liskell syntax that provides preprocessor-style macros, but monads actually provide a much more powerful way of building control abstractions than macros do.

The only thing I could think of that Lisp has that Haskell can't provide is a dynamic runtime system. It's as if Lisp comes with a really neat invisible virtual machine.

Among the many nice things Alan Kay says about Lisp, the term late binding comes up often. Practitioners of programming languages that Alan Kay did not have in mind when he invented the term "object-oriented" have taken it upon themselves to equate this to the concept they term dynamic binding, which they define to be type-directed dispatch done at runtime.

This is a textbook example of cargo cult programming. Dynamic types and runtime dispatch are not axioms of Lisp and Smalltalk, they are the emergent properties of self-describing computational systems. The key is that the definition of Lisp is written in Lisp. The "invisible virtual machine" in Lisp is Lisp. That is what Alan Kay means by late binding.

Well, this insight did take almost seven years, but at least now I have a snappy label for the dynamic languages du jour whose existence helps put a lot of computer book publishers' editors' kids through college and kills hundreds of thousands of trees every year: cargo cult programming languages.

The question that remains unanswered is why isn't Haskell self-describing? The reason is that there are actually two computational systems behind Haskell: one for the language itself, and one underlying the type system for the language. If type systems could be expressed in a metacircular way by the systems they were trying to type, they wouldn't be able to express anything about types (restrictions on computational power) of that system.

You can read more about the metacircular definition of Lisp in several books. The first of these is the original Lisp 1.5 Programmer's Manual; Alan Kay has called the metacircular definition of Lisp provided there "Maxwell's Equations of Software." SICP includes a section on constructing a metacircular evaluator, and a whole chapter based around extending that evaluator with non-determinism and logic programming. The most thorough examination of the metacircular evaluation formulation of Lisp is present in John Allen's highly recommended 1978 book, Anatomy of Lisp.
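
For the flavor of it, here is a toy metacircular evaluator in the spirit of those books (a sketch, not a faithful reproduction of any of them): Lisp evaluation written as an ordinary Lisp function over Lisp data.

(defun toy-eval (expr env)
  (cond ((symbolp expr) (cdr (assoc expr env)))       ; variable lookup
        ((atom expr) expr)                            ; self-evaluating literals
        ((eq (first expr) 'quote) (second expr))
        ((eq (first expr) 'if)
         (if (toy-eval (second expr) env)
             (toy-eval (third expr) env)
             (toy-eval (fourth expr) env)))
        ((eq (first expr) 'lambda) (list 'closure expr env))
        (t (toy-apply (toy-eval (first expr) env)
                      (mapcar (lambda (e) (toy-eval e env)) (rest expr))))))

(defun toy-apply (fn args)
  (if (functionp fn)                                  ; host primitives, e.g. #'+
      (apply fn args)
      (destructuring-bind (tag (ll params body) env) fn
        (declare (ignore tag ll))
        (toy-eval body (pairlis params args env)))))

;; (toy-eval '((lambda (x) (if x 'yes 'no)) 42) '()) => YES
;; (toy-eval '(+ 1 2) (list (cons '+ #'+)))          => 3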

March 12, 2009

Frameworks and the conjunction fallacy

Frameworks are great. After all, some helpful way to do X combined in some helpful way with some helpful way to do Y must be a whole lot more helpful than just a way to do X and just a way to do Y, right?

Wait a minute, anyone who has used a framework might say, what if someone wants to do Z instead of Y? Why, of course, say the framework "architects," we'll just provide a helpful way to help you choose helpful ways.

How helpful! Now not only do you have to know how to do X and how to do Y and how the framework helps you do X in a helpful way with doing Y, but you also need to know the helpful mechanisms behind letting you choose the helpful ways.

Clearly the original argument for helpfulness broke down somewhere. But where, and why?

The underlying fallacy behind the majority of software design today is the belief that computer systems can be "useful" and "helpful." Useful and helpful for doing what? For doing what they were designed to do. Stated in these terms it is immediately apparent that this mindset is a tautology. Completely useless as a basis for reasoning.

Is there a better way of thinking about software? Let's start by asking why. The entire motivation underlying our enterprise is the desire to accomplish X and Y. Each step we take can either bring us closer to that goal, or not. Given that time and other resources are finite, each step that does not bring us closer to the goal is "unhelpful" - it gets in the way of what we want to do.

I believe the right way to think about computer systems is as things that get in the way of what you want to accomplish. This way the goal of software design becomes not getting in the way of what you want to do.

Given this approach it immediately becomes obvious that the argument presented at the start of this post is simply a conjunction fallacy. Likewise many other "reasonable assumptions" about software that are not borne out by experience can be shown to be flawed a priori. Most importantly, this mindset helps prevent such "reasonable assumptions" from being taken up in the first place.

While this post debunked one of the arguments used to justify both the development of frameworks and their use, the decisions surrounding the latter are also driven by other logical fallacies, and, like the vast majority of human decision-making, by emotions. In a forthcoming post I will examine some of these factors.

[Major thanks to Ryan Holiday for providing the inspiration for this blog post.]

Assorted Lisp news

Last week I came across Bartosz Milewski's blog post criticizing the proposed implementation of futures in the forthcoming C++ standard. Milewski's chief dissatisfaction was the absence of a mechanism for composition. PCall (which I've blogged about before) was in a similar position, so I decided to implement a mechanism for non-deterministic composition and send in a patch. Milewski's blog post also spawned an excellent discussion about futures on Lambda the Ultimate.

Today I rewrote the Parenscript tutorial to use the recently released 1.0 version of Hunchentoot instead of AllegroServe. Edi Weitz and Hans Hübner did a major redesign for the 1.0 release, breaking interface compatibility with previous versions of Hunchentoot. The new interface provides greater Open Implementation capabilities, and makes a sharp demarcation between methods intended for OI and those for regular server programming. I like it.

Beware that if you're not using LispWorks, the 1.0 release of Hunchentoot depends on certain library capabilities (I suspect it's usocket but haven't verified) that as of the time of writing haven't been made into official releases yet. This means that if you get Hunchentoot dependencies via ASDF-Install it will likely not work (for me, it just instantly dropped all incoming connections) - use clbuild to get the latest repository versions of the dependencies.

Earlier last year I found out about Doug Hoyte's Let Over Lambda and after reading the sample chapters promptly got excited. I visited the site recently and found out that the book was published a few months ago. This morning my copy arrived in the mail - I'll be posting a review here later. In the meantime you can obtain your own copy.

LispNYC is once again looking to participate in Google's Summer of Code program. The list of accepted organizations hasn't been announced yet, but you can start proposing summer projects on the LispNYC website.

Just found out about lisp.ru. For a non-English generalist Lisp website the forum gets a fair amount of traffic.

March 10, 2009

Cassandra of Facebook, or A Tale of a Distributed Database, part 2

This is the third in my series of blog posts about Facebook's Cassandra distributed database. Part 0 discussed the background of the problem Cassandra is meant to address, while part 1 gave an overview of the distributed systems techniques employed by Cassandra. This post is a scattering of my observations from reading the code.

The most obvious feature of Cassandra's code is the pervasive presence of checksums in communication and data handling modules. Checksums are used both to detect data and message corruption, and as a means of finding out if replicated nodes are out of sync.

Cassandra is a production system with a large deployment and the management features present reflect this. There is code for measuring node statistics and sending them off to a collection server. Nodes register themselves with both Zookeeper and JMX cluster-management services, although exactly what both of those are used for at Facebook is unclear as their cluster-management tools have not been released as free software.

The mechanism for selecting nodes for replication can be customized to use metadata about machines to choose nodes on redundant networks/power grids/backplanes/etc to increase availability. The strategy provided in the code indicates that Facebook structures their Cassandra deployment using IPv4 addresses so that all the machines with the same third octet of an address are in the same rack, and all machines with the same second octet are in the same datacenter.

The gossip protocol, besides being used for failure detection, is also used for load-balancing the cluster. Workload is measured as the number of requests per number of keys in a given time window that are handled by a node responsible for a particular interval of the key ring, and is disseminated by each node to the others. To alleviate high load, a node may move itself around on the key ring to more evenly distribute the keys between itself and a lightly loaded neighbor. The approach is an implementation of Abdallah and Le's Scalable Range Query Processing for Large-Scale Distributed Database Applications.

The gossip protocol itself is rather interesting, featuring three stages (gossip message, ack, ack-ack).

An interesting artifact is the com.facebook.infrastructure.continuations package, which has some stubs based on the Javaflow bytecode continuation transformer. You would guess this would be used for the server code since it's based on non-blocking IO, but the server code is actually hand-coded in a staged event-driven style. The stubs in the package aren't used anywhere else in Cassandra, which means that some other Facebook project must be using them. I wonder what for?

The code itself can be found at http://code.google.com/p/the-cassandra-project/, but seems to be in the process of being moved to a different repository. In the meantime, there is an active git repository at http://wiki.github.com/jbellis/cassandra-dev that features community-submitted bugfixes and improvements.

To end on a light note, it is refreshing to find that the authors of Cassandra give a nod to the LOLspeak code commenting guidelines:

* 3. Set one of teh messages to get teh data and teh rest to get teh digest
// 4. SendRR ( to all the nodes above )
// 5. Wait for a response from atleast X nodes where X <= N and teh data node
* 6. If the digest matches return teh data.

Cassandra of Facebook, or A Tale of a Distributed Database, part 1

This is the second part of my examination of the Cassandra distributed database. Part 0 described the problem that Cassandra is trying to address; this post will describe the distributed systems techniques that it utilizes to that end.

The basis for this post is a presentation describing Cassandra by Prashant Malik, Karthik Ranganathan, and Avinash Lakshman given to the Facebook staff and available online: http://www.new.facebook.com/video/video.php?v=540974400803#/video/video.php?v=540974400803 (hosted on Facebook, an account is required for viewing). James Hamilton provides a summary of the presentation on his blog.

Underlying Cassandra is the technique of consistent hashing. Using consistent hashing, the key space can be modeled as a circle of the keys' hash values, and nodes assigned to points on that circle, so that a node is responsible for an interval of the circle of key hashes between itself and its next neighbor. Nodes leaving or joining only affect their previous neighbor, and lookup has a very low communication cost. Due to its great simplicity and robustness compared to other mechanisms, consistent hashing is probably the most significant development in distributed systems in the 1990s - virtually all current peer to peer systems are based on it.
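
A toy ring makes the idea concrete; here SXHASH stands in for a proper, well-distributed hash function like MD5, and the ring representation is deliberately naive:

;; A toy consistent-hashing ring: a sorted list of (position . node) pairs.
(defparameter *ring-size* (expt 2 16))

(defun ring-position (thing)
  (mod (sxhash thing) *ring-size*))

(defun add-node (ring node)
  "RING is a list of (position . node) pairs kept sorted by position."
  (sort (cons (cons (ring-position node) node) (copy-list ring))
        #'< :key #'car))

(defun responsible-node (ring key)
  "Walk clockwise from the key's hash to the next node, wrapping around."
  (let ((pos (ring-position key)))
    (or (cdr (find-if (lambda (entry) (>= (car entry) pos)) ring))
        (cdr (first ring)))))

;; (let ((ring (reduce #'add-node '("10.0.1.1" "10.0.2.1" "10.1.1.1")
;;                     :initial-value '())))
;;   (responsible-node ring "some-row-key"))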

Cassandra uses a gossip protocol for determining cluster membership and for failure detection (and for load-balancing, discussed in the next blog post). Failure detection in this context means the various failure detector mechanisms that can be used to achieve consensus in asynchronous distributed systems with unreliable nodes. Given unlimited time to finish a task, it is impossible to distinguish a dead node from one that is slow - failure detectors use timing assumptions and heartbeat messages between nodes to provide an answer (with some degree of uncertainty), something that would otherwise require a synchronous network between nodes. Cassandra implements a failure detector based on Hayashibara et al.'s The φ Accrual Failure Detector; the presenter points out that some "ghetto" failure detectors with simpler mechanisms do not work very well in practice.
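
The φ value itself is easy to compute once you keep a history of heartbeat inter-arrival times. A minimal sketch, using an exponential distribution as a simplification (the paper fits a normal distribution over a sliding window of recent intervals):

;; φ = -log10(probability that a heartbeat arrives later than the silence
;; observed so far). The detector outputs a suspicion level instead of a
;; boolean, and the application picks the threshold at which a node is
;; declared dead.
(defun phi (time-since-last-heartbeat mean-interval)
  (let ((p-later (exp (- (/ time-since-last-heartbeat mean-interval)))))
    (- (log p-later 10))))

;; (phi 1.0 1.0)  => ~0.43  ; heartbeat barely late, low suspicion
;; (phi 10.0 1.0) => ~4.34  ; ten mean intervals of silence, very suspicious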

Each interval of the hash circle is replicated on a fixed number N of nodes. Cassandra provides a choice of replication schemes: optimistic replication, which gives eventual consistency, or the stricter quorum write and sync on read approaches. Optimistic replication allows writes even if all nodes responsible for storing the particular key are down: the node receiving the request logs it and later hands it to the appropriate nodes when they come back up. Under the quorum write scheme, a write is not successful until at least x of the N total replicas have performed the write, for some specified x. It would seem that if x=N, then we get both monotonic read and write consistency. Sync on read synchronizes all replicas on each read request, which gives monotonic write consistency.
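
The quorum write rule reduces to a simple predicate; a sketch (WRITE-FN stands in for the per-replica RPC and is my own placeholder):

;; Issue the write to all N replicas, count acknowledgements, and report
;; success only when at least X replicas acknowledged. With X = N a single
;; down replica blocks all writes, which is the price of the stronger
;; consistency mentioned above.
(defun quorum-write (replicas key value x write-fn)
  (>= (count-if (lambda (replica) (funcall write-fn replica key value))
                replicas)
      x))

;; Example with in-memory hash tables standing in for replicas:
;; (let ((replicas (loop repeat 3 collect (make-hash-table :test #'equal))))
;;   (quorum-write replicas "row-key" "value" 2
;;                 (lambda (store k v) (setf (gethash k store) v) t)))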

March 6, 2009

Recommended Common Lisp tutorials.

Here are a couple of CL tutorials that I've recently rediscovered and I think are worth sharing:

The first is Chaitanya Gupta's tutorial on the CL condition system. The examples provide a good guide to how restarts work and how you can use them.

Next is Adam Petersen's tutorial on building web applications using Hunchentoot, CL-WHO and Elephant. It shows all the necessary basics from beginning to end, and I don't think I have to state twice that I like the no-framework approach to web development Adam is demonstrating. There's even a little use of Parenscript to generate JS. If someone asks for a tutorial on how to build web applications using Lisp, this is the one I'd refer them to.

March 5, 2009

Cassandra of Facebook, or A Tale of a Distributed Database, part 0

One of the most natural and popular tendencies in the application of distributed systems theory is the construction of distributed databases with the hopes of providing improved availability and scalability (in the number of requests processed per second) via data replication. While distributed queries and transactions are obviously needed for some kinds of databases, what underlies every database is the ability to read and write a sequence of bytes to some place denoted by another sequence of bytes. In database jargon this is commonly called a storage engine, while in a distributed system this can be seen as a type of distributed shared memory.
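
Reduced to an interface, that contract is tiny - something like the following (the generic function names are my own, not any particular engine's API):

;; The storage-engine contract at its most basic: keys and values are both
;; just octet sequences. A trivial in-memory implementation is included for
;; illustration.
(defgeneric engine-get (engine key)
  (:documentation "Return the octet vector stored under KEY, or NIL."))

(defgeneric engine-put (engine key value)
  (:documentation "Store the octet vector VALUE under KEY."))

(defclass hash-engine ()
  ((table :initform (make-hash-table :test #'equalp) :reader engine-table)))

(defmethod engine-get ((engine hash-engine) key)
  (gethash key (engine-table engine)))

(defmethod engine-put ((engine hash-engine) key value)
  (setf (gethash key (engine-table engine)) value))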

Any kind of database providing access to more than one client at a time must deal with concurrency issues and hence provide some kind of consistency model for data access and manipulation. This can make constructing distributed databases either easier or harder: either the underlying consistency model of the chosen distributed implementation can be exposed directly as that of the database, or several distributed mechanisms have to be combined in complicated ways to provide the properties mandated by the database's stated consistency model.

Choosing a given consistency model involves large trade-offs in the performance, simplicity, and ease of maintenance of the distributed database implementation. Ideally, all the nodes of the system would play the same role, and adding new nodes would incrementally improve the availability, capacity, and number of requests the system can handle. Because of the communication and synchronization overhead necessary for implementing stricter consistency models, the improvement in those aspects as a function of the number of nodes shows diminishing returns (and in some cases may even turn negative).

Fortunately, many applications do not require strict consistency models of their database. The first to go is usually the requirement for multi-operation transactions with the ACID properties. This provides the greatest improvement in potential scalability, as both the locking and multi-version timestamping mechanisms for implementing transactions degrade in performance with increased contention between nodes.

The fad du jour in distributed databases is the distributed non-transactional non-relational database. It seems there is a new project implementing a distributed key-value, document-oriented, or "loose schema" database announced every couple of months.

Despite the hype, only a handful of these projects have large deployments. Even rarer is coming across someone who has worked on more than one such database. These facts alone make Facebook's Cassandra stand out. Cassandra was developed and is currently deployed inside Facebook to handle the user messaging feature. One of the developers of Cassandra, Avinash Lakshman, has previously worked on Amazon's internal Dynamo distributed database.

Fortunately for those of us who do not work on messaging at Facebook, the Cassandra source code has been released under the Apache 2.0 free software license. This is the first in a trilogy of blog posts examining Cassandra's use of modern distributed systems techniques and my observations from sitting down and reading the code.

Peer-to-peer synchronized simulations

It seems it's been a while since I wrote anything about distributed systems. The next few posts should hopefully make up for that.

To kick things off with some light observations, I've recently (thanks to a whole bunch of people in my del.icio.us network) come across this 2001 article describing the multiplayer system of Age of Empires.

The system is a peer-to-peer simulation where input from each player in the form of commands is synchronized across all nodes in discrete "game turn" steps, whose real-time duration is calculated as a function of latency between the nodes and the framerate at which the nodes can render the simulation.
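
The scheduling rule can be sketched in a couple of lines (the constants and names are mine, not the actual AoE values):

;; A communication turn has to be long enough for every node's commands to
;; make the round trip, and long enough for the slowest machine to render its
;; share of frames in the meantime.
(defun turn-length-ms (worst-round-trip-ms slowest-frame-ms &key (frames-per-turn 4))
  (max worst-round-trip-ms
       (* frames-per-turn slowest-frame-ms)))

;; e.g. 200ms of measured latency and 50ms frames on the slowest machine:
;; (turn-length-ms 200 50) => 200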

It's interesting to compare the Age of Empires system to that of Croquet (I've blogged about Croquet before). Like Age of Empires, Croquet is a peer-to-peer system. The TeaTime architecture behind Croquet can be thought of as a distributed object simulation where messages between objects are what gets synchronized across nodes.

Where Croquet and the AoE multiplayer system differ is in how time and synchronization are handled. TeaTime extends Reed's multi-version timestamping with the notion of pseudo-real time and uses that notion to enable deadline-based scheduling for message processing, with synchronization being achieved via two-phase commit. This allows the decoupling of simulation rendering speed from message synchronization. Unlike the AoE system, however, this approach is unaccommodating to high-latency nodes.

One thing I found interesting about the AoE article is the large number of synchronization bugs (a lot of them relating to keeping PRNG state in sync) that the developers encountered.

February 26, 2009

CL-USERs map

If you have a Google account and use Common Lisp, there is a CL-USER Google map you can annotate with your presence. This can be a useful tool in organizing local user groups.

February 23, 2009

ILC 2009

As a public service announcement, I'd like to remind readers that the early registration period for the 2009 International Lisp Conference, to be held at MIT, ends in a week. Register today! The conference is a month away, and I am quite excited about it.

GWT and deferred binding

I was going through last year's Google I/O conference videos and noticed a few GWT-related ones. One in particular seemed interesting: Faster-than-Possible Code: Deferred Binding with GWT by Bruce Johnson.

The technique that GWT dubs "deferred binding" consists of generating feature-specific output code (JavaScript) from a common codebase (Java for GWT, Lisp for Parenscript) and serving it up under different URLs so the resources are cached properly.

I first encountered the idea in the spring of 2007 when Daniel Gackle proposed it as a way of efficiently handling the generation of feature-specific JavaScript code from Parenscript, and we implemented it in a few days.

The system would generate several pages from a common definition written in CL-WHO/Parenscript (we used it to generate browser-specific code and profiler-instrumented code, so in all we'd get a Cartesian product of (ie, ff) x (regular, profiled) as output), and serve them up under different URLs that also incorporated a version number (via a mechanism that linked source files to generated resources and tracked the file modification date). This approach generated perfectly cacheable HTTP resources and ensured trouble-free application upgrades.
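
In outline, the generation side looked something like this sketch (the names and URL scheme are illustrative; the real system hooked into the CL-WHO/Parenscript page definitions and the file-modification tracking mentioned above):

;; Compile one definition once per combination of features and hand each
;; variant its own versioned URL, so every generated resource is perfectly
;; cacheable. COMPILE-VARIANT is a placeholder for the real page compiler.
(defparameter *browsers* '(:ie :ff))
(defparameter *profiles* '(:regular :profiled))

(defun variant-url (page-name browser profile version)
  (format nil "/static/~(~a~)-~(~a~)-~(~a~)-v~d.js"
          page-name browser profile version))

(defun generate-variants (page-name version compile-variant)
  "Return an alist of (url . code) covering the Cartesian product of variants."
  (loop for browser in *browsers*
        append (loop for profile in *profiles*
                     collect (cons (variant-url page-name browser profile version)
                                   (funcall compile-variant browser profile)))))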

Something interesting that I noticed when developing the feature-specific code generation mechanism was its resemblance to context-oriented programming: each feature acts as a layer which affects the way that the compiler produces code.

Besides browser-specific code generation, in GWT the technique is used for generating locale-specific resources, and by virtue of its implementation, to provide an extremely obtuse way to emulate Java's broken metaprogramming facilities (see http://www.zenika.com/blog/wp-content/uploads/2007/08/tutorial-binding-en.pdf, for example).

In the intervening time I have thankfully learned a little more about programming for the web, and have come to the inevitable conclusion that using this approach to generate browser-specific code is fundamentally flawed and will almost certainly ensure that your code breaks on browsers you have not developed for. (If I had started reading comp.lang.javascript sooner, I could have avoided making this mistake - if you're not reading that group yet, start now.) Working to add support for new browsers under this scheme will not only waste an incredible amount of time that you would not have had to spend otherwise, but will actually make your code more brittle and harder to maintain.

The only viable approach to dealing with differences in browser capabilities is feature detection. I can only say that I am glad that Parenscript is not pushing a "framework" on anybody, so my ignorance only impacted one project. Generating feature-specific code is not in itself a bad idea - GWT uses it to also generate locale-specific resources, with the corresponding bandwidth savings and cacheability advantages. I can only hope for the sake of their users that the GWT developers repent and change their stance on generating browser-specific code.
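
To make the contrast concrete, here is roughly what capability testing looks like when written in Parenscript - a sketch using the accessor syntax of recent Parenscript versions; the event-attachment test is the classic example, and none of this is part of any framework:

;; Detect the capability itself instead of sniffing the browser.
(ps:ps
  (defun attach-handler (element event handler)
    (if (ps:@ element add-event-listener)
        (ps:chain element (add-event-listener event handler false))
        (ps:chain element (attach-event (+ "on" event) handler)))))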

February 16, 2009

President's day Parenscript release, and a little on how I deploy web apps

It is President's day in the US of A, and that means a new release of Parenscript. Be aware that this one may break your code.

In other news, a couple of weeks ago I decided to sign up at stackoverflow, a community Q&A site for programmers. It has member-driven moderation, good search and tagging facilities, but a much wider scope and less of a (for lack of a better term) "narrative" than Usenet, mailing lists, or message boards. The moderation means that spam, trolling, and off-topic messages are kept under control; the lack of narrative means you can't participate in the site the way you do on Usenet; but that is hopefully offset by the search and tagging, which let you find answers more effectively than searching Usenet archives and contribute answers to questions you have some knowledge of.

So far I've answered one question about deploying Lisp web apps. I'm thinking of expanding it in more depth, adding examples, and turning it into a blog post ("article" in the old media parlance). Which naturally leads me to remark on my own online community participation: I stopped reading Usenet and participating in community sites a few years ago, and now follow blogs exclusively. Narrative, personalization, and Internet etiquette.

February 9, 2009

Web browser fun

The diagram below summarizes my recent cursory survey of web browsers. I think the one important conclusion that can be drawn is this: WebKit will be critical in the future. I know at least one person who doesn't have a computer, but uses Facebook from her mobile phone. There will be more and more people like that. Another important factoid: IE6 isn't dead. IE Mobile 6 will use the JavaScript implementation of IE8, but its layout engine is based on IE6's. That will probably induce nausea in most web developers, but it makes me glad I still take care to develop my web apps to run in IE6.



Other fun things:



The repository version of Parenscript will probably break your code, because your code probably deserves to get broken.



I've released a new version of uri-template, which fixes a bug in how URI-encoding was being performed.

February 2, 2009

Learn programming through JavaScript

I found a link to Eloquent JavaScript on some website shortly before posting the last blog entry about PCall, and just realized that it's written by the same Marijn Haverbeke. Interesting coincidence.

From a quick skimming, the book itself appears aimed at those new to programming. I like the style and the exercises used, and the coverage of JavaScript appears good without reading like a spec. The book also covers everything you need to know to get started building web applications. If someone wanted to learn programming by jumping straight into the web (a somewhat prudent thing to do in this day and age), I would recommend this book alongside SICP; it's certainly a lot better than most of the other JavaScript learning resources I've encountered.

In other Marijn Haverbeke-related JavaScript news, he also has Lisp implementations of a JavaScript parser and a JSON library. The former provides another way besides jwacs to get JavaScript code into Lisp, while the latter provides an alternative to CL-JSON.

Parenscript tricks and parallelism

The new version of Parenscript features a user-definable obfuscation facility. Among other amusing things, it can be used (due to JavaScript's under-utilized support for Unicode identifiers) to make your code Asian:


(ps:obfuscate-package "LAMBDACHART"
  (let ((code-pt-counter #x8CF0)
        (symbol-map (make-hash-table)))
    (lambda (symbol)
      (or (gethash symbol symbol-map)
          (setf (gethash symbol symbol-map)
                (make-symbol (string (code-char (incf code-pt-counter)))))))))

LAMBDACHART> (ps (defun foo (bar baz) (+ bar baz)))
"function 賱(賲, 賳) {
    賲 + 賳;
};"


Unrelated, I recently found Marijn Haverbeke's PCall library for parallelism in Common Lisp. The library provides futures (called 'tasks') as parallelizing mechanism, and thread pool (the library is based on bordeaux-threads) management facilities to tweak how the futures are actually executed.

Unlike MultiLisp, which implemented the same futures-based parallel model, there is no macro provided to evaluate a function's arguments in parallel before applying the function to them. That seemed to be a popular facility in the parallel research Lisp systems of the 80s, probably because it is a no-brainer once you consider the Church-Rosser theorem; upon some reflection and a little coding, however, that construct proves to be not very convenient.

I think the futures approach to parallelism is the most widely useful model available today. It shares all of the conceptual benefits of its cousin delayed/lazy evaluation: futures are declared and used explicitly in the code, without forcing (pun fully intended) any contortions in the control flow of the code using those futures. If you can write a function, then you can define a task that can be executed in parallel.
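
A sketch of what that looks like with PCall, going from the library's documentation (PEXEC hands an expression to the thread pool and returns a task; JOIN blocks until its result is available - the expensive functions here are placeholders):

;; Two independent computations run as tasks in the thread pool; the caller
;; only blocks at the JOIN points.
(defun expensive-a () (sleep 1) 1)
(defun expensive-b () (sleep 1) 2)

(defun sum-in-parallel ()
  (let ((task-a (pcall:pexec (expensive-a)))
        (task-b (pcall:pexec (expensive-b))))
    (+ (pcall:join task-a) (pcall:join task-b))))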

The model doesn't handle concurrency control beyond the synchronization provided by joining/forcing the future. If your tasks share state, you'll need to do that synchronization yourself (this is where the locks provided by bordeaux-threads come in) - although ideally you should write your code so that synchronization happens in the code making and consuming the tasks, and the tasks themselves don't share state.

One interesting thing about the library is Haverbeke's extreme pessimism about native thread overhead (the default thread pool size is 3). On many systems that is certainly justified, but apparently some half-decent OS implementations exist. I'm interested in doing some benchmarks with SBCL using NPTL threads on an AMD64 box to see what kinds of numbers are reasonable.

January 21, 2009

New Parenscript release!

Details on the project page: http://common-lisp.net/project/parenscript/



This one has been a while in the making, but in the meantime many subtle but important issues have been thought out and addressed. I feel that the project now has a much clearer direction for future development, particularly as a basis for advanced web development tools that are going to be a step beyond anything else out there (announcement coming soon).