October 12, 2012

Enforcing required slot values

I was reading a blog post by Robert Smith on ideas for future directions for Lisp type systems, where Robert mentioned the following trick to enforce provision of required slot values for structures and classes:

(defstruct foo
  (slot1 (error "Must provide value for slot1")))
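
The same trick works for CLOS classes, since a slot's :initform is only evaluated when no corresponding initarg is supplied. For example (the class and slot names are just for illustration):

(defclass bar ()
  ((slot1 :initarg :slot1
          ;; making an instance without :slot1 signals the error
          :initform (error "Must provide value for slot1"))))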

That's something I hadn't thought of doing before; I hope you find it useful as well. The rest of Robert's article is well worth a read.

September 17, 2012

Friends, erasure codes, and symmetric encryption for password-based private key recovery

Following up on previous ideas on P2P social networks, one idea I recently had for peer-assisted private key recovery is combining erasure codes with symmetric encryption. You'd encrypt your private key with your password, use an erasure coding scheme to break it up into chunks so that some N > 1 of them are needed to recover the encrypted key, and send out a chunk to a friend whenever you "friend" them.

When you're at a public computer, your friends would only send a chunk to you if you've managed to authenticate with them (using a zero-knowledge password proof, perhaps SRP). When you've collected enough chunks, you get the encrypted key, and decrypt it with your password.
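
To make the chunking step concrete, here's a minimal sketch of the split and recover operations. For simplicity it uses a trivial XOR-based (N, N) split, where all N chunks are required, rather than a real (k, N) erasure code, and the function names are purely illustrative:

(defun split-encrypted-key (encrypted-key n)
  "Split ENCRYPTED-KEY (a vector of octets) into N chunks, all of which
are needed to recover it: N-1 random chunks, plus the XOR of the key
with all of them."
  (let ((chunks (loop repeat (1- n)
                      collect (map '(vector (unsigned-byte 8))
                                   (lambda (byte) (declare (ignore byte)) (random 256))
                                   encrypted-key)))
        (last-chunk (copy-seq encrypted-key)))
    (dolist (chunk chunks)
      (map-into last-chunk #'logxor last-chunk chunk))
    (append chunks (list last-chunk))))

(defun recover-encrypted-key (chunks)
  "XOR the collected CHUNKS back together to get the encrypted key."
  (reduce (lambda (a b) (map '(vector (unsigned-byte 8)) #'logxor a b))
          chunks))

Decrypting the recovered key with your password is a separate step, as is the authentication your friends perform before handing their chunks back.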

You'd have to trust your friends not to collude and share chunks with each other, and even if they did, they would still have to guess your password.

September 11, 2012

CLiki news

cliki.net is finally up on the cliki2 software (the ALU wiki has been running it for a while).

CLiki2 features real user accounts, effective anti-spam protection, and much improved article versioning (you now get graphical diffs like Wikipedia, among other things). Change notification for the whole wiki and for each individual article is provided by Atom feeds. There is now code coloring support. Pages can be deleted for real. The layout and Atom feeds have been designed to be accessible, and work great in console browsers.

The biggest casualties in the move were Latin-1 characters (some pages were a mix of UTF-8 and Latin-1, and Latin-1 lost). In particular, if your name contains a lot of accents, I apologize.

So grab yourself an account, read up on tips for writing CLiki pages, and contribute some information about Common Lisp libraries or other topics.

If you want to write about Lisps other than Common Lisp, the ALU wiki is the place. Right now it is a little sparse and could use more contributions.

Bonus link: Check out Takeru Ohta's github account for some neat Common Lisp software: https://github.com/sile

August 31, 2012

All Tomorrow's Startups

One of the projects mentioned in Ted Nelson's 1974 Computer Lib/Dream Machines was the Berkeley computer bulletin board system Community Memory. I found the following passage from the Wikipedia article about Community Memory very timely for the current startup scene:

Periodically directories of recently added items or of musician-related messages would be printed out and left there. In other terminal locations, users sought out complete strangers to assemble car pools, organize study groups, find chess partners, or even pass tips on good restaurants.

May 8, 2012

Finding bugs in Quicklisp libraries

One of the great things about Common Lisp is the variety of implementations and the scope of deployment platforms and performance characteristic trade-offs (run-time vs compile-time vs memory size) they encompass. This is also one of the things that complicates library development. Testing on any significant combination of implementations and platforms is not practical for most Free Software library authors.

cl-test-grid is a project that provides automated, distributed testing for libraries available via Quicklisp. You download cl-test-grid to your machine and tell it which CL implementations you have available; it then runs the test suites from many libraries in the current Quicklisp release and sends the results to a publicly available report as well as a public bug tracker. The report breaks down test failures by Common Lisp implementation, platform/OS, and library version.

The report is a great resource for library authors and implementation maintainers. If your library is distributed via Quicklisp and isn't tested by cl-test-grid yet, Anton Vodonosov provides some tips for making your test suite cl-test-grid friendly. If you're looking to contribute to Free Lisp software, one of the best ways is to get cl-test-grid and run the tests. See if there are any failures in the test suites, and report the bugs to the library maintainers and help find a fix.

February 2, 2012

Crypto tools for P2P friendnets

While reading up on crypto, I came across some relevant but not widely known cryptography techniques that can benefit P2P friendnets:

It turns out that what I was thinking about in terms of friend logins without revealing passwords is a widely researched area called zero-knowledge password proofs. In particular, one application of zero-knowledge password proofs is encrypted key exchange, the patent for which just expired at the end of 2011.

Even more directly applicable, there is a 2009 paper by Michel Abdalla, Xavier Boyen, Céline Chevalier, and David Pointcheval on how password-based public key infrastructure can be built using multiple nodes ("how your friends can help you remember your keys" - also see the slides).

One thing I haven't seen considered yet for P2P publishing is broadcast encryption. Originally developed with the goal of digital restrictions management for centralized distribution, broadcast encryption seems useful as a way to manage "unfriending" people in a P2P social network.

On the web browser crypto front, OpenPGP is currently soliciting help for a JavaScript port.

January 30, 2012

Continued Confusion

People have trouble understanding continuations, and this is no surprise when you realize that the term "continuation" is overloaded in all three senses: as a word, as a concept, and as a programming construct.

The confusion comes from two fields where continuations are used as a concept: compiler intermediate representations (where Continuation-Passing Style (CPS) is one possible form of intermediate representation), and First-Class Continuations (FCC), which is a metaprogramming technique for manipulating the stack of a program.

In terms of being a programming construct, continuations only exist in FCC. When it comes to CPS, a continuation is a concept only. In CPS, this concept is easy to understand: a continuation is the next step in a computation - the next instruction in the instruction stream.

For FCC, the concept consists of two parts: saving the current computation, and resuming a saved computation.

The main cause of confusion around FCC is that the First-Class Continuation programming construct is traditionally represented as a function (this has to do with the fact that FCC as an idea was derived from work in CPS). But a First-Class Continuation is not in any way a function. As Christian Queinnec points out in Lisp in Small Pieces, a First-Class Continuation is more like catch/throw:

Saving the current computation is like setting up a catch block, and resuming that computation is like throwing a value to that tag - anything that happened between that catch and throw is forgotten. The thing that makes a First-Class Continuation special is that you can throw to it anytime from anywhere, and you end up in the corresponding catch, with whatever you were doing previously forgotten.
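
In Common Lisp terms, the analogy looks like this (a toy example; the point of first-class continuations is that the "throw" can also happen much later, from anywhere the saved continuation was passed to):

(catch 'saved-point            ; like saving the current computation
  (print "before the jump")
  (throw 'saved-point 42)      ; like resuming the saved computation with 42
  (print "never reached"))     ; whatever remained here is forgotten
;; => 42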

Marc Feeley has proposed an alternative interface for First-Class Continuations that makes invoking First-Class Continuations explicit, which is implemented in Gambit Scheme.

So finally we get to continuation as a word. As a word, "continuation" is used in three contexts:

  1. In Continuation-Passing Style as a conventional way of calling the next step in a computation.

  2. In First-Class Continuations as a name for saved computations.

  3. To indicate the next step in a computation when looking at a particular point in any given piece of code.

Christopher Wadsworth coined the term "continuation" for a way to formally reason about jumps/gotos, but the term can also be used informally when talking about a particular point in a piece of code. If you mentally use a different word in each of the three contexts, I think a lot of the confusion surrounding continuations can be cleared up. Something like the following might work:
  1. "next"

  2. "saved stack"

  3. "continuation"

January 24, 2012

WebRTC and the File API

Two interesting developments in HTML5 I've recently been made aware of are WebRTC (thanks to Manuel Simoni) and the File API (thanks to Rich Jones). The RTC in WebRTC stands for Real-Time Communications, but it's basically a way to provide TCP/IP sockets for a web browser while pretending you're not providing TCP/IP. The top layer consists of various video and audio codecs, but the basic gist is a bidirectional channel that lets web browsers communicate with each other. This is designed with NAT traversal in mind (in the reference implementation, traversing NATs and proxies is handled by the very handy-looking libjingle library, which seems to do most of the common NAT traversal tricks).

One obvious use of WebRTC is providing audio and video chat in webpages without Flash (WebRTC specifies audio/video input, real-time codecs, and jitter buffers). Another obvious use is to develop a BitTorrent-like distribution protocol for videos; something like this should really cut down on YouTube's bandwidth bills.

The File API makes it possible to use WebRTC for any kind of P2P file sharing, which I think is WebRTC's most exciting potential.

To step back in terms of commentary, the web browser has now come full circle to being a very weird virtualized operating system, whose APIs rival Windows in size, idiosyncrasies and complexity. The need for application hosting declines - both the application and "putting your data in the cloud" can now be handled by a Freenet or Tahoe-LAFS-like WebRTC library. What's interesting is that friction surrounding development and deployment should push new web applications away from a centralized server and towards the P2P approach. Unlike fully hosted apps, there is no extra effort involved in making an offline version. Server rent/lease and administration costs are reduced considerably by offloading as much data storage and computation as possible onto the browser. This latter possibility in particular should enable many startups to avoid the capital requirements needed for scaling up services that depend on the network effect, such as most "social" things. I don't like to use buzzwords, but this looks to be disruptive.

January 18, 2012

Upcoming presentation about Parenscript

I'm going to be giving a talk about Parenscript to the Montreal Scheme/Lisp Users Group on Thursday, January 19 (meeting details here).

The slides I'm going to be using are here, and a list of links referenced in the talk is below. The last time I gave a presentation on Parenscript was to LispNYC in 2007. Parenscript has received a huge number of changes and improvements since then, and continues to be the best language/compiler to JavaScript and one of the best tools available for web application development. Also new since 2007 are libraries and tools that extend Parenscript: Red Daly has added CLOS and the Common Lisp condition system to JavaScript as a Parenscript library, and there are now several options for interactive development with SLIME in your browser.
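
For those who haven't seen Parenscript before, here's a small taste: the ps macro compiles Lisp-syntax code to a string of JavaScript (the function is just an illustration, and the output shown is approximate):

(ps:ps
  (defun greet (name)
    (alert (+ "Hello, " name))))
;; => roughly: "function greet(name) { return alert('Hello, ' + name); };"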

Links:

January 17, 2012

Compiling by simplifying

When you read the original Lambda Papers carefully, one thing you notice is that Steele didn't view Scheme as just a programming language, but also (and I would argue, primarily) as a philosophy of constructing compilers.

This philosophy made the issues of translating a high-level language into von Neumann-style register machine code clear and transparent, and provided mechanisms to radically simplify the transformation. Continuation-passing style reifies control transfer, temporary stack values, and register use: function arguments become registers, and function calls become jumps. Difficult issues like nested scopes, first-class functions, closures, and exceptions become easy.
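
As a rough illustration of what CPS does, here is a direct-style function rewritten by hand into continuation-passing style (the -cps helpers are hypothetical stand-ins for compiler-generated code): every intermediate value gets a name, and "what to do next" is passed along explicitly.

;; Direct style: intermediate values and control flow are implicit.
(defun hypot (x y)
  (sqrt (+ (* x x) (* y y))))

;; CPS: each primitive takes an explicit continuation K, so every
;; temporary has a name and every transfer of control is a call,
;; i.e. a jump.
(defun *-cps (a b k) (funcall k (* a b)))
(defun +-cps (a b k) (funcall k (+ a b)))
(defun sqrt-cps (a k) (funcall k (sqrt a)))

(defun hypot-cps (x y k)
  (*-cps x x
         (lambda (x2)
           (*-cps y y
                  (lambda (y2)
                    (+-cps x2 y2
                           (lambda (sum)
                             (sqrt-cps sum k))))))))

;; (hypot-cps 3 4 #'identity) => 5.0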

The other great, simplifying thing about Steele's philosophy is the self-similarity of the code as it undergoes transformation. The input of each stage of the compiler is a Scheme program, and (until machine code generation), the output is a simpler Scheme program.

Many compilers miss out on the self-similarity aspect of Scheme in two major ways. Those compilers that do follow continuation-passing style usually implement the CPS portion of compilation using completely different data structures than the other portions. This adds needless boilerplate code and makes the CPS portion look more complicated than it is. This needless complexity shows up in some ML compilers (in particular, Andrew Appel's otherwise excellent and highly recommended Compiling with Continuations), and causes people to remark that implementing SSA is not all that much harder than doing CPS (Olin Shivers disagrees).

The subtler, but even more serious omission is ignoring Steele's advice to implement the parts of the language above the intermediate layers as macros. This spreads the complexity for implementing features like classes and objects throughout the different portions of the compiler, starting with the parser.

Following Steele's techniques, building a Scheme compiler that is several times faster (and many times smaller) than the compilers/virtual machines of popular dynamic languages like Python and Ruby is easy. Scheme 48 was originally written in 48 hours. If you don't have that much time, Marc Feeley can show you how to build a Scheme compiler in 90 minutes. And if you're really strapped for time and don't want to sit through CPS theory, Abdulaziz Ghuloum shows you how to dive right into generating x86 machine code in 10 pages.

January 11, 2012

The personal computer you've never heard of

I was watching Dan Ingalls' Seven (give or take) Smalltalk Implementations talk given at Stanford in 2005 on YouTube, and around the 43 minute mark Ingalls talked about something I consider shocking:

In 1978*, a year after the introduction of the Apple II, Xerox PARC built a portable computer with a 7 inch, 640 by 480 resolution bit-mapped, touch-screen display based around a commonly available 1 MHz, 16-bit microprocessor with 128 KiB** of memory running Smalltalk-76. The computer was called the NoteTaker.



The NoteTaker ran Smalltalk as fast as the Alto did (the 6 KiB Smalltalk-76 VM actually executed bytecodes twice as fast on the 8086 as on the Alto, but the memory bus was much slower, so interactive performance felt similar).

I always thought the 8086 was extremely underpowered and good only for DOS and terminals. In hindsight, it's mind-boggling how long it took x86 PCs to catch up with the Macintoshes and Amigas of the 1980s.

* Note that Wikipedia and other sources give the NoteTaker's date as 1976, but this is likely the date when design started: design of the 8086 also only began in 1976, and the processor did not ship until 1978. The NoteTaker manual is likewise dated December 1978.

** The NoteTaker manual specs the machine at 256 KiB of memory.

*** The current Wikipedia article about NoteTaker claims this computer would have cost more than $50,000 if sold commercially (presumably in 1978 dollars). Assume that the CRT, floppy disk, power supply, keyboard and mouse cost $2,000 (a whole Apple II system with 4 KiB memory retailed for $1,298.00 in 1977). The 8086 originally wholesaled for $360. According to the NoteTaker manual, the NoteTaker had a second 8086 which acted as an I/O processor but was "also available for general purpose computing tasks." It would have been entirely possible to replace the I/O processor with a much cheaper CPU. Looking again at Apple's price list, a 48 KiB Apple II board retailed for $1,938, while a bare-bones 4 KiB one sold for $598, which gives about $31 per KiB. So 128 KiB would retail for $3,900 and 256 KiB would retail for $7,800. It certainly would have been possible to produce and maybe even sell the NoteTaker with 256 KiB of memory for less than $15,000. Note that a few years later, Kim McCall and Larry Tesler made a subset of Smalltalk-76 that ran in 64 KiB of memory, but with the full system image only about 8 KiB of memory was available for user programs.
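
For reference, the memory estimate above works out as:

\[
\frac{\$1{,}938 - \$598}{48\,\mathrm{KiB} - 4\,\mathrm{KiB}} \approx \$30.5/\mathrm{KiB}, \qquad 128 \times \$30.5 \approx \$3{,}900, \qquad 256 \times \$30.5 \approx \$7{,}800.
\]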

**** The NoteTaker also came with an Ethernet interface.

PS - While researching this article, I also stumbled on another PC you've probably never heard of. The Kenbak-1, similar in operation and capabilities to the MITS Altair 8800 (both came with lights, toggle switches, and 256 bytes of memory), was sold via advertisements in Scientific American for $750 in 1971, four years before the Altair.

What is ClearSky going to look like as an application?

ClearSky is going to have to be as simple as possible to use. Facebook-level simplicity. The only unavoidable extra requirement is that the node software is going to have to run as a binary on your own computer. The installation process needs to consist of just downloading a single binary. Once you have it, no more extra installation steps.

The first time the binary is run, it opens a web browser pointing to a page that asks you to choose a nickname and password (the binary will have a web server to present the local interface). In the background, it's going to generate a private/public key pair and set up the filesystem directory where you store data for that account. After that, it will let you look up friends online. I'm undecided about how this step will work. There are a lot of alternatives for federated, centrally located directories: just build a web one, XMPP (integration with chat seems like a good idea), IRC, etc. I don't know a lot about how the distributed options work, but they are out there and seem to work ok (Freenet, GNUnet, etc.). It seems like a good idea to make this part modular, but present it behind a consistent and easy to use interface.
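
A rough sketch of what the first run might do, assuming the ironclad and hunchentoot libraries (the directory and function names are hypothetical):

(defvar *data-directory*
  (merge-pathnames ".clearsky/" (user-homedir-pathname)))

(defun first-run-setup (&key (port 8080))
  ;; Create the per-account data directory.
  (ensure-directories-exist *data-directory*)
  ;; Generate the account's key pair; a real client would encrypt the
  ;; private key with the user's password before saving it here.
  (multiple-value-bind (private-key public-key)
      (ironclad:generate-key-pair :rsa :num-bits 2048)
    (declare (ignorable private-key public-key)))
  ;; Serve the local setup page (nickname/password form) to the browser.
  (hunchentoot:start (make-instance 'hunchentoot:easy-acceptor :port port)))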

The binary will provide an HTTPS port to the outside world to let your friends log in remotely (see previous discussion). If you're behind a NAT gateway, it's going to take some extra work - both the NAT-PMP and IGD protocols will need to be supported.

In general, UDP hole punching is needed to let two ClearSky nodes talk to each other, although it alone is not enough to provide remote logins from the outside. UDP hole punching also requires a reachable third party: either a friend's computer with an unrestricted connection, or a central server if no such friend is available. Using unknown third parties is problematic because it relies on either volunteers (every ClearSky client can be a volunteer, but this opens up possibilities for surveillance) or your mirror provider.
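
The hole-punching step itself might look something like this sketch, assuming the usocket library (the peer's public address and port are whatever the third party reported; the function name and port number are illustrative):

(defun punch-hole (peer-host peer-port &key (local-port 4222))
  ;; Create an unconnected UDP socket bound to a known local port.
  (let ((socket (usocket:socket-connect nil nil
                                        :protocol :datagram
                                        :local-port local-port))
        (probe (make-array 1 :element-type '(unsigned-byte 8)
                             :initial-element 0)))
    ;; Sending anything outward creates a mapping in our NAT that the
    ;; peer's packets, addressed to our public endpoint, can then come
    ;; back through; the peer does the same thing towards us.
    (usocket:socket-send socket probe (length probe)
                         :host peer-host :port peer-port)
    socket))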

All traffic between you and your friends goes over SSH. Messages and files are cryptographically signed to ensure authenticity, but encryption during transport is provided by SSH. The ClearSky client should encrypt data before writing it to disk to ensure privacy even if the hosting computer gets lost or stolen.

Apps are going to be started in a separate process (via fork(), or by running the same executable with different options) for extra security and stability, and to let the OS handle resource control. Games, content libraries, chat, voice, calendars, etc. are all possible apps. One example app that could be built, but that's not possible with Facebook, is a Tahoe-LAFS-like service that would let you trade hard drive space with your friends to securely and redundantly back up your private data.
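
For example, re-running the same executable with different options might look like this (an SBCL-specific sketch; the --app flag is a hypothetical ClearSky option):

(defun launch-app (app-name)
  ;; Start the app in its own process by re-executing the running
  ;; binary with an option saying which app to run.
  (sb-ext:run-program (first sb-ext:*posix-argv*) ; path of this executable
                      (list "--app" app-name)
                      :search t
                      :wait nil))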

Apps will communicate with the ClearSky process via sockets and the file system. Security is going to be provided by language-level virtualization. The app runtime will prompt the user when the app wants to access filesystem folders or other resources. Things like video and audio will ideally be handled by HTML5, and if not, then by Flash. It's undesirable to have the app display native UI or use the OS multimedia or other capabilities directly - this shouldn't be included in the platform unless there's a good concrete demonstration that it's needed. OTOH, having an app be able to manage its own sockets may be a good idea.

Common Lisp is a good language/runtime to base the ClearSky platform on. A Common Lisp binary can run all the basic ClearSky functions (web server, crypto, discovery, networking, SSH) in the same process, provides excellent support for language-level virtualization (and of course already comes with a compiler), and has very good support for hosting and virtualizing other programming languages - of the popular scripting ones, CLPython and CL-JavaScript are currently available, and new ones are easy to implement: CL-JavaScript was written by three people (Alan Pavičić, Marijn Haverbeke, and Iva Jurišić), CLPython by just one person (Willem Broekema).