
[colorforth] Adventures with Building Applications

Subject: [colorforth] Adventures with Building Applications
From: "David J. Goehrig" <dave@xxxxxxxxxxxxxx>
Date: Tue, 19 Aug 2008 13:21:28 -0400

Hello All,

I have a project, in development for several years now, that supports a number of web applications. The project's customer base ranges from multinationals to startups, and it has pretty much paid my bills for the past 5 years. The code base has been ported from C++ to OCaml to C. It has used Perl, Python, Ruby, Lua, and Javascript as embedded interpreters. The latest version of the project consists of 7k lines of C and a cut-down version of Mozilla's SpiderMonkey (which itself consists of 86k lines of C).

This web application server includes support for HTTP/1.1 (client & server), SSL, SMTP (client & server), SMPP (client), an IRC-style protocol, and SQL through PostgreSQL. Unlike most web servers, it has a large number of resident "bots" that process requests, chat with users, and do general housekeeping. All of the functionality is currently accessible through Javascript and doesn't require any special training beyond what a general Flash programmer knows.

But time doesn't stand still, and a recent project for a large multinational put serious strain on the system when the number of users was an entire order of magnitude greater than the original project proposal! The biggest stumbling block has been the 86k lines of C code that run the Javascript engine, the same one found in Firefox. After reading through WebKit and the new Tamarin engine from Adobe (with its 250k lines of C++ code and Javascript JIT), I pretty much gave up on using any of the existing engines. So I have begun the fourth major rewrite of the system. Over the past few months, I've been evaluating a wide range of technologies. I've played with customizing several Smalltalk VMs, written PEG-based language translators in several different languages, prototyped versions in the mainstream scripting languages, and created 4 custom VMs in C and Intel assembler. But ultimately, I went back and read through cmForth.blk and the source listings for colorForth, and got some inspiration.


The latest VM has a simple opcode set consisting of the characters:

   !"#$%&'()*+,-./0123456789:;<=>?@[\]^_`abcdef{|}~

Taking inspiration from Chuck's colorForth VM, this VM ran opcodes very similar to his, with a few minor modifications. Registers were allocated as follows:


; %edx - top of stack  (doubles as the B register)
; %eax - next on stack
; %ecx - counter/utility
; %ebx - instruction pointer (now a utility register)
; %esi - data stack pointer
; %edi - memory register ( the A register )
; %esp - return stack pointer
; %ebp - free space pointer (aka here)
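
To make the register allocation concrete, here is a minimal Python model (a sketch, not the actual VM) of a data stack whose top two cells live in registers: edx holds the TOS, eax holds the next item, and deeper cells spill to memory addressed by esi, growing downward. The class name CachedStack, one list slot standing in for one 4-byte cell, and both registers starting at 0 are assumptions for illustration; underflow past the memory area is not checked. Each method mirrors the x86 sequence in its comments.

```python
class CachedStack:
    """Toy model: TOS in edx, NOS in eax, deeper cells in mem[esi:]."""

    def __init__(self, cells=64):
        self.mem = [0] * cells  # memory stack, grows downward like esi
        self.esi = cells        # one index step stands for 4 bytes
        self.edx = 0            # top of stack
        self.eax = 0            # next on stack

    def dup(self):
        self.esi -= 1                  # sub esi, 4
        self.mem[self.esi] = self.eax  # mov [esi], eax
        self.eax = self.edx            # mov eax, edx

    def drop(self):
        self.edx = self.eax            # mov edx, eax
        self.eax = self.mem[self.esi]  # mov eax, [esi]
        self.esi += 1                  # add esi, 4

    def lit(self, n):
        self.dup()                     # dup
        self.edx = n & 0xFFFFFFFF      # mov edx, n

    def add(self):
        self.edx = (self.eax + self.edx) & 0xFFFFFFFF  # add edx, eax
        self.eax = self.mem[self.esi]                  # mov eax, [esi]
        self.esi += 1                                  # add esi, 4
```

With two cells cached, dup and binary operators each touch memory exactly once, and a literal reduces to a dup followed by a single register load.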

The VM opcodes did slightly different things than they do in Forth, but there was always an equivalent. For example, ! is xor, while $ does a!a+. Some words were also repurposed:


[ pushes ebp, and starts compiling
] stops compiling
( immediately switches to interpreter context, and compiles
) evaluates the interpreter context
{ pushes ebp
} compiles a 0 cell

: takes the next word in the input buffer, and binds it to the value on the top of the stack.

Also, any word not found in the dictionary just leaves the address of the token on the stack. These changes made a Forth-like language for the VM very simple, and the implementation of the VM and bytecode interpreter was under 500 lines of assembler. The trick was that each opcode was aligned on a 32-byte boundary, and dispatch simply vectored to the byte [ebx + 8000] address. My favorite opcodes are 0-9a-f, which multiply the TOS by [base] and add the value they represent.
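
A minimal Python sketch of single-character dispatch and the digit opcodes, assuming a 256-entry handler table indexed by opcode byte stands in for the 32-byte-aligned machine code, and that the stack starts with a 0 on top for digits to accumulate into. Only a few opcodes are filled in; ByteVM and its method names are hypothetical.

```python
class ByteVM:
    """Toy single-character opcode interpreter (hypothetical, for illustration)."""

    def __init__(self, base=16):
        self.base = base           # stand-in for the [base] variable
        self.stack = [0]           # assumption: digits accumulate into an initial 0
        self.table = [None] * 256  # one handler slot per opcode byte
        for ch in "0123456789abcdef":
            self.table[ord(ch)] = self._digit(int(ch, 16))
        self.table[ord("_")] = self._dup
        self.table[ord("*")] = self._mul
        self.table[ord("!")] = self._xor  # ! is xor in this VM

    def _digit(self, value):
        def op():
            # TOS = TOS * base + the digit's value
            self.stack[-1] = self.stack[-1] * self.base + value
        return op

    def _dup(self):
        self.stack.append(self.stack[-1])

    def _mul(self):
        self.stack.append(self.stack.pop() * self.stack.pop())

    def _xor(self):
        self.stack.append(self.stack.pop() ^ self.stack.pop())

    def run(self, code):
        for ch in code:
            handler = self.table[ord(ch)]  # vector on the opcode byte
            if handler is None:
                raise ValueError(f"undefined opcode {ch!r}")
            handler()
        return self.stack
```

For example, ByteVM().run("ff_*") leaves 255 * 255 on the stack: each digit is its own opcode, so a number costs one dispatch per digit rather than a separate literal instruction.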

One of the biggest changes from Forth is the colon word (:). Unlike a traditional colon definition, a definition in the VM's Forth-like language looks like:


[ _ * ] : square

Here [ starts compiling and pushes the address of the code onto the stack, _ compiles a dup, * compiles a multiplication, and ] turns off compilation; : then binds the value pushed onto the stack by [ to the word square. Everything else works pretty much as you'd expect. The other thing that made life really kinda neat was:

{ 0 : foo , "narf" : bar , }

With a minor tweak so that : means , between { and }, this produced data structures that look a lot like JSON, only with value : key notation rather than key : value. At this point, I thought I was almost done with the rewrite: I'd simply build a parser that reordered the Javascript into postfix notation, run it through the VM, and presto, I'd be able to support most of the code the developers I've worked with have written over the years.
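
The following Python sketch models the outer-interpreter behavior described above: [ pushes the current end of the code area and starts compiling, ] stops compiling, : binds the next word to the value on top of the stack (or, between { and }, stores it under that key), and unknown words push themselves as a stand-in for the token's address. The list-of-tokens code area, treating the comma as a plain separator, } pushing the finished dict instead of compiling a 0 cell, and the names OuterInterpreter and PRIMITIVES are all assumptions; strings containing spaces and nested definitions are not handled.

```python
END = object()  # end-of-definition marker in the code area

PRIMITIVES = {
    "_": lambda s: s.append(s[-1]),              # dup
    "*": lambda s: s.append(s.pop() * s.pop()),  # multiply
    "+": lambda s: s.append(s.pop() + s.pop()),  # add
}


def literal(tok):
    """Numbers become ints; anything else stands in for the token's address."""
    try:
        return int(tok)
    except ValueError:
        return tok.strip('"')


class OuterInterpreter:
    def __init__(self):
        self.stack = []
        self.code = []          # compiled tokens; len(code) plays the role of here
        self.words = {}         # dictionary: name -> code address
        self.compiling = False
        self.objects = []       # dicts under construction between { and }

    def execute(self, addr):
        # Run compiled tokens from addr until the end marker.
        while self.code[addr] is not END:
            self.interpret(self.code[addr])
            addr += 1

    def interpret(self, tok):
        if tok in PRIMITIVES:
            PRIMITIVES[tok](self.stack)
        elif tok in self.words:
            self.execute(self.words[tok])
        else:
            self.stack.append(literal(tok))

    def run(self, source):
        tokens = iter(source.split())
        for tok in tokens:
            if tok == "[":                  # push here, start compiling
                self.stack.append(len(self.code))
                self.compiling = True
            elif tok == "]":                # close the definition
                self.code.append(END)
                self.compiling = False
            elif tok == ":":                # bind the next word to TOS
                name, value = next(tokens), self.stack.pop()
                if self.objects:
                    self.objects[-1][name] = value  # inside { }: acts like ,
                else:
                    self.words[name] = value
            elif tok == "{":
                self.objects.append({})
            elif tok == "}":
                self.stack.append(self.objects.pop())
            elif tok == ",":
                pass                        # separator only in this sketch
            elif self.compiling:
                self.code.append(tok)
            else:
                self.interpret(tok)
        return self.stack
```

Because : simply consumes whatever [ left on the stack, the same word serves both for naming code and for labeling values inside { }.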

But why stop there? I've got a VM that looks a lot like Chuck's VM, and I know that he's got an optimizing native compiler for his code, so why not remove the whole redundant bytecode step? So I kept the bytecodes, changed the definition of next to just be ret, added a quick lookup table of the length of each VM instruction, and rewrote the _compile block to copy VM instructions inline, compile literals as dup; mov edx, 1234, and compile all dictionary references as call instructions. That switched my VM from a bytecode interpreter to a native compiler! After a few more tweaks, in 630 lines of assembler, I now have a native compiler that optimizes tail calls, optimizes out some stack-juggling combinations, and can still switch between interpret and compile modes (literally, compile & run vs. just compile).
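
A rough Python sketch of inline native compilation, assuming 32-bit x86 with TOS in edx, NOS in eax, and the data stack addressed by esi. Opcode bodies are copied inline from a template table (only dup is filled in here), literals become dup followed by mov edx, imm32 (opcode BA), dictionary words become call rel32 (E8), and a call in tail position becomes jmp rel32 (E9) with no trailing ret. The names compile_block and TEMPLATES are hypothetical, and the tail-call rule is the conventional one, not necessarily the author's exact scheme.

```python
import struct

# Inline body for '_' (dup): TOS in edx, NOS in eax, rest of stack at [esi].
DUP = bytes([
    0x83, 0xEE, 0x04,  # sub esi, 4
    0x89, 0x06,        # mov [esi], eax
    0x89, 0xD0,        # mov eax, edx
])

# Hypothetical template table: opcode -> machine code copied inline.
# len(TEMPLATES[op]) plays the role of the instruction-length lookup table.
TEMPLATES = {"_": DUP}


def compile_block(tokens, dictionary, origin):
    """Compile tokens (ints, opcodes, or dictionary words) to run at origin."""
    code = bytearray()
    ends_in_jump = False
    for i, tok in enumerate(tokens):
        ends_in_jump = False
        if isinstance(tok, int):
            # Literal: dup; mov edx, imm32
            code += DUP + b"\xBA" + struct.pack("<I", tok & 0xFFFFFFFF)
        elif tok in TEMPLATES:
            code += TEMPLATES[tok]  # copy the opcode body inline
        else:
            tail = i == len(tokens) - 1
            opcode = 0xE9 if tail else 0xE8  # jmp rel32 in tail position, else call rel32
            next_ip = origin + len(code) + 5  # rel32 is relative to the next instruction
            code += bytes([opcode]) + struct.pack("<i", dictionary[tok] - next_ip)
            ends_in_jump = tail
    if not ends_in_jump:
        code.append(0xC3)  # ret
    return bytes(code)
```

Replacing a final call followed by ret with a single jmp saves both a return-stack push and the ret itself, which is the usual meaning of tail-call optimization in this style of compiler.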

And that's where it stands today. I am currently rewriting my Javascript -> Forth parser in Javascript, and playing around with a colorForth -> Javascript compiler in Javascript as well. colorForth in your browser? Sure, why not? Your browser in colorForth? Sure, why not? Chuck was right when he said a browser was simple, it just needs to be done :) By implementing Javascript in Forth, and Forth in Javascript, either can run anywhere.
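
The reordering step of a Javascript -> Forth parser can be illustrated with Dijkstra's shunting-yard algorithm, shown here in Python for binary + - * / and parentheses only. This is a swapped-in textbook technique, not the author's parser; the function name to_postfix is hypothetical, and unary operators, function calls, and error handling for unbalanced parentheses are omitted.

```python
import re

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(expr):
    """Reorder an infix expression into postfix (Forth-style) tokens."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", expr)
    output, ops = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Left-associative: pop operators of higher or equal precedence first.
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                output.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()  # discard the "("
        else:
            output.append(tok)  # operand: numbers and identifiers pass straight through
    while ops:
        output.append(ops.pop())
    return " ".join(output)
```

For instance, to_postfix("(a + b) * c") yields "a b + c *", which reads directly as Forth.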

Going forward, I fully expect to have a Forth/Javascript engine powering version 4 of this web application server project. I am currently planning to rewrite the 7k lines of C in a combination of Forth and Javascript (which compiles to Forth, so ultimately it is all Forth, but my web guys don't need to know that), and I expect to have the entire code base reduced to under 4k lines of code (down from 94k!).

In the future, since I've got the code to boot the VM from disk, I'm also planning on writing an Ethernet device driver for the couple of cards I use, and a TCP/IP stack, so I can run this server on bare hardware. With the availability of virtualization software like VMware, VirtualBox, QEMU, etc., I can easily see a migration path from FreeBSD/Linux/Mac OS X to plain bare hardware.

Looking back at this development process, I could have just used colorForth 2.0a to implement the VM, but without the equivalent of Jay Melvin's shadow block listings of cmForth for the first 18 blocks of CF, I wasn't comfortable implementing it on top of so much black magic. Building a native Javascript compiler is hard enough without also having to guess at how the compiler's internals work :)

Anyway, I hope to do a postmortem for this project in a few months. The source code and executables will most likely be available by the end of the year under some form of open source license. I hope this inspires some of you to dust off your CF images and build some apps.

Dave Goehrig

--

David J. Goehrig


Email: dave@xxxxxxxxxxxxxx


