
Re: [colorforth] Extending the character set - Take II



---- Original message ----
>Date: Tue, 30 Mar 2004 04:59:42 -0800 (PST)
>From: Bill Parker <hodjiii@xxxxxxxxx>  
>Subject: [colorforth] Extending the character set - Take II  
>To: colorforth@xxxxxxxxxxxxxxxxxx
>
>I'm still on my personal crusade to kick the
>ColorForth 'space' character out of its premium 4-bit
>spot.  I have looked at quite a bit of ColorForth code
>samples (from the net) and have yet to find an
>instance where space-terminated compression results in
>a smaller compressed data stream than simply appending
>a 'last character' bit.
>
>Here is the latest idea that I am playing around with
>for an alternate use of the 0 000 encoding slot:
>
>Use the 0 000 as a 'shift' character which changes the
>interpretation of the character which follows.  a-z
>become A-Z and the other encodings fall out as
>follows:
>
>1101 000 K  [was k]   1110 000 <  [was 8]
>1101 001 Z  [was z]   1110 001 (  [was 9]
>1101 010 J  [was j]   1110 010 '  [was 1]
>1101 011 #  [was 3]   1110 011 _  [was -]
>1101 100 $  [was 4]   1110 100 )  [was 0]
>1101 101 %  [was 5]   1110 101 >  [was .]
>1101 110 ~  [was 6]   1110 110 "  [was 2]
>1101 111 &  [was 7]   1110 111 |  [was /]
>
>The entire bank of 1111 encodings is open so they are
>available for the user to define their own
>application-specific symbols.
>
>Note that the 'shift' character may not be the last
>character within a 32-bit word (since it is then
>indistinguishable from the normal zeroes which pad out
>the end of word).  In comments, this means that the
>shift character must 'travel' to the next 32-bit word
>when it is discovered that the character it applies to
>will not fit within the current 32-bit word.
>
>The addition of edit-time complexity (and perhaps a
>little at load/display time as well) doubles the
>available character set size.  Entering the 'shift'
>character becomes a mode that modifies subsequent
>menus the user is presented so this fits nicely into
>the existing 24-key user interface.  (In fact, the
>mismatch with the user interface is why I ultimately
>abandoned my earlier attempts at extending the
>character set.)
>
>And perhaps the nicest feature of all: an extended
>character set ColorForth is completely compatible with
>existing ColorForth source blocks.  None of the
>existing encodings are altered so you can freely
>import standard ColorForth source blocks.  You may
>also freely export source blocks which do not contain
>any 'shift' characters to a standard ColorForth system.
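
To make the quoted proposal concrete, here is a minimal Python sketch of
the shift rule. It covers only the sixteen 7-bit codes listed in the table
above plus the 0 000 shift slot; the names SHIFT_TABLE and decode_word are
hypothetical, and the input is assumed to be the character bits of a single
word written as a string of '0' and '1'. It is an illustration of the idea,
not code from any colorForth system.

```python
# Codes and their meanings are copied from the table in the quoted message.
# Each value is (normal character, character after a shift).
SHIFT_TABLE = {
    "1101000": ("k", "K"), "1101001": ("z", "Z"), "1101010": ("j", "J"),
    "1101011": ("3", "#"), "1101100": ("4", "$"), "1101101": ("5", "%"),
    "1101110": ("6", "~"), "1101111": ("7", "&"),
    "1110000": ("8", "<"), "1110001": ("9", "("), "1110010": ("1", "'"),
    "1110011": ("-", "_"), "1110100": ("0", ")"), "1110101": (".", ">"),
    "1110110": ("2", '"'), "1110111": ("/", "|"),
}
SHIFT = "0000"  # the 0 000 slot, reused as a shift prefix


def decode_word(bits):
    """Decode the character bits of one word (hypothetical helper)."""
    out = []
    shifted = False
    pos = 0
    while pos < len(bits):
        # Nothing but zeroes left: this is padding, not a character.
        # A shift with no character after it looks exactly the same,
        # which is why it cannot be the last character in a word.
        if "1" not in bits[pos:]:
            break
        if bits.startswith(SHIFT, pos):
            shifted = True
            pos += len(SHIFT)
            continue
        normal, alternate = SHIFT_TABLE[bits[pos:pos + 7]]
        out.append(alternate if shifted else normal)
        shifted = False
        pos += 7
    return "".join(out)


print(decode_word("1101011" + "0000" + "1101011" + "000"))
```

A shift left at the end of a word is silently read as padding in this
sketch, which is the reason the message says the shift must travel to the
next 32-bit word together with the character it modifies.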

Compatibility should not carry much weight. With an 
appropriate program, converting all existing colorForth 
source to a new encoding would take less than a day.

If we want to extend the use of the colorForth character set, 
we should do that and design one that is as simple to decode 
as the current set.

For example, with the current colorForth character set, you 
would not have been able to compose the above message. Do we 
wish to use colorForth for composing email, or for the web? 
If so, we should re-examine the design.

Perhaps, just as Unicode defines a bijection between characters 
and numbers, we could define a bijection between Unicode numbers 
and colorForth Huffman-coded characters.
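
As a rough illustration of such a bijection, the Python sketch below maps
Unicode code points to prefix-free bit codes and back. Everything in it is
an assumption made for the example: the names make_codes, RANKED, encode
and decode are hypothetical, the ranking of characters is invented, and
the 4-, 5- and 7-bit code lengths simply follow the pattern suggested by
the encodings quoted above (the 5-bit group in particular is assumed).

```python
def make_codes(chars):
    """Assign prefix-free codes by rank: 0xxx, then 10xxx, then 11xxxxx."""
    if len(chars) > 48 or len(set(chars)) != len(chars):
        raise ValueError("need at most 48 distinct characters")
    to_code = {}
    for rank, ch in enumerate(chars):
        if rank < 8:
            code = "0" + format(rank, "03b")        # 4 bits
        elif rank < 16:
            code = "10" + format(rank - 8, "03b")   # 5 bits
        else:
            code = "11" + format(rank - 16, "05b")  # 7 bits
        to_code[ord(ch)] = code                     # key is the code point
    return to_code


# Invented ranking, most frequent first; \u00c5 and \u00c9 are two
# non-ASCII letters included to show that any code point can take part.
RANKED = " etaoinsrhldcu@.\u00c5\u00c9"
TO_CODE = make_codes(RANKED)
FROM_CODE = {code: cp for cp, code in TO_CODE.items()}


def encode(text):
    return "".join(TO_CODE[ord(ch)] for ch in text)


def decode(bits):
    out, code = [], ""
    for bit in bits:
        code += bit
        if code in FROM_CODE:  # prefix-free, so the first match is the code
            out.append(chr(FROM_CODE[code]))
            code = ""
    if code:
        raise ValueError("bits left over after the last complete code")
    return "".join(out)


print(decode(encode("e@\u00c9")) == "e@\u00c9")
```

Because the two tables are inverses of each other, every code point in the
table has exactly one code and every code exactly one code point, so
decoding stays a plain table lookup.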

Mark
