
[colorforth] Unpack/Huffman coding in colorForth


Hoy Folks,
Have the Huffman codes changed at some point in colorforth's history?

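I have had some success in automatically extracting names from a
colorforth image and using those names in the disassembly.
But `` 1, '' is decoded as `` z, '' and `` 2, '' as `` j, '', even though
the code makes clear what they are: they load CX with 1 and 2, just as
`` 3, '' loads 3 and `` , '' loads 4.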

---------------------------
( 0000,0645 )   :X_,    MOVI|X, CX| 4 IL,
( 0000,064A )   :L0000,064A    MOV, X| T| DX'| MEM| 0794 L,
( 0000,0650 )                  MOV, X| F| AX'| ZO| [DX]
( 0000,0652 )                  MOV, X| T| AX'| ZO| [SI]
( 0000,0654 )                  LEA, DX'| ZO|    [DX +1* CX]
( 0000,0657 )                  LEA, SI'| BO| [SI] 4 B,
( 0000,065A )                  MOV, X| F| DX'| MEM| 0794 L,
( 0000,0660 )                  RET,
( 0000,0661 )   :X_z,    MOVI|X, CX| 1 IL,
( 0000,0666 )                  JMPS, L0000,064A RB,
( 0000,0668 )   :X_j,    MOVI|X, CX| 2 IL,
( 0000,066D )                  JMPS, L0000,064A RB,
( 0000,066F )   :X_3,    MOVI|X, CX| 3 IL,
( 0000,0674 )                  JMPS, L0000,064A RB,
( 0000,0676 )   :X_;    MOV, X| T| DX'| MEM| 0794 L,
( 0000,067C )                  SUBSI, R| DX| 5 IS,
( 0000,067F )                  CMP, X| F| DX'| MEM| 07A0 L,
( 0000,0685 )                  J, Z| N| L0000,068F RB,
( 0000,0687 )                  CMPI, B| ZO| [DX] 0E8 IB,
( 0000,068A )                  J, Z| N| L0000,068F RB,
( 0000,068C )                  INC, B| ZO| [DX]
( 0000,068E )                  RET,
---------------------------

Can someone shed some light on this?
It is difficult to get a "second opinion" from colorforth
itself, as neither z nor j occurs in the names of kernel words.

The following code uses strings as compact Huffman tables
(it is ANSI Forth apart from the strings).

[To turn the Stallman convention into a stack diagram,
  replace return with --
  surround with ( )
  keep only upper case words ]
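The recipe above is mechanical enough to sketch in a few lines of Python. This is only an illustration (the function name is made up here), and its tokenizer is crude: it only recognises alphabetic words, so a name such as CN>ASC would be split in two.

```python
import re


def stallman_to_stack_diagram(sentence: str) -> str:
    """Apply the three rules to one comment sentence.

    1. replace the word "return" with "--"
    2. keep only the upper-case words (and the "--")
    3. surround the result with ( )
    """
    text = re.sub(r"\breturn\b", " -- ", sentence)
    tokens = re.findall(r"--|[A-Za-z_]+", text)  # crude: alphabetic words only
    kept = [t for t in tokens if t == "--" or t.isupper()]
    return "( " + " ".join(kept) + " )"


print(stallman_to_stack_diagram(
    "For a COLORNAME, return a COLORNAME without its first char, and that CHAR."))
# ( COLORNAME -- COLORNAME CHAR )
```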
___________________________________
\ Throughout the following, a colorname is a machine word holding a
\ Huffman-encoded name, a la colorforth.

DECIMAL
\ Offsets into these strings are the Huffman codes of the 4-, 5- and
\ 7-bit characters, excluding the length-determining prefix (0, 10 and 11).
" rtoeani" DROP CONSTANT C4
"smcylgfw" DROP CONSTANT C5
"dvpbhxuqkzj34567891-0.2/;:!+@*,?" DROP CONSTANT C7

\ For a COLORNAME, return a COLORNAME without its first char, and that CHAR.
: UNPACK
    DUP 0< 0= IF
        DUP   4 LSHIFT SWAP
        28 RSHIFT  C4 + C@
    ELSE 1 LSHIFT DUP 0< 0= IF
        DUP   4 LSHIFT SWAP
        28 RSHIFT  C5 + C@
    ELSE 1 LSHIFT
        DUP   5 LSHIFT SWAP
        27 RSHIFT  C7 + C@
    THEN THEN ;

\ Decode COLORNAME into ``PAD'', return it as (ascii) STRINGCONSTANT.
\ To avoid clashes with the disassembler's own names, colorforth names
\ are prefixed with "X_".
: CN>ASC  "X_" PAD $!   BEGIN UNPACK PAD $C+ DUP 0= UNTIL DROP   PAD $@ ;

\ Print out a COLORNAME.
: .WORD   CN>ASC TYPE ;
___________________________________
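For anyone who wants to cross-check the tables outside Forth, here is a rough Python port of UNPACK and CN>ASC. It is a sketch, not colorforth's own code: it assumes 32-bit colornames, as the Forth code does, and uses the C4/C5/C7 strings exactly as given above. pack() is a hypothetical helper (not part of colorforth) added only so the decoder can be exercised by round trips; it simply stops at the first character that no longer fits in 32 bits.

```python
# Tables exactly as in the post.
C4 = " rtoeani"                          # codes 0xxx
C5 = "smcylgfw"                          # codes 10xxx
C7 = "dvpbhxuqkzj34567891-0.2/;:!+@*,?"  # codes 11xxxxx
MASK32 = 0xFFFFFFFF                      # Python ints are unbounded; cells are 32 bits


def unpack(word: int) -> tuple[int, str]:
    """( COLORNAME -- COLORNAME CHAR ): split off the first character."""
    word &= MASK32
    if not word & 0x80000000:                    # prefix 0: 4-bit code
        return (word << 4) & MASK32, C4[word >> 28]
    word = (word << 1) & MASK32
    if not word & 0x80000000:                    # prefix 10: 5-bit code
        return (word << 4) & MASK32, C5[word >> 28]
    word = (word << 1) & MASK32                  # prefix 11: 7-bit code
    return (word << 5) & MASK32, C7[word >> 27]


def cn_to_asc(word: int) -> str:
    """Like CN>ASC: "X_" followed by the decoded characters."""
    out = "X_"
    while True:                                  # BEGIN ... UNTIL: runs at least once
        word, ch = unpack(word)
        out += ch
        if word == 0:
            return out


def pack(name: str) -> int:
    """Hypothetical inverse of cn_to_asc (without "X_"), packed MSB first."""
    word, used = 0, 0
    for ch in name:
        if ch in C4:
            code, length = C4.index(ch), 4
        elif ch in C5:
            code, length = 0b10000 | C5.index(ch), 5
        else:                                    # ValueError if ch is in no table
            code, length = 0b1100000 | C7.index(ch), 7
        if used + length > 32:                   # assumption: just truncate
            break
        word |= code << (32 - used - length)
        used += length
    return word


if __name__ == "__main__":
    for name in ["z,", "j,", "3,", ";"]:
        print(format(pack(name), "032b"), cn_to_asc(pack(name)))
```

Note that a round trip such as cn_to_asc(pack("1,")) == "X_1," only shows that the decoder agrees with these tables (for short names without trailing spaces); it cannot show whether the tables match the ones used to build the image.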

--
Albert van der Horst,Oranjestr 8,3511 RA UTRECHT,THE NETHERLANDS
Economic growth -- like all pyramid schemes -- ultimately falters.
albert@xxxxxxxxxxxxxxxxxx http://home.hccnet.nl/a.w.m.van.der.horst
