[colorforth] Disassembling the BIOS: presenting ciasdis
- Subject: [colorforth] Disassembling the BIOS: presenting ciasdis
- From: albert@xxxxxxxxxxxxxxxxxx (Albert van der Horst)
- Date: Thu, 25 Nov 2004 00:40:09 +0100
- Cc: Albert van der Horst <albert@xxxxxxxxxxxxxxxxxx>
Hey folks,
I have completed the first phase of my plan to crack the BIOS
into compliance with colorforth and other booting Forths.
The result is a general-purpose assembler/disassembler system.
I have tried it out on my own Forth, and managed to recover
an editable and understandable source. I have moved a definition
of a word from the beginning to the end, and reassembled.
For the specific data structures of my Forth, one can write
a specific plug-in of a few Forth words to make a "crawler":
something that follows the data structures of the object being
analysed. I supply that as an attachment.
This is the (slightly hyped) announcement I have made to
comp.linux.announce. (The program should also run on the Windows
version of ciforth, but I have not tested that.)
Like many things in Forth, it turned out to be relatively easy
to add label handling and two-pass assembling of files to
an existing Forth assembler. For the occasion I have embellished
the assembler with all the missing Pentium instructions, notably
floating point.
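
The two-pass idea is simple enough to sketch outside Forth. The Python
below is only an illustration, not ciasdis code: the two-instruction
source syntax and the function name `assemble` are made up for the
example (the byte values are modelled on the x86 NOP and short JMP).
Pass one only measures instruction sizes to give every label an
address; pass two emits bytes, by which time forward references can be
resolved.

```python
# Toy two-pass assembler (hypothetical syntax, not ciasdis).
# "nop" is 1 byte (0x90); "jmp LABEL" is 2 bytes (0xEB + 8-bit relative
# offset); "NAME:" defines a label.
SIZES = {"nop": 1, "jmp": 2}

def assemble(lines, origin=0):
    # Pass 1: walk the source, tracking the address, and record each label.
    labels, addr = {}, origin
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = addr
        else:
            addr += SIZES[line.split()[0]]
    # Pass 2: emit code; every label, forward or backward, is known now.
    out, addr = bytearray(), origin
    for line in lines:
        if line.endswith(":"):
            continue
        op, *args = line.split()
        addr += SIZES[op]
        if op == "nop":
            out.append(0x90)
        else:  # jmp: the offset is relative to the end of the instruction
            out += bytes([0xEB, (labels[args[0]] - addr) & 0xFF])
    return bytes(out), labels

code, labels = assemble(["start:", "jmp end", "nop", "end:", "jmp start"])
print(code.hex(), labels)
```

The forward reference `jmp end` is the case a single pass cannot handle:
when it is first seen, `end` has no address yet.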
---------------------------------
There is a war on. It is about whether the knowledge
humanity is accumulating at an unprecedented pace remains in the
hands of a few, or is available to us all.
An important role in this war is played by reverse engineering tools.
My
computer_intelligence_assembler_disassembler_386
is such a tool.
For convenience it is abbreviated ciasdis or cias/cidis 1).
Continuous pressure is applied to outlaw such tools, or give the
impression that they are illegal. They are already outlawed to an extent,
even in a traditionally liberal country like the Netherlands. Download
before it is too late.
http://home.hccnet.nl/a.w.m.van.der.horst/forthassembler.html
This is version 0.1.0: an Alpha release. Draw no conclusions from that
about reliability! Alpha only means that the specification can change
depending on user reports. Large parts of this code base have been
stable for years, in particular the PostIt-FixUp Intel assembler.
(Once in Beta, upward compatibility will be maintained.)
Needless to say, it is open source, and protected by the GNU General
Public License to stay that way. (``Open Source'' is not really open
source.)
This tool is like a sword, seemingly low-tech. It requires skill, but
in close-combat it is as deadly as a machine-gun. All you need is a
single 130 kbyte executable 2). It doesn't require anything particular
to be installed, and probably runs on old kernels (1.2) and the BSDs.
Applications of reverse engineering are (not exhaustive):
1. Analyzing viruses
2. Plugging vulnerabilities in closed source programs
3. Removing bugs from same
4. Finding copyright infringement and competition-exclusion in same
5. Adapting drivers to run on an Operating System of Your Own Choice
6. Recovering the lost source of a program
7. Analyzing a BIOS to allow Full Use of Your Hardware
8. (Requires above-average skill) Incorporating a DSP assembler, then
analyzing codecs.
9. Removing copy-protection or dongle-inspection and changing expiration
dates.
Of those, only 9 is possibly illegal at present. If you want to provoke
a trial process, please publish and distribute a .cul file separately
from ciasdis, and don't implicate me. Because of the other facilities,
possession of this tool itself is legal (as yet, to the best of my
knowledge, in most countries).
Distinguishing features of ciasdis are:
1. Analysis is primarily interactive and cumulative, building a database.
2. Scripting is essential. Large programs are too
time-consuming to analyze fully by hand. ciasdis allows automating
the extraction of names from undisclosed formats. (Traditional tools
like gdb and GNU objdump extract information from well organized, fully
documented formats.)
3. It handles binaries where different types of information (code, data, tables)
are interspersed.
4. A disassembly can be reassembled to the byte-for-byte same code.
Note: my assembler format has been called "it's hell". However,
there is no way point 4 can be attained using the official Intel
assembler language.
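
One reason conventional mnemonic syntax gets in the way of point 4 is
that x86 often has several encodings for one instruction. A well-known
case is a register-to-register `mov`, which can be encoded with opcode
89 or with opcode 8B. The Python sketch below is a toy with a two-entry
table and made-up function names, not ciasdis code. It shows that once
both encodings disassemble to the same text, an assembler must pick
one, so the round trip cannot be exact for both.

```python
# Two valid IA-32 encodings of the same instruction (32-bit mode):
#   89 D8 = MOV r/m32, r32 with ModRM D8 (rm=EAX, reg=EBX)
#   8B C3 = MOV r32, r/m32 with ModRM C3 (reg=EAX, rm=EBX)
ENCODINGS = {bytes.fromhex("89d8"): "mov eax, ebx",
             bytes.fromhex("8bc3"): "mov eax, ebx"}

def disassemble(code):
    return ENCODINGS[code]

def assemble(text):
    # A conventional assembler sees only the text, so it must pick one
    # encoding. Here: the first table entry whose text matches.
    return next(code for code, t in ENCODINGS.items() if t == text)

def round_trips(code):
    # True when disassembling and reassembling reproduces the same bytes.
    return assemble(disassemble(code)) == code

for code in ENCODINGS:
    print(code.hex(), round_trips(code))
```

A notation in which every encoding variant has its own spelling avoids
this ambiguity, at the price of being less pleasant to read; that
appears to be the trade-off the note describes.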
The archive contains:
1. the source for cias/cidis
2. assemblers for Pentium, 80386, 8086, DEC Alpha, 6809, 8080 compatible with
cias/cidis
3. an executable for GNU-Linux to analyze Intel x86 16/32-bit code
4. man pages for this executable (item 3) and for the script language,
i.e. the format of the scripts
5. consult scripts for EXE and ELF, the headers of programs in Windows
and GNU-Linux respectively.
6. an example of simple use
7. a large example generated with a dedicated script showing interspersed
code, data and text areas
8. documentation for the principle of operation and the Intel assembler
code.
Ad 1 and 2: you can use the sources supplied to build e.g. an executable to
run on Windows to analyze DEC Alpha programs.
The bulk of the information in the large example was generated by a
plug-in script, extracting name information from the binary. This
script is itself a result of the reverse engineering effort, tailored
to the binary. It serves to document its format too.
Below you see a fragment of an analysis of lina (the underlying Forth
compiler of cias/cidis), automatically generated, showing labels,
pieces of text, a piece of threaded code and a piece of Intel
assembler. (Forth compilers are notoriously difficult to analyze:
traditional code crawling breaks down for threaded code.)
....
( 0804,AF18 ) :N_ALIGN d$ 5 0 0 0 "ALIGN" 90 90 90
( 0804,AF24 ) :X_ALIGN dl docol H_ALIGN H_U0 X_CHARS
( 0804,AF34 ) dl N_ALIGN 0000,0000
( 0804,AF3C ) :H_ALIGN dl X_DP X_@
( 0804,AF44 ) dl X_ALIGNED X_DP X_! semis
( 0804,AF54 ) :N_ALIGNED d$ 7 0 0 0 "ALIGNED" 90
( 0804,AF60 ) :X_ALIGNED dl H_ALIGNED H_ALIGNED 0000,0000 X_ALIGN
( 0804,AF70 ) dl N_ALIGNED 0000,0000
( 0804,AF78 ) :H_ALIGNED POP|X, AX|
( 0804,AF79 ) DEC|X, AX|
( 0804,AF7A ) ORI|A, B'| 0000,0003 IB,
( 0804,AF7C ) INC|X, AX|
( 0804,AF7D ) PUSH|X, AX|
( 0804,AF7E ) LODS, X'|
( 0804,AF7F ) JMPO, ZO| [AX]
( 0804,AF81 )
....
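
The `d$` lines in the listing suggest how a name field is laid out: a
32-bit count, the characters, then padding (0x90 here) up to the next
multiple of four. The Python below is a minimal sketch under that
assumption, not part of ciasdis; `h_align` mirrors the `1- 3 OR 1+`
arithmetic of `hALIGN` in the attached script, and `parse_name` is a
made-up name. For brevity the two name fields are placed back to back,
which is not how they sit in the real binary.

```python
import struct

def h_align(n):
    # Round n up to a multiple of 4: the same arithmetic as "1- 3 OR 1+".
    return ((n - 1) | 3) + 1

def parse_name(image, offset):
    """Return (name, offset just past the padded name field)."""
    (count,) = struct.unpack_from("<I", image, offset)  # little-endian count
    name = image[offset + 4 : offset + 4 + count].decode("ascii")
    return name, offset + 4 + h_align(count)

# Bytes of the two name fields shown in the listing (padding byte 0x90).
image = (bytes([5, 0, 0, 0]) + b"ALIGN" + b"\x90" * 3 +
         bytes([7, 0, 0, 0]) + b"ALIGNED" + b"\x90")

name1, nxt = parse_name(image, 0)
name2, end = parse_name(image, nxt)
print(name1, nxt, name2, end)
```

Both fields come out 12 bytes long, which agrees with the address
steps in the listing (AF18 to AF24 and AF54 to AF60).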
If you are not impressed, this tool is not for you.
1)
DISCLAIMER: for convenience you may use names like cias and cidis to
link to computer_intelligence_assembler_disassembler_386 . Do this at
your own risk. cias and cidis are trademarks owned by their respective
owners, or will be so in the near future (like all 3-, 4- and 5-letter
words).
2) Plus Petabytes of information. I suggest the Internet.
--
Albert van der Horst,Oranjestr 8,3511 RA UTRECHT,THE NETHERLANDS
One man-hour to invent,
One man-week to implement,
One lawyer-year to patent.
albert@xxxxxxxxxxxxxxxxxx http://home.hccnet.nl/a.w.m.van.der.horst
( $Id: linacrawl.cul,v 1.12 2004/09/15 12:03:49 albert Exp $ )
( Copyright{2000}: Albert van der Horst, HCC FIG Holland by GNU Public License)
( Uses Richard Stallman's convention. Uppercased words are parameters. )
HEX INIT-ALL
INCLUDE elf.cul
0804,A0DC EQU semis
0804,B7AC EQU semiscode
0804,A664 EQU docol
0804,BE3C LABEL donumber
0804,A6EC LABEL doconstant
0804,A72C LABEL dovar
0804,A75C LABEL douse
0804,B814 LABEL dodoes
0804,C828 LABEL DOVOC \ High level code.
0804,9968 LABEL init_user
0804,F82C EQU last_dea \ TASK
0804,95C4 EQU last_dea_envir
\ Align an address in the host space.
: hALIGN 1- 3 OR 1+ ;
CREATE NAME-BUFFER 0 , 256 ALLOT
\ Prepend to NAME a PREFIX, return prefixed NAME in a static buffer.
: PRE-PEND NAME-BUFFER $! NAME-BUFFER $+! NAME-BUFFER $@ ;
\ Transform NAME into: NAMELABEL (prepended "N_").
: >N_ "N_" PRE-PEND ;
\ ADDRESS points to a valid name. Add a label to address the name.
: ADD-NAME-LABEL DUP th $@ >N_ INSERT-EQU ;
\ Generate an equ label at (target) ADDRESS with NAME,
\ unless address is already labeled.
: ?LABELED? 2>R DUP >LABEL 0= IF 2R> INSERT-EQU ELSE 2R> 2DROP DROP THEN ;
\ ADDRESS points to a valid dea. Add a label to address the dea(xt)
\ and the datafield.
\ Assumes the dea's name has been analysed just before, so that
\ NAME-BUFFER still holds the prefixed name.
: ADD-DEA-LABEL &X NAME-BUFFER CELL+ C! DUP NAME-BUFFER $@ ?LABELED?
&H NAME-BUFFER CELL+ C! 4 + L@ NAME-BUFFER $@ ?LABELED? ;
\ ADDRESS points to a valid name.
\ Add an anonymous section to disassemble the name.
: ADD-NAME-SECTION DUP L@ hALIGN OVER + 4 + -d$- ;
\ ADDRESS points to a dea.
\ Add an anonymous section to disassemble the dea.
: ADD-DEA-SECTION DUP 18 + -dl- ;
\ Add the information that ADDRESS is a nfa.
: IS-A-NAME L@ DUP IF DUP ADD-NAME-LABEL ADD-NAME-SECTION _ THEN DROP ;
\ The CONTENT of address indicates : "This IS not yet the end of
\ high level code".
: STILL-CODE? >R R@ semis <> R@ semiscode <> AND RDROP ;
\ Add the information that ADDRESS points to high level code.
: IS-HIGH-LEVEL DUP BEGIN DUP L@ STILL-CODE? WHILE 4 + REPEAT 4 + -dl- ;
\ For CFA : "it POINTS to the high level interpreter"
: IS-DOCOL? L@ docol = ;
\ Accumulate the information that ADDRESS contains a code field address.
: IS-A-CFA DUP L@ CRAWL DUP IS-DOCOL? IF 4 + L@ IS-HIGH-LEVEL _ THEN DROP ;
\ Accumulate the information that DEA is a dea.
: IS-A-DEA DUP 8 + L@ 1 AND IF DROP EXIT THEN \ Dummy field
DUP 10 + IS-A-NAME DUP ADD-DEA-LABEL
DUP IS-A-CFA DUP ADD-DEA-SECTION
DROP ;
\ Accumulate the information from DEA as a wid, follow the link field.
: CRAWL-WID BEGIN DUP IS-A-DEA 0C + L@ DUP 0= UNTIL DROP ;
SORT-ALL
last_dea_envir CRAWL-WID
last_dea CRAWL-WID
SORT-ALL
\ User area contains longs
init_user DUP 100 + -dl-
\ Yet another buffer
0804,DA0C DUP 200 + -dl-
\ The data fields of vocabularies consist of pointers.
H_DENOTATION DUP 10 + -dl-
H_ENVIRONMENT DUP 10 + -dl-
H_FORTH DUP 10 + -dl-
MAKE-CUL
EXIT
CLEANUP-SECTIONS
MAKE-CUL
PLUG-HOLES
MAKE-CUL
DISASSEMBLE-ALL
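
For readers who do not read Forth, the core of the crawler can be
sketched in Python. This is an illustration under stated assumptions,
not a port: the field offsets (flags at +8, link at +0C, name pointer
at +10, data field at +4) are read off the script above, while the
image, the addresses BASE, SEMIS and DOCOL, and all function names are
made up. It also simplifies: only semis ends high-level code, and a
zero name pointer is not checked.

```python
import struct

BASE = 0x1000                  # hypothetical load address of the toy image
SEMIS, DOCOL = 0xAAA0, 0xAAB0  # made-up stand-ins for semis and docol

def cell(image, addr):
    # Fetch a 32-bit little-endian cell at a target address (like L@).
    return struct.unpack_from("<I", image, addr - BASE)[0]

def name_at(image, addr):
    count = cell(image, addr)
    start = addr - BASE + 4
    return image[start:start + count].decode("ascii")

def high_level_end(image, addr):
    # Advance cell by cell until the terminating semis, then step past it.
    while cell(image, addr) != SEMIS:
        addr += 4
    return addr + 4

def crawl(image, dea):
    """Follow link fields (offset 0x0C) from the last entry to a zero link."""
    found = []
    while dea:
        if not cell(image, dea + 0x08) & 1:       # flag bit 0 set: dummy entry
            entry = {"name": name_at(image, cell(image, dea + 0x10)), "dea": dea}
            if cell(image, dea) == DOCOL:          # colon definition
                body = cell(image, dea + 0x04)
                entry["body"] = (body, high_level_end(image, body))
            found.append(entry)
        dea = cell(image, dea + 0x0C)
    return found

def cells(*values):
    return struct.pack("<%dI" % len(values), *values)

def name_field(text):
    raw = text.encode("ascii")
    return cells(len(raw)) + raw + b"\x90" * (-len(raw) % 4)

# A two-word toy dictionary: DUP (code word) and TWICE (colon definition).
image = (name_field("DUP")                               # 0x1000
         + cells(0xBBB0, 0xBBB0, 0, 0, 0x1000, 0)        # 0x1008: dea of DUP
         + name_field("TWICE")                           # 0x1020
         + cells(0x1008, 0x1008, SEMIS)                  # 0x102C: body of TWICE
         + cells(DOCOL, 0x102C, 0, 0x1008, 0x1020, 0))   # 0x1038: dea of TWICE

result = crawl(image, 0x1038)
for entry in result:
    print(entry)
```

Starting from the last entry, the walk visits TWICE and then DUP, which
is the same newest-to-oldest order in which CRAWL-WID follows the link
fields.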