1 of 44

Venix/86 Emulation

A step in the Venix Source Restoration Project

2 of 44

What was Venix?

  • Early 7th Edition Unix Port
  • VenturCom ported V7 Unix to a number of platforms
    • PDP-11 (devices and CPUs not in 7th edition)
    • Various 8088/8086: most of the early near-clones supported
  • First commercially released 8088/8086 port
    • Bell Labs had one earlier, but it used a custom MMU
  • Shipping when PC/IX was announced (IBM’s first supported Unix on the IBM PC/XT/AT)
  • Venix/86 begat Venix/286 begat Venix/386
    • The last of these being a System V port with full MMU support

3 of 44

Wait, Unix on MMU-less 8088/8086?

  • Yes. Used segment registers creatively
  • Limited to “small” or “tiny” memory models
    • CS == DS == ES == SS (tiny)
    • CS, DS == ES == SS (small) (mirroring split I/D from PDP-11)
  • Very tight integration with compiler / libc / kernel
    • setjmp/longjmp
    • Calling conventions
  • Program in charge of which segments accessed
    • Useful for ClikClok support in userland, for example, to access memory or I/O ports
  • However, no protection against naughty binaries
    • You could scribble over someone else’s processes
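
The segment arithmetic behind the two memory models can be sketched in a few lines of C. This is only an illustration of real-mode 8086 addressing (segment times 16 plus offset, wrapped to 20 bits); the `proc_segs` struct and the `tiny_model` / `small_model` helpers are hypothetical names, and placing the data segment on the first paragraph after the text is an assumption, not a statement of how the Venix kernel laid processes out.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode 8086 address: segment * 16 + offset, wrapped to 20 bits. */
uint32_t phys_addr(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

/* Hypothetical per-process segment assignment (illustrative names). */
struct proc_segs {
    uint16_t cs;   /* code segment */
    uint16_t ds;   /* data segment; ES and SS carry the same value */
};

/* Tiny model: one 64 KiB segment holds text, data and stack. */
struct proc_segs tiny_model(uint16_t base)
{
    struct proc_segs s = { base, base };
    return s;
}

/* Small model: text in one segment, data and stack in another.
   ASSUMPTION: data starts on the first 16-byte paragraph after text. */
struct proc_segs small_model(uint16_t base, uint16_t text_bytes)
{
    struct proc_segs s;
    s.cs = base;
    s.ds = (uint16_t)(base + (((uint32_t)text_bytes + 15u) >> 4));
    return s;
}

/* Different segment:offset pairs can name the same byte, which is why
   nothing stops one process from reaching another's memory. */
void check_segment_math(void)
{
    assert(phys_addr(0x1000, 0x0100) == phys_addr(0x1010, 0x0000));
}
```

Because any program may load any value into a segment register, the aliasing shown in `check_segment_math` is exactly the lack of protection described above.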

4 of 44

So insecure, unprotected and crazy

  • Wouldn’t put this on the internet today…
  • Wouldn’t let users compile code…
  • Wouldn’t even allow users
  • But it’s a decent Unix experience for this old hardware…

5 of 44

But… the best thing going at the time

  • Lots of businesses ran it…
  • … many usenet nodes
  • … industrial control

10 of 44

Why?

  • Wanted to run Venix/86R on my Rainbow back in the day
  • Poor college student couldn’t afford $500 for a copy
  • Kept looking for it over the years
  • Copy surfaced in 2016… Bill Degnan found a copy
  • He sent it to me in 2017 and I imaged it
  • Booted on my Rainbow
  • But why an emulator?

13 of 44

Venix Source Restoration Project

  • I wanted to rebuild from source
  • I wanted to tweak the kernel
  • I wanted to port other versions of Unix to the Rainbow
  • I’m crazy, and this seems like fun…
  • … running on real hardware is slow
    • 4.7MHz CPU
    • Slow disks
    • Slow memory
    • Takes about 8 hours to build V7
  • A modern PC can do it in a couple of minutes

14 of 44

Venix/86 Under the Hood

  • Boot blocks loaded a program that would read /venix and jump to that
  • /venix would initialize the machine (interrupt vectors, timers, etc.) and then start /sbin/init
  • System calls would be handled with INT 0xF1
  • Fixed number of processes
  • Uses Unix 7th Edition versions of most programs

15 of 44

a.out, old school Unix

  • All really old versions of Unix, including 7th Edition, used a.out files
  • Relatively simple layout
    • Text
    • Data
    • Zeroed Data (aka bss)
  • Produced by a simple pipeline of tools
  • Mirrored the final, in-memory layout
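
A minimal sketch of reading such a header, to make the text / data / bss split concrete. It assumes the 16-byte 7th Edition PDP-11 header (eight little-endian 16-bit words) and the V7 magic numbers 0407 and 0410; the actual Venix/86 header may differ in size or fields, and `aout_hdr`, `aout_parse`, `aout_mem_size` and `aout_data_offset` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OMAGIC 0407   /* V7 "normal" executable: text and data together */
#define NMAGIC 0410   /* V7 read-only (separately placed) text */
#define AOUT_HDR_SIZE 16

/* ASSUMPTION: field order follows the V7 PDP-11 struct exec. */
struct aout_hdr {
    uint16_t magic, text, data, bss, syms, entry, unused, flag;
};

/* Read one little-endian 16-bit word. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is too short or the magic is unknown. */
int aout_parse(const uint8_t *buf, size_t len, struct aout_hdr *h)
{
    assert(buf != NULL && h != NULL);
    if (len < AOUT_HDR_SIZE)
        return -1;
    h->magic  = le16(buf + 0);
    h->text   = le16(buf + 2);
    h->data   = le16(buf + 4);
    h->bss    = le16(buf + 6);
    h->syms   = le16(buf + 8);
    h->entry  = le16(buf + 10);
    h->unused = le16(buf + 12);
    h->flag   = le16(buf + 14);
    if (h->magic != OMAGIC && h->magic != NMAGIC)
        return -1;
    return 0;
}

/* Memory needed once loaded: bss occupies memory but no file space. */
uint32_t aout_mem_size(const struct aout_hdr *h)
{
    return (uint32_t)h->text + h->data + h->bss;
}

/* File offset of the initialized data: header, then text, then data. */
uint32_t aout_data_offset(const struct aout_hdr *h)
{
    return AOUT_HDR_SIZE + (uint32_t)h->text;
}
```

A loader would copy `text` bytes from offset 16, copy `data` bytes from `aout_data_offset`, and zero the following `bss` bytes.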

16 of 44

Creating an a.out file

  • C source compiled to assembler with cc(1)
  • Assembler converted to .o object files with as(1)
  • Archive .o files into a .a file with ar(1)
  • Linker combines .o and .a files into an a.out file with ld(1)
  • libc.a is the standard C library in Unix

17 of 44

Idealized a.out memory layout

18 of 44

First Steps

  • Learn the layout of the binaries (so you can write exec)
  • Learn how system calls are made (so you can run user programs)
  • Learn any other relevant environmental details (e.g. 8087 for floating point)
  • Extract images from installation media…

  • Oh, and once you have the bytes in memory, how do you interpret them?

19 of 44

Installation Media

  • Worst case: run kermit to transfer it all over a serial line
    • Sadly, there are bugs in the serial port driver, which takes too long to service interrupts…
  • Look at distribution media
    • 1 Boot disk, 10 disks with files on them
    • Once you get the interleave right (that was a fun time), you discover it’s just a tarball
  • Extract files from images you grabbed with your floppy imaging tools
    • I used KryoFlux, but you’ll notice I didn’t say ‘favorite’ imaging tool
      • Hey, it works and I don’t need support
      • I’d likely go with the Greaseweazle floppy reader today, despite the off-color connotations
  • Figure out disk partitioning, newfs, etc
    • Thankfully, not needed for this phase of the project
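
Getting the interleave right amounts to applying a permutation to the sectors of each track before the tar data becomes readable. The sketch below shows the general technique only: the 512-byte sector size, the `map` convention and the N:1 generator are assumptions for illustration, not the mapping the Venix distribution floppies actually use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SECTOR_SIZE = 512, MAX_SECT = 64 };

/* Build a simple N:1 interleave map (hypothetical; the real mapping has
   to be discovered from the media). map[i] is the physical slot that
   holds logical sector i. */
void make_interleave_map(uint8_t *map, size_t nsect, size_t step)
{
    uint8_t used[MAX_SECT] = { 0 };
    size_t i, pos = 0;

    assert(nsect > 0 && nsect <= MAX_SECT);
    for (i = 0; i < nsect; i++) {
        while (used[pos])               /* skip slots already assigned */
            pos = (pos + 1) % nsect;
        used[pos] = 1;
        map[i] = (uint8_t)pos;
        pos = (pos + step) % nsect;
    }
}

/* Copy one track from physical order into logical order. */
void deinterleave_track(const uint8_t *phys, uint8_t *logical,
                        const uint8_t *map, size_t nsect)
{
    size_t i;

    for (i = 0; i < nsect; i++) {
        assert(map[i] < nsect);
        memcpy(logical + i * SECTOR_SIZE,
               phys + (size_t)map[i] * SECTOR_SIZE, SECTOR_SIZE);
    }
}
```

Running every track of an image through `deinterleave_track` and concatenating the results would yield a byte stream that an ordinary tar can be tried against; a wrong map shows up immediately as a corrupt archive.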

20 of 44

Figure out memory layout

  • a.out files have headers with flags, sizes, etc.
  • Learn how to read them into memory
  • Learn how to set up the stack (this turns out to be tricky)
  • Learn about the interface between the kernel and CSU, the C startup code (where do all the args come from, env, etc.)
  • Where to start the program?
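
To give a feel for the stack setup problem, here is a rough sketch of building an initial argument stack inside a 64 KiB data segment. The layout is an assumption modeled on 7th Edition conventions (argc at the stack pointer, then the argv pointers, a zero word, an empty environment, and the strings at the very top); what the Venix/86 kernel and C startup code really expect has to be confirmed against the binaries. `build_stack`, `put16` and `get16` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store and load little-endian 16-bit words in guest memory. */
static void put16(uint8_t *seg, uint32_t off, uint16_t v)
{
    seg[off] = (uint8_t)(v & 0xFF);
    seg[off + 1] = (uint8_t)(v >> 8);
}

uint16_t get16(const uint8_t *seg, uint32_t off)
{
    return (uint16_t)(seg[off] | (seg[off + 1] << 8));
}

/* Lay out argc/argv below 'top' (at most 0x10000) in the data segment.
   Returns the initial stack pointer, or -1 if the arguments do not fit. */
long build_stack(uint8_t *seg, uint32_t top, int argc, const char *const *argv)
{
    uint32_t strbytes = 0, vecbytes, str, sp, vec;
    int i;

    assert(seg != NULL && top <= 0x10000u && argc >= 0);
    for (i = 0; i < argc; i++)
        strbytes += (uint32_t)strlen(argv[i]) + 1;
    /* Words needed: argc, argv[0..argc-1], NULL, and an empty env NULL. */
    vecbytes = 2u * ((uint32_t)argc + 3u);
    if (strbytes + vecbytes + 1u > top)
        return -1;

    str = (top - strbytes) & ~1u;       /* strings at the top, word aligned */
    sp = str - vecbytes;
    put16(seg, sp, (uint16_t)argc);
    vec = sp + 2;
    for (i = 0; i < argc; i++) {
        size_t n = strlen(argv[i]) + 1;
        memcpy(seg + str, argv[i], n);
        put16(seg, vec, (uint16_t)str); /* guest pointer = offset in segment */
        vec += 2;
        str += (uint32_t)n;
    }
    put16(seg, vec, 0);                 /* argv terminator */
    put16(seg, vec + 2, 0);             /* empty environment */
    return (long)sp;
}
```

The emulator would then load the returned value into SP before jumping to the program's entry point; the `-z` stack option mentioned later would change the `top` passed in.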

21 of 44

a.out format

22 of 44

Layout discovery program

    #include <stdio.h>

    int data = 12;      /* initialized: lands in the data segment */
    int bss;            /* uninitialized: lands in bss */

    int main(argc, argv)
    int argc;
    char *argv[];
    {
        int i;          /* automatic: lives on the stack */

        printf("stack 0x%x\n", (unsigned)&i);
        printf("data 0x%x\n", (unsigned)&data);
        printf("bss 0x%x\n", (unsigned)&bss);
        printf("main 0x%x\n", (unsigned)&main);
    }

23 of 44

Turns out… there are 4 types

  • OMAGIC with stack (-z arg to cc)
  • OMAGIC without -z
  • NMAGIC with -z
  • NMAGIC without -z

Note: cc -z XXX sets the size of the stack to use in the program and moves where the stack starts.

24 of 44

Tiny binary a.out layout

25 of 44

Small a.out binary layout

26 of 44

Emulators

  • Pcemu
    • Too large and hard to integrate with
  • A C++ one I found associated with gcc-ia16
    • No FP/8087 and weird instruction bugs
  • QEMU
    • Hard to copy linux-user setup
  • X11 / Video BIOS emulation
    • No FP and some weirdness…
  • FreeBSD/i386 vm86
    • This one worked
    • But only in a bhyve guest…
    • More details later

27 of 44

Reverse Engineering System Calls

  • Used ar(1) to extract read.o, write.o, etc from /lib/libc.a
  • Disassembled the .o (after hacking together a.out support)
  • Created a chart
  • Got it mostly working
  • Then found docs online (more on this later)

28 of 44

Venix/86 System call : example of read(2)

            .comm   _errno,2
            .globl  _errno
            .globl  _read
    _read:
            push    bp
            mov     bp,sp
            mov     bx,#3
            mov     ax,*4(bp)
            mov     dx,*6(bp)
            mov     cx,*8(bp)
            int     0xf1
            jcxz    L001
            mov     _errno,cx
    L001:
            pop     bp
            ret

  • System call number in bx
  • Arg1 in ax
  • Arg2 in dx
  • Arg3 in cx
  • Jump to kernel: INT F1
  • CX != 0 -> error, returned in cx
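
On the emulator side, this convention turns into a small dispatcher that runs when the guest executes INT F1. The following is a sketch under stated assumptions, not the vm86venix code: the `regs` struct and `venix_syscall` are illustrative names, call number 3 for read comes from the listing, 4 for write is the 7th Edition number and is assumed to carry over, returning the result in ax with ax set to 0xFFFF on error is inferred from the stub, and host errno values are passed through unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Only the registers the system call convention uses. */
struct regs {
    uint16_t ax, bx, cx, dx;
};

enum { VENIX_READ = 3, VENIX_WRITE = 4 };   /* 4 is an assumption */

/* Handle one INT F1 trap. 'dseg' points at the host copy of the
   guest's 64 KiB data segment, so guest pointers are plain offsets. */
void venix_syscall(struct regs *r, uint8_t *dseg)
{
    ssize_t rv;
    uint32_t off, len;

    assert(r != NULL && dseg != NULL);
    off = r->dx;                        /* arg2: buffer offset */
    len = r->cx;                        /* arg3: byte count */
    if (off + len > 0x10000u)           /* never run past the segment */
        len = 0x10000u - off;

    switch (r->bx) {                    /* system call number */
    case VENIX_READ:
        rv = read((int)r->ax, dseg + off, len);
        break;
    case VENIX_WRITE:
        rv = write((int)r->ax, dseg + off, len);
        break;
    default:
        errno = EINVAL;                 /* unimplemented call */
        rv = -1;
        break;
    }

    if (rv < 0) {
        r->cx = (uint16_t)errno;        /* non-zero cx signals an error */
        r->ax = 0xFFFF;                 /* ASSUMPTION: -1 in ax on error */
    } else {
        r->cx = 0;
        r->ax = (uint16_t)rv;
    }
}
```

Passing the guest file descriptor straight to the host works here because the emulated process shares the host process's descriptor table.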

30 of 44

Fork / Exec

  • Fork is emulated with fork.
    • V7 semantics are old-school: copy the whole address space
    • No need to make it more complicated
    • Open files fork correctly
    • Lots of tricky semantics just work when using the host’s fork
    • Small downside: PID consistency between processes
  • Exec is not emulated with exec
    • Too much context is lost
    • Turns out it’s identical to the ‘load’ discussion above
    • A little refactoring and the same code does both

31 of 44

The Manual

  • This just in….

32 of 44

33 of 44

The Manual

  • Not even PDP-11, but the PRO version of Venix
  • The DEC Professional was DEC’s “Micro” version of the PDP-11 that flopped
  • DEC released the PRO versions of Venix on a DECUS set of floppy disks
  • Bitsavers had scanned copies of the manual
  • So this is an old, orphaned unix for an old, orphaned processor version
  • I knew the manual existed, but didn’t know it had x86 info

34 of 44

intro(2) man page

35 of 44

read(2) man page

37 of 44

vm86venix read call

38 of 44

Helper routines

42 of 44

Status

  • Compiler works (hello world, much of the system source)
  • Runs great on my bhyve guest on my FreeBSD/amd64 box
  • Many edge cases still have bugs, especially when memory is tight and the program uses every last byte it can
  • Maybe ¾ of the V7 userland code compiles, though much of it without optimization
  • Kernel code needs a lot of work
    • System header files are needed (but driver kits for the IBM PC survive with them)
    • Need an understanding of the hacks done to the system call glue to make it match disassembled sources.
    • Missing BSW driver .o’s

43 of 44

Next Steps

  • Add CI to my repo so I can make changes and have it run regression tests
  • Fix bugs in tight memory programs
  • Start work on the Venix Source Restoration project with more gusto and have that drive bug fixing of vm86venix.
  • Consider creating a qemu old-unix-user program (it could run anywhere because the system calls are so simple)
  • Give this talk and see if anybody cares

44 of 44

Questions

Warner Losh

imp@bsdimp.com

https://github.com/bsdimp/venix/tools/vm86venix (sources)

FOSDEM February 2022