Things to beware of in the optimizer:
	.globl's inside functions will break the optimizer, but they
	are only needed if the label is used outside of the file.
	
	Anything that uses a case instruction needs to not be of the
	sequence casel/label/.word's because then the operands of the
	.words need to be Lnnnn-label, and the ccom uses Lnnnn labels.
	You can prevent this by inserting a .data-.text somewhere in
	there.  Unfortunately, this causes the optimizer to call abort(),
	thus preventing you from compiling your code.
	
	(Even funnier was that this tickled a kernel bug: the
	debugger would then get loaded and try to read the arguments
	that were passed to abort.  c2vax did an abort(-1) when the
	debugger expected a string, and the kernel would try to access
	address -1.)

To get [device]console to use the Qvss if we have one, we need to
	figure out if we have one relatively early.  We then need to
	change ConsoleCreate to use QvssKeyboardRead/QvssKeyboardRelease or
	ConsoleRead/KeyboardRelease accordingly.  We also need to make
	it use ConsoleWrite/ScreenRelease if we don't have a Qvss, or
	use our own QvssScreenWrite/QvssScreenRelease (which would use
	K_putchar to output the characters).
Actually, since the Read routines are identical except for which buffer
	they get their data out of, we use the ConsoleRead routine with
	a simple if for which buffer.  For writing, we use ConsoleWrite
	if we have no Qvss, otherwise we use one that is not interrupt
	driven and uses K_putchar.

Current changes in progress include:
    1) Getting rid of the svpctx/push/push/rei monstrosity.  
    2) Getting the interrupt stack and the kernel stack to use the same
	data area.
    3) Doing a svpctx/ldpctx only when we are actually doing a context
	switch.  At other times we just use an rei to leave the kernel.  In
	all cases we save all the registers separately.
    4) Replacing svpctx/ldpctx with our own code.  Svpctx is faster with
	our own code.  Ldpctx is not.

=== 3 above seems impossibly hard to implement.

A svpctx does the following that we find important:
	save all SP's
	save r0 to r14
	pop pc and psl from stack
	put us on ISP
A ldpctx does the following that we find important:
	load all the SP's
	load r0 to r14
	load the new p0br, p0lr
	put us on KSP
	push psl and pc on stack

Timing tests show that pushr'ing and then popr'ing 6 registers is
    slower than pushing and popping them individually by about 10%.
    Movq'ing onto and off of the stack is faster than pushl'ing on
    and movl'ing off by a fair amount.  "addl2 $4, r11" is slightly
    faster than "moval 4(r11), r11".

To save things, we have to push r0-r5 (C destroys these).  The PC and
    PSL are already saved on the stack.  The AP and FP are saved by a
    calls instruction, and restored by the ret instruction, so we don't
    need to explicitly save them.  Since some of the times when we save
    things we use register variables, we use a mask of 0x0e3f.
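
The mask value can be checked by building it up bit by bit.  This is only a
sketch: bit n of a pushr/popr mask selects register rn, and we assume here
that bits 9 to 11 of the mask are there for r9-r11, the registers holding
register variables.  The helper names are made up for the example.

```c
#include <stdint.h>

/* Bit n of a pushr/popr register mask selects register rn. */
#define REG(n) ((uint16_t)(1u << (n)))

/* r0-r5: the registers that C is free to destroy. */
uint16_t scratch_mask(void)
{
    uint16_t m = 0;
    for (int n = 0; n <= 5; n++)
        m |= REG(n);
    return m;
}

/* r9-r11: ASSUMED to be the registers used for register variables. */
uint16_t regvar_mask(void)
{
    return REG(9) | REG(10) | REG(11);
}

/* The full save mask is the union of the two sets. */
uint16_t save_mask(void)
{
    return scratch_mask() | regvar_mask();
}
```

Under that assumption save_mask() comes out as 0x003f | 0x0e00, which is the
0x0e3f quoted above.
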
At the end of the Switch code, we need to find out if a context switch
    has happened.  We do that by comparing the current value of the PCBB
    register with the new &Active->proc_state.
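
A rough sketch of that test follows.  On the real machine PCBB is a
privileged processor register read with mfpr and it holds a physical
address.  Here it is modelled as a plain variable and the virtual to
physical translation is ignored, so this only shows the shape of the
comparison.  Everything except Active and proc_state is a made-up name.

```c
#include <stdint.h>

/* Stand-in for the saved hardware context of a process. */
struct proc_state_t { uint32_t longwords[24]; };

struct process { struct proc_state_t proc_state; };

struct process *Active;   /* process that should run next */
uintptr_t pcbb;           /* MODEL of the PCBB processor register */

/* On a VAX this would be an mfpr; here it just reads the model. */
uintptr_t read_pcbb(void) { return pcbb; }

/* Nonzero if PCBB does not point at Active's proc_state. */
int context_switch_needed(void)
{
    return read_pcbb() != (uintptr_t)&Active->proc_state;
}

/* What setting PCBB on every Switch amounts to in this model. */
void load_pcbb_from_active(void)
{
    pcbb = (uintptr_t)&Active->proc_state;
}

void set_active(struct process *p) { Active = p; }
```

If the two values match we can leave the kernel with a plain rei.  If they
differ, the full save and load of process context is needed.
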
Register variables in DummySwitch have to be saved, since they get
    clobbered.
We still have to clean up the exception handler and the KernelCanRead
    functions.  The exception handler makes explicit references to the
    proc_state of the faulting process, which hasn't been filled in
    yet.  Anything that makes explicit reference to the proc_state
    of the oldActive process needs to be fixed.  The affected files are
    machine.c, memory.c, exception.c, debug.c, and force.c.


-------------------------------------------------------
Make the PCBB always point to the proper place in Active.
	We do this by setting it every time a Switch occurs.

Make the interrupt/exception vectors always run on the kernel stack.
	We can't do this for CHMU and CHMS, but they will generate addressing
	exceptions because we initialize those stack pointers with
	bad addresses.

We need to be loaded starting at location zero.  We also need to start
	executing at the label start.  We need the file startup.c to
	be loaded first, so that the system control block is page aligned.

Vload is going to have to page align code and data, i.e., separate the segments.

Worry about setting up and initializing the kernel stack and the small
	interrupt stack.

The svpctx throws us onto the interrupt stack if we were not
	already there.  So we set up a small interrupt stack to
	push two longwords onto and then rei.  But we need to make
	sure that we push a PSL with the proper IPL value.
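
As a sketch of what "the proper IPL value" means in practice: the IPL lives
in bits 20:16 of the PSL, so the longword we push has to have that field set
to the level we want to be at after the rei.  The macro and function names
below are ours, for illustration only.

```c
#include <stdint.h>

/* The interrupt priority level occupies PSL bits 20:16. */
#define PSL_IPL_SHIFT 16
#define PSL_IPL_MASK  (0x1fu << PSL_IPL_SHIFT)

/* Return a copy of psl with its IPL field replaced by ipl (0..31). */
uint32_t psl_with_ipl(uint32_t psl, unsigned ipl)
{
    return (psl & ~PSL_IPL_MASK) |
           ((uint32_t)(ipl & 0x1fu) << PSL_IPL_SHIFT);
}

/* Extract the IPL field from a PSL value. */
unsigned psl_ipl(uint32_t psl)
{
    return (unsigned)((psl & PSL_IPL_MASK) >> PSL_IPL_SHIFT);
}
```

The other PSL fields are passed through untouched, so the caller is still
responsible for the mode and interrupt stack bits of the value it pushes.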

There are no per-process system stacks in V, so we do not use segment P1.

Zero is the highest priority process.

We handle fatal exceptions pretty badly.  Whenever there is a doubt we Kabort.

Since system virtual address space and physical addresses map onto
	each other, the startup routine that runs in physical
	memory can be linked to start at 0x80000000 and all will be well.

V normally runs in physical memory, but VAXen are different.
	Memory:  since page tables aren't in hardware, we have to
	create them ourselves.  Since we need to be able to have
	every team have the potential to access all of physical memory
	we need a way to keep the sum of the size of the page tables
	small.  We do this by allocating a large system page table that
	maps physical memory one to one, then IO space, then a page table
	for each team of maximum size.  Initially the teams' page tables
	have no physical pages allocated to them.  We allocate them pages
	as they are needed.  This is also the way UNIX does it and now
	seems extremely obvious, but wasn't originally.
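
The layout described above can be sketched as a small calculation.  VAX pages
are 512 bytes and a page table entry is 4 bytes.  The memory size, IO space
size, maximum team size and team count passed in are parameters of the
example, not values from the kernel, and all names are hypothetical.

```c
#include <stdint.h>

#define VAX_PAGE_SIZE 512u   /* bytes per page */
#define VAX_PTE_SIZE  4u     /* bytes per page table entry */

/* Indexes into the system page table, in entries. */
struct spt_layout {
    uint32_t io_first;              /* physical memory is entries 0..io_first-1 */
    uint32_t team_first;            /* first entry mapping a team page table */
    uint32_t pages_per_team_table;  /* system pages reserved per team table */
    uint32_t total_entries;
};

/* Number of pages needed to hold the given number of bytes. */
static uint32_t pages_for(uint32_t bytes)
{
    return bytes / VAX_PAGE_SIZE + (bytes % VAX_PAGE_SIZE != 0);
}

struct spt_layout spt_plan(uint32_t phys_bytes, uint32_t io_bytes,
                           uint32_t max_team_bytes, uint32_t nteams)
{
    struct spt_layout l;
    uint32_t team_ptes = pages_for(max_team_bytes);  /* one PTE per team page */

    l.io_first = pages_for(phys_bytes);              /* one-to-one map first */
    l.team_first = l.io_first + pages_for(io_bytes); /* then IO space */
    l.pages_per_team_table = pages_for(team_ptes * VAX_PTE_SIZE);
    l.total_entries = l.team_first + nteams * l.pages_per_team_table;
    return l;
}

/* First system page table entry covering the page table of a team. */
uint32_t spt_team_table_index(const struct spt_layout *l, uint32_t team)
{
    return l->team_first + team * l->pages_per_team_table;
}
```

With made-up numbers of 4 MB of memory, 64 KB of IO space and 8 teams of at
most 1 MB each, every team table needs 16 system pages, so all 8 teams cost
128 system page table entries until we start giving them physical pages.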

Whenever we change the memory mapping we need to write into tbis or tbia
	so that the VAX knows we did it.
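
One way to make this hard to forget is to funnel every mapping change through
a helper that does the invalidate itself.  This is a sketch only: mtpr is a
privileged instruction, so it is replaced here by a stub that records what
was written.  The register numbers 57 (TBIA) and 58 (TBIS) are from the VAX
architecture as we recall it and should be checked against the handbook.
The helper names are hypothetical.

```c
#include <stdint.h>

#define PR_TBIA 57   /* translation buffer invalidate all */
#define PR_TBIS 58   /* translation buffer invalidate single */

/* STUB for the mtpr instruction: just remembers the last write. */
int      last_mtpr_reg = -1;
uint32_t last_mtpr_val;

void mtpr(uint32_t value, int reg)
{
    last_mtpr_reg = reg;
    last_mtpr_val = value;
}

/* Change one PTE, then invalidate only the translation for va. */
void set_pte(uint32_t *pte, uint32_t new_pte, uint32_t va)
{
    *pte = new_pte;
    mtpr(va, PR_TBIS);
}

/* After wholesale changes it is simpler to throw everything away. */
void flush_all_translations(void)
{
    mtpr(0, PR_TBIA);
}
```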

If my VAX stackframe intuition is wrong, StackDump will foul up.

The addressing macros will be constant folded only if they are signed
	values.  Therefore, you must only call them with values less
	than 0x80000000.

There is a bug in lint that makes it not constant fold properly, or at
	least not in the manner that cc constant folds.  Thus lint still
	gives constant expected errors even though cc will not.

There is a bug in lint that causes it to not recognize extern declarations
	that are on the same line as asm's.

Nonexistent memory generates machine checks.

The DEQNA can interrupt anywhere we want it to, so we make it interrupt at
	VecDeqna.

In Vikc.c #ifdef LITTLE_ENDIAN the first two shorts.

Whenever we change a team's size, we need to propagate that change to
	each of the processes in the team.  We do this by doing a depth
	first search of the process tree hierarchy and finding the
	processes that belong to the current team.
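
A minimal sketch of that search follows.  It assumes each process records
its team, keeps its own copy of the team size, and links to its first child
and next sibling.  These structures and names are invented for the example
and are not the kernel's own process descriptors.

```c
#include <stddef.h>
#include <stdint.h>

struct team { uint32_t size; };

/* HYPOTHETICAL process descriptor with child/sibling links. */
struct process {
    struct team    *team;
    uint32_t        team_size;     /* per-process copy of the team size */
    struct process *first_child;
    struct process *next_sibling;
};

/*
 * Depth-first walk of the tree rooted at root.  Every process that
 * belongs to team t gets the new size.  Returns how many were updated.
 */
int propagate_team_size(struct process *root, struct team *t)
{
    int updated = 0;

    for (struct process *p = root; p != NULL; p = p->next_sibling) {
        if (p->team == t) {
            p->team_size = t->size;
            updated++;
        }
        /* Descend before moving on to the next sibling. */
        updated += propagate_team_size(p->first_child, t);
    }
    return updated;
}
```

The walk cannot stop at a process from another team, because a process in
some other team may still have descendants in the team being resized.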
