hardware solution for direct access to video ram

Sun Aug 6 16:22:19 AEST 1989

  A week ago or so I posted an article describing how I gained access to
the video ram on my 3B1.  To tell the truth, I've been a little underwhelmed
with the response I've received.  I did receive a few letters, one from John Bly
Milton IV asking some questions about why I went to such extreme measures.
I hope John doesn't mind if I answer his questions/comments publicly so that
others will understand just why it is worth the trouble.

JOHN:  Seems a bit brute force.

  It is, but because of the design of the 3B1 there is little alternative.
If you want to access an area of protected memory you have basically three
choices: write a device driver (see below), use the virtual memory system to map
that memory into your process's address space, or allow direct access.  Personally
I would have preferred the second option, but the page table rams do not allow
access to memory above 4 Meg, and therefore this wouldn't work.  It's too
bad too, because there are three unused bits in the page table rams that could
have been used, which would have allowed this :-(.

JOHN: Is mgr really that good?

  I'll admit that I don't get my jollies from window managers, but it is
public domain, and for those who have used sunview on a Sun 3/50 under 4.0
(painful, isn't it? ;-), Mgr is easily an order of magnitude faster.  It is
fairly small, ~200k, so it shouldn't eat up gobs of memory.  It sure is nice
when your window manager doesn't have to be paged in or out.

JOHN: Why didn't you try a software (loadable device driver) approach?

  A very good question and one that bears answering.  Let's take a look at what
happens when you do a system call, such as a write.  Assuming you have already
opened the device, the sequence of events is:

	1.  The user process puts data into a buffer; it doesn't matter what
	    kind of buffer: variable, array, malloced memory, etc.
	2.  The user process calls write() with the proper parameters; this
	    causes several bytes to be pushed onto the process's stack.
	3.  That write() routine is actually a stub routine, probably written
	    in assembly, that further manipulates several bytes on the stack.
	    The stub then executes a trap instruction that forces the processor
	    into supervisor mode and transfers execution to the kernel.
	    To do this several more bytes are written onto the stack; take a
	    look at the Motorola documents on the 680x0 family for details.
	4.  The kernel figures out which system call was desired from a lookup
	    table and then jumps to that routine.
	5.  The device driver retrieves the address of the buffer and transfers
	    the data, in this case to video memory.  If this had been a block
	    device instead of a character device, data would have been
	    transferred to a buffer after it was allocated, but for video ram
	    we would have a character device and thus no extra buffer.
	6.  Steps 4, 3, and 2 are reversed, undoing all of those stack
	    manipulations.  Also, when the system call returns the kernel takes
	    the opportunity to check if another process should run, so you may
	    lose the processor until the next context switch.
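
  To make the cost concrete, here is a minimal sketch of what the user side
of the driver route might look like for a single word.  The function name
driver_put_word and the idea that the driver takes its address through
lseek() are assumptions made for illustration; the post does not describe
the interface of any particular driver.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Store one 16-bit word at word offset `index` through an already-open
 * file descriptor.  Returns 0 on success, -1 on failure.
 *
 * ASSUMPTION: the (hypothetical) driver treats the file offset as a byte
 * offset into video ram.  A real driver might want the address passed
 * some other way, e.g. in the data stream.
 */
int driver_put_word(int fd, unsigned index, unsigned short data)
{
    off_t where = (off_t)index * (off_t)sizeof data;

    /* First trap into the kernel: tell the driver where to write. */
    if (lseek(fd, where, SEEK_SET) == (off_t)-1)
        return -1;

    /* Second trap into the kernel: hand over the word itself. */
    if (write(fd, &data, sizeof data) != (ssize_t)sizeof data)
        return -1;

    return 0;
}
```

  Under these assumptions every word costs two system calls, each of which
goes through the whole sequence listed above.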

  Now let's take a look at my solution, assuming that you have already set the
video pointer like so:

	unsigned short data, *video = (unsigned short *)0x420000;

  And let's assume you want to write to the 23rd u_short; you would do:

	video[22] = data;

  I'm sorry folks, but this seems a heck of a lot easier.
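
  For anyone who wants to play with the idea on a machine without the mod,
here is a small sketch that wraps the same two operations in functions taking
the base pointer as a parameter, so an ordinary array can stand in for video
ram.  The names VIDEO_BASE, put_word and get_word are made up for this
example, and the volatile qualifier is an addition meant to keep a compiler
from optimizing away accesses to memory-mapped hardware.

```c
#include <stddef.h>

/* Base address quoted above; only meaningful on a modified 3B1. */
#define VIDEO_BASE ((volatile unsigned short *)0x420000)

/* Store one word: the index is scaled and added to the base, then one move. */
void put_word(volatile unsigned short *video, size_t index, unsigned short data)
{
    video[index] = data;
}

/* Fetch one word back from the same place. */
unsigned short get_word(volatile unsigned short *video, size_t index)
{
    return video[index];
}
```

  On the real hardware the call would be put_word(VIDEO_BASE, 22, data);
anywhere else, pass an array of unsigned short instead.  No system call is
involved either way.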

  There are some additional benefits to this approach, such as:

	1.  You don't have to spend who knows how many hours writing
	    and testing a device driver.  I wrote a device driver for
	    a Ramtek graphics device on a BSD 4.3 VAX when I was in
	    college and I know how hard it can be to find subtle bugs.
	    But I must admit, a device driver for access to the video
	    ram is fairly trivial, just look at the vidram device driver
	    that has been posted on the net by Mike "Ditto" Ford.
	2.  The special window functions are now in user level code, which
	    is far easier to debug.  When Brad and I were working on the
	    portable bit blit code, this made things a lot easier than if we
	    had had to keep reloading a device driver.  And who knows how many
	    times we would have crashed our machines getting it right.
	3.  Because you now have one screen worth of ram available where it
	    belongs, you can allocate one less buffer. Plus you don't have to
	    make expensive system calls to update the buffer.  For those people
	    who are sleeping, this works too:

	    data = video[22];

	4.  Many window managers expect to see the video memory mapped into
	    user space; Mgr does, and I suspect that X does also, even though
	    I haven't seen any code.  Having this access makes porting a whole
	    lot easier.  In fact, Brad and I weren't going to do the port until
	    we came up with an easy way to get to the video ram.
	5.  It is fast, as fast as the 3B1 with a 10 MHz clock will ever get.
	    My method requires two operations, an offset added to the base and
	    one word of data transferred to that address, i.e., a few machine
	    instructions with < 10 memory references.  The device driver method
	    requires what, 10 - 30 instructions, 10 - 30 instruction fetches,
	    and all those stack writes and reads.  Plus the device driver has to
	    have a way to calculate the offset, possibly requiring an address to
	    be sent in the data stream.
	6.  On the down side, this is a security hole.  If the page table could
	    have been modified then the MMU pal would take care of this for us,
	    but since it can't we have a hardware mod.  This really isn't that
	    big of a deal on a small system, though.  It isn't like there are a
	    hundred users and you have to protect the screen from peepers.
	    Security is one of the reasons I went to the trouble of using all
	    those address lines in my pal.
	7.  Window manager code doesn't belong in the kernel anyway.  When we
	    get Mgr working all the way we're going to remove the wind.o
	    driver, which will give us better than 40k of precious kernel space
	    back.
	8.  I don't know if I should mention this, but I don't see any reason
	    to hide it.  The displayable portion of the video does not use up
	    all of the video ram.  So we also have an automatic shared memory
	    segment at the end of video ram.  BUT, it is wide open and you're
	    probably a fool to use it and an idiot to rely on it.
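
  As a rough illustration of item 8, the sketch below works out how many
16-bit words are left over past the visible screen.  The post gives no
figures, so the screen size and video ram size are parameters.  The function
names are made up, and the code assumes one bit per pixel with rows packed
back to back in whole words, which the post does not confirm.

```c
#include <stddef.h>

/* Number of 16-bit words needed for the visible screen (1 bit per pixel). */
size_t visible_words(unsigned width_px, unsigned height_px)
{
    size_t words_per_row = (width_px + 15u) / 16u;  /* round up to a word */
    return words_per_row * height_px;
}

/* Words left over past the visible area, or 0 if there are none. */
size_t spare_words(size_t vram_bytes, unsigned width_px, unsigned height_px)
{
    size_t total = vram_bytes / 2u;                 /* 2 bytes per word */
    size_t used = visible_words(width_px, height_px);
    return used < total ? total - used : 0;
}
```

  If the display were 720 by 348 pixels with 32K bytes of video ram (figures
assumed here, not taken from the post), the spare area would start at word
index visible_words(720, 348) and be spare_words(32768, 720, 348) words long.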

  I hope this answers some questions and piques some curiosity about what we,
the 3B1/7300 user community, can do with our machines.  Personally, I think the
ability to get away from ua and use a "real" window system is worth the
afternoon it takes to make a daughter board.  We also get to have source for a
major part of the system; that alone is enough for me to want to change to Mgr,
X, or whatever.
  Again, I welcome comments, good and bad.  And if you too need a pal and don't
have access to a programmer, let me know and we'll see what happens.

-- 
     ...     ___
   _][_n_n___i_i ________		Brian D. Botton
  (____________I I______I		laidbak!botton
  /ooOOOO OOOOoo  oo oooo