This is the third of the references provided by Derek Nickel, another article published in the Dr. Dobbs Sourcebook:
Michael Abrash, "Quake's Hidden-Surface Removal" In: Dr. Dobb's Sourcebook, May/June 1996, #257 Ramblings In Real Time, pp. 48-52In this article Michael Abrash discusses the Hidden Surface Removal (HSR) that still has to follow the Visible Surface Determination he (VSD) described in an earlier arcticle.
He starts with some remarks on the problems of handling moving objects (billboards/sprites, BSP models like doors, items, polygon models) with a BSP, and the solution revealed in his CGDC talk: using a z-buffer. The BSP provides all polygons of the static world in a perfectly sorted order. This could be used to z-fill the z-buffer (saving a separate z-buffer clear) without read/compare. He states that the performance impact for this pass is about 10 percent. Note that this pass has to be done separately from actually texture mapping the polygons in question (which involves a z-calculation per pixel, too, for perspective correct texture mapping), due to the lack of registers with i586 processors.
If I understood correctly, the z-buffer is actually used for billboards (sprites) and polygon models (whose triangles potentially could intersect) and special effects introduced later (once the z-buffer was already in) like particles. BSP models are handled diffently (see below). The z-read/compare/write performance impact is reported to vary a lot, averaging to about 20 percent.
Finally, the memory footprint is about 128K for 320x200. According to this, the same 16bit z-buffer depth uses 1.3MB with the 960x720 modes I am occasionally running with Linux/XFree86. This sounds like much, but according to Michael Abrash, the game requires 8MB anyway, and the surface cache is likely to use a lot more if only available.
The back-to-front approach to render the static world which John Carmack tried with the PVS did not provide level performance because of varying overdraw. I guess back-to-front rendering with painters algorithm is not a useful approach for scalable, densely occluded environments anyway. Michael Abrash states that they considered a beam-tree like approach (I wonder whether Weiler-Atherton clipping would be a good idea), and discarded it, because they already tried it for Visible Surface Determination, and it suffered from overhead and varying performance.
In consequence, they decided to solve the HSR task on span-level instead of polygon level. He sketches the bsic idea of sorted spans, and discusses two alternatives: edge sorting or span sorting. I do not know much about the latter, and will not attempt to summarize his explanation of the former - a good description of Active Edges based rendering can be found in Foley, van Dam et.al, in the VSD/Scan-Line section. Btw., Chris Laurels "wt" and the scanline-based renderer we wrote used a modified edge-sorted rasterizer - with DOOM-style restrictions it is a good idea to rotate by 90 degrees, to draw wall spans in the most efficient way.
Following the overview of both approaches, he summarizes that there does not seem to be a conclusive answer which technique to prefer. For Quake, they decided to use an edge-sorted rasterizer with some advantages:
The remainder of the article covers some details. Abrash explains that using 1/z instead of z as a sorting key has various advantages:
He finishes by explaining how to obtain an 1/z value and an arbitrary point on the polygon in an efficient way, using a plane equation in terms of viewspace z and screen space x,y, and gives the approximate costs on a Pentium as about six cycles.
As stated above, the BSP models are not rendered using a z-buffer read/compare. BSP objects therefore provide overdraw reduction - doors, most notably, block any surface behind them in the edge-sorted rasterizer. In consequence, BSP models have to be clipped against world polygons, to avoid interpenetrations (I wonder if a sufficiently large bounding box for wall-object collision detection is not sufficient). This is not necessary for polygon models and billboards and particles, as z-compares account for intersections. This is obvious from passable lava/water surfaces.
The README says: a Z-sorted spans demo program, consisting of a Visual C++ Makefile, resource and include file and, more important, the demo itself: zsort.h/zsort.c. The download archive of the source that will be discussed in the next issue has been available for quite some time already, and there is a local copy with permission. Remarks by Michael Abrash:
Note: In this implementation, polygon faces must not be interpenetrating. Also, correct sorting is not guaranteed if two polygonal objects butt up against each other. In other words, each polygonal object must be made of a continuous, non-self-intersecting skin, and polygonal objects must not interpenetrate or touch in order for proper sorting to result. More complex, slower sorting is required to make those cases work reliably. Note: polygon processing could be considerably more efficient if polygons shared common edges and edges shared common vertices. Also, indirection to vertices could be used to avoid having to copy all the vertices during every clip test. Outcode-type testing could be used to determine completely clipped or unclipped polygons ahead of time, avoiding the need to clip and copy entirely for such polygons. Outcode-type tests work best in viewspace, with the frustum normalized so that the field of view is 90 degrees, so simple compares, rather than dot products, can be used to categorize points with respect to the frustum.