Cg Programming/Programmable Graphics Pipeline

The programmable graphics pipeline presented here is very similar to the OpenGL (ES) 2.0 pipeline, the WebGL pipeline, and the Direct3D 8.0 pipeline. As such it is the lowest common denominator of programmable graphics pipelines of the majority of today's desktop PCs and mobile devices.

Parallelism in Graphics Pipelines

GPUs are highly parallel processors; this is the main reason for their performance. In fact, they implement two kinds of parallelism, vertical and horizontal:

(Image: Ford assembly line, 1913.)
  • Vertical parallelism describes parallel processing at different stages of a pipeline. This concept was also crucial in the development of the assembly line at Ford Motor Company: many workers can work in parallel on rather simple tasks. This made mass production (and therefore mass consumption) possible. In the context of GPUs, the simple tasks correspond to less complex processing units, which reduces cost and power consumption.
(Image: Assembly plant of the Bell Aircraft Corporation with multiple parallel assembly lines, ca. 1944.)
  • Horizontal parallelism describes the ability to process work in multiple pipelines in parallel. This allows for even more parallelism than vertical parallelism alone can provide in a single pipeline. Again, the concept was also employed at Ford Motor Company and in many other industries. In the context of GPUs, horizontal parallelism of the graphics pipeline was essential for achieving the performance of modern GPUs.

The following diagram shows an illustration of vertical parallelism (processing in stages represented by boxes) and horizontal parallelism (multiple processing units for each stage represented by multiple arrows between boxes).

Vertex Data: e.g. triangle meshes provided by 3D modeling tools
↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓ many vertices are processed in parallel
Vertex Shader: a small program in Cg (or another shading language) is applied to each vertex
↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓
Primitive Assembly: setup of primitives, e.g. triangles, lines, and points
↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓ many primitives are processed in parallel
Rasterization: interpolation of data for all pixels covered by the primitive (e.g. triangle)
↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓ many fragments (corresponding to pixels) are processed in parallel
Fragment Shader: a small program in Cg (or another shading language) is applied to each fragment (i.e. covered pixel)
↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓
Per-Fragment Operations: configurable operations on each fragment (i.e. covered pixel)
↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓ results of many fragments are written in parallel to the framebuffer
Framebuffer: array of pixels in which the computed fragment colors are stored

In the following diagrams, there is only one arrow between any two stages. However, it should be understood that GPUs usually implement the graphics pipeline with massive horizontal parallelism. Only software implementations of the graphics pipeline, e.g. Mesa 3D (see the Wikipedia entry), usually implement a single pipeline.

Programmable and Fixed-Function Stages

The pipelines of OpenGL ES 1.x, core OpenGL 1.x, and Direct3D 7.x are configurable fixed-function pipelines, i.e. there is no way to include programs in these pipelines. In OpenGL (ES) 2.0, WebGL, and Direct3D 8.0, two stages of the pipeline (the vertex shader stage and the fragment shader stage) are programmable, i.e. small programs (shaders) written in Cg (or another shading language) are applied in these stages. In the following diagram, programmable and fixed-function stages are marked as such; Vertex Data and Framebuffer represent data.

Vertex Data: e.g. triangle meshes provided by 3D modeling tools
↓
Vertex Shader (programmable): a small program in Cg is applied to each vertex
↓
Primitive Assembly (fixed-function): setup of primitives, e.g. triangles, lines, and points
↓
Rasterization (fixed-function): interpolation of data (e.g. color) for all pixels covered by the primitive
↓
Fragment Shader (programmable): a small program in Cg is applied to each fragment (i.e. covered pixel)
↓
Per-Fragment Operations (fixed-function): configurable operations on each fragment (i.e. covered pixel)
↓
Framebuffer: array of pixels in which the computed fragment colors are stored
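
For illustration, a minimal pair of Cg programs for the two programmable stages might look as follows. This is only a sketch: the entry point names vert and frag and the uniform name modelViewProj are arbitrary choices, and how they are bound to the pipeline is platform specific.

// vertex shader: transforms each vertex from object coordinates
// to clip coordinates with a model-view-projection matrix
float4 vert(float4 vertex : POSITION,
      uniform float4x4 modelViewProj) : POSITION
{
   return mul(modelViewProj, vertex);
}

// fragment shader: returns a constant color for each covered pixel
float4 frag(void) : COLOR
{
   return float4(1.0, 0.0, 0.0, 1.0); // opaque red
}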

The vertex shader and fragment shader stages are discussed in more detail in the platform-specific tutorials. The rasterization stage is discussed in Section “Rasterization” and the per-fragment operations in Section “Per-Fragment Operations”.

The primitive assembly stage mainly consists of clipping primitives to the view frustum (the part of space that is visible on the screen) and optional culling of front-facing and/or back-facing primitives. These possibilities are discussed in more detail in the platform-specific tutorials.

Data Flow

In order to program Cg vertex and fragment shaders, it is important to understand the input and output of each shader. To this end, it is also useful to understand how data is communicated between all stages of the pipeline. This is illustrated in the next diagram:

Vertex Data
↓ vertex input parameters with semantics (e.g. POSITION, COLOR, NORMAL, TEXCOORD0, TEXCOORD1, etc.)
Vertex Shader ← uniform parameters; in some cases texture data (images)
↓ vertex output parameters with semantics (in particular POSITION, SV_POSITION, and PSIZE, but also COLOR, TEXCOORD0, TEXCOORD1, etc.)
Primitive Assembly
↓ vertex output parameters
Rasterization
↓ fragment input parameters (interpolated at pixels) with semantics (corresponding to the semantics of the vertex output parameters)
Fragment Shader ← uniform parameters (constant for each primitive) and texture data
↓ fragment output parameters with semantics (in particular COLOR and DEPTH)
Per-Fragment Operations
↓ fragment color and fragment depth
Framebuffer

Vertex input parameters are defined based on the vertex data. For each vertex input parameter a semantic has to be defined, which specifies how the parameter relates to data in the fixed-function pipeline. Examples of semantics are POSITION, COLOR, NORMAL, TEXCOORD0, TEXCOORD1, etc. This makes it possible to use Cg programs even with APIs that were originally designed for a fixed-function pipeline. For example, the vertex input parameter for vertex positions should use the POSITION semantic such that all APIs can provide the appropriate data for this input parameter. Note that the vertex position is in object coordinates, i.e. this is the position as specified in a 3D modeling tool.
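
In Cg, a semantic is written after a colon behind a parameter or struct member. As a sketch (the struct name and member names are arbitrary; only the semantics matter), a vertex input structure could be defined like this:

// hypothetical vertex input structure; members are bound to the
// vertex data via their semantics, not via their names
struct vertexInput {
   float4 vertex : POSITION;    // position in object coordinates
   float3 normal : NORMAL;      // surface normal in object coordinates
   float4 color : COLOR;        // per-vertex color
   float4 texcoord : TEXCOORD0; // first set of texture coordinates
};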

Uniform parameters (or uniforms) have the same value for all vertex shader and fragment shader executions performed when rendering a specific primitive (e.g. a triangle). However, they can be changed for other primitives; usually, they have the same value for a large set of primitives that make up a mesh. Typically, vertex transformations, specifications of light sources and materials, etc. are specified as uniforms.
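
As a sketch, a vertex shader might receive a transformation matrix and a material color as uniform parameters (the names modelViewProj and materialColor are made up; the application sets their values through the platform-specific API):

void vert(float4 vertex : POSITION,
      out float4 clipPos : POSITION,
      out float4 col : COLOR,
      uniform float4x4 modelViewProj, // same matrix for all vertices
      uniform float4 materialColor)   // same color for the whole mesh
{
   clipPos = mul(modelViewProj, vertex);
   col = materialColor;
}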

Vertex output parameters are computed by the vertex shader, i.e. there is one set of values of these parameters for each vertex. A semantic has to be specified for each parameter, e.g. POSITION, SV_POSITION, COLOR, TEXCOORD0, TEXCOORD1, etc. Usually, there has to be an output parameter with the semantic POSITION or SV_POSITION, which determines where a primitive is rendered on the screen (“SV” stands for “system value” and can have a special meaning). The size of point primitives can be specified by an output parameter with the semantic PSIZE. Other parameters are interpolated (see Section “Rasterization”) for each pixel covered by a primitive.
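
A hypothetical output structure illustrates these semantics; as before, the names are arbitrary:

struct vertexOutput {
   float4 pos : POSITION; // determines where the primitive is rendered
   float size : PSIZE;    // size of point primitives
   float4 col : COLOR;    // interpolated for each covered pixel
   float4 uv : TEXCOORD0; // interpolated for each covered pixel
};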

Fragment input parameters are interpolated from the vertex output parameters for each pixel covered by a primitive. Similar to vertex output parameters, a semantic has to be specified for each fragment input parameter. These semantics are used to match vertex output parameters with fragment input parameters. Therefore, the names of corresponding parameters in the vertex shader and fragment shader can be different as long as the semantics are the same.
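
For instance, in the following sketch the vertex shader writes a parameter with the semantic TEXCOORD0 under the name vertexColor, and the fragment shader reads it under the name interpolatedColor; the match is established purely by the semantic:

void vert(float4 vertex : POSITION,
      uniform float4x4 modelViewProj,
      out float4 clipPos : POSITION,
      out float4 vertexColor : TEXCOORD0)
{
   clipPos = mul(modelViewProj, vertex);
   vertexColor = float4(0.0, 1.0, 0.0, 1.0); // opaque green
}

void frag(float4 interpolatedColor : TEXCOORD0, // matched by semantic
      out float4 fragColor : COLOR)
{
   fragColor = interpolatedColor;
}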

Fragment output parameters are computed by fragment shaders. A semantic has to be specified, which determines how the value is used in the following fixed-function pipeline. Most fragment shaders specify an output parameter with the semantic COLOR. The fragment depth is computed implicitly even if no output parameter with the semantic DEPTH is specified.
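
As a sketch, a fragment shader can write both semantics explicitly; writing DEPTH is only necessary if the implicitly computed depth should be overridden (the constant 0.5 is just for illustration):

void frag(float4 col : COLOR,
      out float4 fragColor : COLOR,
      out float fragDepth : DEPTH)
{
   fragColor = col;
   fragDepth = 0.5; // hypothetical override of the implicit depth
}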

Texture data include a uniform sampler, which specifies the texture sampling unit; this unit in turn specifies the texture image from which colors are fetched.
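
For illustration, a hypothetical fragment shader might fetch colors with Cg's built-in tex2D function from a uniform sampler of type sampler2D (the name mainTexture is arbitrary; the application binds a texture image to it):

float4 frag(float4 texcoords : TEXCOORD0,
      uniform sampler2D mainTexture) : COLOR
{
   // look up the color at the interpolated texture coordinates
   return tex2D(mainTexture, texcoords.xy);
}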

Other data is described in the tutorials for specific platforms.

Further reading

The model of the programmable graphics pipeline used by Cg is described in the first chapter of Nvidia's Cg Tutorial.


Unless stated otherwise, all example source code on this page is granted to the public domain.