Cg Programming/Unity/Computing Color Histograms
This tutorial shows how to compute a color histogram of an image with the help of compute shaders in Unity. In particular, it shows how to use an atomic function so that multiple threads (i.e., multiple calls to a compute shader function) can safely update the same memory location. It also shows how to use compute buffers. If you are not familiar with compute shaders in Unity, you should read Section “Computing Image Effects” first. Note that compute shaders are not supported on all platforms and graphics APIs (for example, not with OpenGL on macOS); a script can check SystemInfo.supportsComputeShaders at run time.
Computing Color Histograms in General
An RGB color histogram of an image is a bar chart that shows, for each value of the red, green, and blue channels, how many pixels of the image feature that value: for example, how many pixels have a red value of 0, how many pixels have a green value of 0, etc. For a color resolution of 8 bits, there are 256 possible values (0 to 255) for each of the red, green, and blue channels; thus, an RGB color histogram specifies 3 × 256 = 768 numbers. If an alpha channel is also included, the RGBA color histogram consists of 4 × 256 = 1024 numbers.
To compute such an RGBA color histogram, a program first initializes the 1024 numbers of the histogram to 0. Then it looks at each pixel of the image and increments (by 1) the four numbers in the histogram for the specific red, green, blue, and alpha values of the pixel. Since the same operations are performed for each pixel, this problem is easy to parallelize, except that two different threads for two different pixels might try to increment the same number of the histogram at the same time, which can lead to problems that are called race conditions. These problems can be avoided if the operation to increment one of the numbers of the histogram is an atomic operation, i.e., if it cannot be interrupted by other threads. This is what we use in the compute shader of this tutorial.
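The procedure described above can be sketched on the CPU in plain C# (no Unity required). This is only an illustration, not the compute shader of this tutorial: the class and method names are hypothetical, the pixels are assumed to be tightly packed RGBA bytes, and the counters are stored as 1024 ints with index 4 × value + channel. The parallel variant uses Interlocked.Increment from .NET as its atomic operation.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class for illustration only.
public static class CpuHistogram
{
    // Sequential reference: one pass over all pixels.
    // Input: tightly packed RGBA bytes (assumption).
    // Output: 1024 counters, index = 4 * value + channel.
    public static int[] ComputeSequential(byte[] rgba)
    {
        int[] histogram = new int[256 * 4]; // new arrays start at 0
        for (int i = 0; i + 3 < rgba.Length; i += 4)
        {
            for (int c = 0; c < 4; c++)
            {
                histogram[4 * rgba[i + c] + c]++;
            }
        }
        return histogram;
    }

    // Parallel variant: pixels are processed by several threads, so
    // each increment has to be atomic to avoid race conditions.
    public static int[] ComputeParallel(byte[] rgba)
    {
        int[] histogram = new int[256 * 4];
        int pixelCount = rgba.Length / 4;
        Parallel.For(0, pixelCount, p =>
        {
            for (int c = 0; c < 4; c++)
            {
                Interlocked.Increment(ref histogram[4 * rgba[4 * p + c] + c]);
            }
        });
        return histogram;
    }
}
```

Both methods should return the same counters for the same input; replacing Interlocked.Increment by a plain ++ in the parallel variant would reintroduce the race condition.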
The Big Picture: Calling the Compute Shader
In this tutorial, we start with the C# script that calls the compute shader because it provides the bigger picture. Note that we compute color histograms for any texture image, not only for camera views as in Section “Computing Image Effects”. Thus, you can attach this script to any GameObject.
using UnityEngine;

public class histogramScript : MonoBehaviour {

    public ComputeShader shader;
    public Texture2D inputTexture;
    public uint[] histogramData;

    ComputeBuffer histogramBuffer;
    int handleMain;
    int handleInitialize;

    void Start ()
    {
        if (null == shader || null == inputTexture)
        {
            Debug.Log("Shader or input texture missing.");
            return;
        }

        handleInitialize = shader.FindKernel("HistogramInitialize");
        handleMain = shader.FindKernel("HistogramMain");
        histogramBuffer = new ComputeBuffer(256, sizeof(uint) * 4);
        histogramData = new uint[256 * 4];

        if (handleInitialize < 0 || handleMain < 0 ||
            null == histogramBuffer || null == histogramData)
        {
            Debug.Log("Initialization failed.");
            return;
        }

        shader.SetTexture(handleMain, "InputTexture", inputTexture);
        shader.SetBuffer(handleMain, "HistogramBuffer", histogramBuffer);
        shader.SetBuffer(handleInitialize, "HistogramBuffer", histogramBuffer);
    }

    void OnDestroy()
    {
        if (null != histogramBuffer)
        {
            histogramBuffer.Release();
            histogramBuffer = null;
        }
    }

    void Update()
    {
        if (null == shader || null == inputTexture ||
            0 > handleInitialize || 0 > handleMain ||
            null == histogramBuffer || null == histogramData)
        {
            Debug.Log("Cannot compute histogram");
            return;
        }

        // divided by 64 in x because of [numthreads(64,1,1)] in the compute shader code
        shader.Dispatch(handleInitialize, 256 / 64, 1, 1);
        // divided by 8 in x and y because of [numthreads(8,8,1)] in the compute shader code
        shader.Dispatch(handleMain, (inputTexture.width + 7) / 8, (inputTexture.height + 7) / 8, 1);
        histogramBuffer.GetData(histogramData);
    }
}
The script defines three public variables: public ComputeShader shader, which has to be set to the compute shader that is shown below; public Texture2D inputTexture, which has to be set to the texture for which the histogram should be computed; and public uint[] histogramData, which the script sets to an array of 1024 unsigned ints that holds the computed histogram.
The three private variables are: ComputeBuffer histogramBuffer, which contains the same data as histogramData but can be accessed by the compute shader; and int handleMain and int handleInitialize, which are the indices of the two compute shader functions for the main processing of all pixels and for the initialization of the 1024 numbers of the histogram.
The Start() function sets the two handles with ComputeShader.FindKernel() and creates the histogramBuffer compute buffer and the histogramData array. While the compute buffer is created as an array of 256 elements that each contain 4 unsigned ints, histogramData is created as an array of 1024 unsigned ints. This difference does not matter since the memory layout is the same for both. Of course, histogramData could also be defined as an array of 256 structs that each contain 4 unsigned ints. The rest of the Start() function does error checking and assigns the texture and the compute buffer to the corresponding uniform variables of each compute shader function so that the functions have access to them.
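Because the flat array and the buffer of 4-uint elements share one memory layout, the counter for a given value and channel can be looked up with a simple index computation. The following sketch is plain C# with hypothetical names (HistogramLayout, FlatIndex, Count); it assumes that the four components of each element are stored in the order red, green, blue, alpha, as is the case for a uint4.

```csharp
// Hypothetical helper class for illustration only.
public static class HistogramLayout
{
    public const int Red = 0, Green = 1, Blue = 2, Alpha = 3;

    // Element number "value" of the compute buffer (4 uints) occupies
    // the flat indices 4 * value ... 4 * value + 3.
    public static int FlatIndex(int value, int channel)
    {
        return 4 * value + channel;
    }

    // Number of pixels whose given channel has the given value (0 to 255).
    public static uint Count(uint[] histogramData, int value, int channel)
    {
        return histogramData[FlatIndex(value, channel)];
    }
}
```

For example, HistogramLayout.Count(histogramData, 0, HistogramLayout.Red) would return the number of pixels with a red value of 0.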
The OnDestroy() function simply releases the compute buffer since the hardware resources attached to it are not automatically released by the garbage collector.
The Update() function does some error checking and then calls the compute shader function for the initialization of the histogramBuffer and the compute shader function for processing all the pixels. For the initialization, we use 4 (= 256 / 64) thread groups of 64 × 1 × 1 threads to initialize the 256 elements of the compute buffer. For the main processing of the pixels, we use thread groups of 8 × 8 × 1 threads and compute the number of thread groups by dividing the dimensions of the texture image by 8. The addition of 7 is necessary to make sure that we are not short by one thread group if the dimensions are not divisible by 8. Lastly, the Update() function calls histogramBuffer.GetData(histogramData); to copy the data from the compute buffer to the Unity array in histogramData; note that the two data structures have to have the same memory layout for this call to work.
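The addition of 7 before the integer division by 8 is an instance of rounding up. As a small sketch in plain C# (the names DispatchMath and GroupCount are hypothetical), the number of thread groups for an arbitrary group size could be computed like this:

```csharp
// Hypothetical helper class for illustration only.
public static class DispatchMath
{
    // Smallest number of groups such that groups * groupSize >= size
    // (integer division rounded up); assumes size >= 0 and groupSize > 0.
    public static int GroupCount(int size, int groupSize)
    {
        return (size + groupSize - 1) / groupSize;
    }
}
```

For a texture width of 256, this gives 32 groups of 8 threads; for a width of 100, it gives 13 groups, i.e., 104 threads, so that the last 4 threads in x have indices beyond the edge of the texture.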
At the end of each frame, the computed color histogram is available in the public variable histogramData; thus, you can look at it in the Inspector Window while running the program.
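Apart from inspecting the values by eye, a simple plausibility check is to add up all 256 counters of one channel: every processed pixel contributes exactly one count per channel. The following plain C# sketch uses hypothetical names (HistogramCheck, ChannelTotal) and assumes the flat layout with index 4 × value + channel.

```csharp
// Hypothetical helper class for illustration only.
public static class HistogramCheck
{
    // Sum of the 256 counters of one channel (0 = red, ..., 3 = alpha).
    public static ulong ChannelTotal(uint[] histogramData, int channel)
    {
        ulong total = 0;
        for (int value = 0; value < 256; value++)
        {
            total += histogramData[4 * value + channel];
        }
        return total;
    }
}
```

If the texture dimensions are multiples of 8, so that exactly one thread is dispatched per texel, the total of each channel should be equal to inputTexture.width * inputTexture.height.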
The Nitty-Gritty Details of the Compute Shader
In this case, the compute shader contains two compute shader functions, one for the initialization and the other one for the main processing of the texels of the texture. Therefore, it also includes two #pragma kernel instructions and two [numthreads()] attributes:
#pragma kernel HistogramInitialize
#pragma kernel HistogramMain

Texture2D<float4> InputTexture; // input texture

struct histStruct {
    uint4 color;
};
RWStructuredBuffer<histStruct> HistogramBuffer;

[numthreads(64,1,1)]
void HistogramInitialize(uint3 id : SV_DispatchThreadID)
{
    HistogramBuffer[id.x].color = uint4(0, 0, 0, 0);
}

[numthreads(8,8,1)]
void HistogramMain (uint3 id : SV_DispatchThreadID)
{
    // skip threads beyond the texture edge (they occur if the texture
    // dimensions are not divisible by 8) such that they are not counted
    uint width, height;
    InputTexture.GetDimensions(width, height);
    if (id.x >= width || id.y >= height)
    {
        return;
    }

    uint4 col = uint4(255.0 * InputTexture[id.xy]);

    InterlockedAdd(HistogramBuffer[col.r].color.r, 1);
    InterlockedAdd(HistogramBuffer[col.g].color.g, 1);
    InterlockedAdd(HistogramBuffer[col.b].color.b, 1);
    InterlockedAdd(HistogramBuffer[col.a].color.a, 1);
}
As always, you create a compute shader by clicking on Create in the Project Window and choosing Shader > Compute Shader. You should then copy and paste the code into the new file.
The first two lines, #pragma kernel HistogramInitialize and #pragma kernel HistogramMain, specify the two compute shader functions (“kernels”) that can be called from a script with the ComputeShader.Dispatch() function.
Texture2D<float4> InputTexture; specifies a uniform variable for a read-only 2D RGBA texture with the name InputTexture.
struct histStruct { uint4 color; }; defines a small structure with only one member: a 4D unsigned int vector called color. color.r is used to count the pixels with a certain red value (according to the position in the array), and analogously color.g, color.b, and color.a for the green, blue, and alpha channels.
The structure histStruct is then used in RWStructuredBuffer<histStruct> HistogramBuffer; to define a read/write structured buffer that represents the compute buffer histogramBuffer in the C# script. The memory layout matches because the elements of the RWStructuredBuffer are of type histStruct, which consists of 4 uints.
The function HistogramInitialize() uses thread groups of dimensions 64 × 1 × 1, which means that the argument uint3 id : SV_DispatchThreadID runs from uint3(0, 0, 0) to uint3(255, 0, 0) since we use 4 thread groups. Therefore, the function can use id.x to index the 256 elements of the HistogramBuffer when initializing all elements to 0.
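To see why the x coordinate runs from 0 to 255, recall that the dispatch thread ID is the index of the thread group times the number of threads per group plus the index of the thread within its group. The following plain C# sketch (hypothetical names DispatchIds and ThreadIdsX) enumerates these x coordinates on the CPU; it only illustrates the numbering and says nothing about the order in which a GPU actually executes the threads.

```csharp
using System.Collections.Generic;

// Hypothetical helper class for illustration only.
public static class DispatchIds
{
    // x coordinates of SV_DispatchThreadID for a one-dimensional dispatch:
    // id.x = groupIndex * threadsPerGroup + indexWithinGroup
    public static List<int> ThreadIdsX(int groupCount, int threadsPerGroup)
    {
        var ids = new List<int>();
        for (int group = 0; group < groupCount; group++)
        {
            for (int local = 0; local < threadsPerGroup; local++)
            {
                ids.Add(group * threadsPerGroup + local);
            }
        }
        return ids;
    }
}
```

With 4 groups of 64 threads, the list contains each of the 256 indices from 0 to 255 exactly once, which matches the 256 elements of the buffer.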
The function HistogramMain() uses thread groups of dimensions 8 × 8 × 1. Since we base the number of thread groups on the texture size, the function can use the argument uint3 id : SV_DispatchThreadID to access the texels of the texture with InputTexture[id.xy]. Since the RGBA values are read as floating-point values between 0.0 and 1.0, they are multiplied by 255.0 and rounded down by converting them to unsigned ints in the uint4 col variable. The RGBA values in col are then used to index the HistogramBuffer to increment the counter variables in the buffer, i.e., HistogramBuffer[col.r].color.r for the red value, HistogramBuffer[col.g].color.g for the green value, etc.
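The conversion from a floating-point channel value to an index can be tried out on the CPU. The following plain C# sketch (hypothetical names Quantize and ToBin) mirrors the multiplication by 255.0 and the truncation of the shader code; it assumes an input between 0.0 and 1.0.

```csharp
// Hypothetical helper class for illustration only.
public static class Quantize
{
    // Maps a channel value in [0.0, 1.0] to a histogram index in [0, 255];
    // the cast to uint discards the fractional part (rounding down).
    public static uint ToBin(float channel)
    {
        return (uint)(255.0f * channel);
    }
}
```

For example, 0.0 maps to index 0, 1.0 maps to index 255, and 0.5 maps to index 127 because 127.5 is rounded down.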
To increment the counter variables, the code uses the function InterlockedAdd(), which takes a variable as first argument and an integer as second argument. In our case, the latter is 1 because we increment by 1. InterlockedAdd() is one of the atomic functions of HLSL compute shaders; i.e., the GPU makes sure that any race conditions due to multiple threads trying to increment the same variable at the same time are avoided. There are several atomic functions in HLSL (e.g., InterlockedAdd(), InterlockedMin(), InterlockedMax(), InterlockedAnd(), InterlockedOr(), and InterlockedXor()); note that all of them work only with integers or unsigned integers.
If you want to observe the effect of the race conditions, you can replace the calls to the atomic function InterlockedAdd() by code like this:
HistogramBuffer[col.r].color.r += 1;
// WARNING: THIS CREATES RACE CONDITIONS!
On most GPUs, this will not be an atomic operation and, therefore, there will usually be race conditions when you run this code, which leads to undefined results. You might be able to observe in the Inspector Window that the values in the histogramData array change somewhat randomly due to these race conditions.
Summary
You have reached the end of this tutorial! A few of the things that you have learned are:
- What color histograms are and how to compute them.
- How to create and use Unity's compute buffers in a C# script and how to define a corresponding read/write structured buffer in a compute shader.
- How to define and use multiple compute shader functions in one compute shader.
- How to use an atomic function in a compute shader.
Further reading
If you still want to know more
- about compute shaders in Unity, see Section “Computing Image Effects”.
- about compute buffers in Unity, see the description in Unity's documentation.
- about compute shaders in HLSL (including RWStructuredBuffer and atomic functions), see the description in Microsoft's developer network.