VPP  0.7
A high-level modern C++ API for Vulkan
How to write shader and computation code

The section How do C++ shaders work explains how C++ shaders work internally. This page lists typical elements and constructs found in these shaders. Consult the individual pages for these items for more information.

Opening directive

The first thing that is usually found in each VPP shader is the using directive, for easy access to VPP types:

void MyPipeline :: fVertexShader ( vpp::VertexShader* pShader )
{
using namespace vpp;
// ...
}

Accessing binding points

Some binding points require accessors, which are objects declared in the shader code. Others are accessed directly.

Accessing vertex and instance data buffers

Vertex and instance data are supplied via vpp::inVertexData binding points. They are read-only and accessed directly. Example:

void fVertexShader ( vpp::VertexShader* pShader )
{
using namespace vpp;
Vec4 vpos = m_inVertexData [ & GVertexPos::m_modelPosition ];
Mat4 modelMatrix = m_inInstanceData [ & GInstancePar::m_model2world ];
}

Accessing uniform buffers

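Uniform buffers are bound via vpp::inUniformBuffer or vpp::inUniformBufferDyn binding points. They are untyped and read-only, and require an accessor which provides the type. The accessor can be one of the following:

- vpp::UniformVar, for a buffer holding a single structure,
- vpp::UniformArray, for a buffer holding an array of structures,
- vpp::UniformSimpleArray, for a buffer holding an array of simple data.

The example below shows vpp::UniformVar. The remaining ones have identical syntax, but require one more level of indirection (the [] indexing operator) to access an individual array element.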

void fVertexShader ( vpp::VertexShader* pShader )
{
using namespace vpp;
// inFramePar is a vpp::UniformVar accessor created for the uniform buffer binding point.
const Mat4 proj = inFramePar [ & GFrameParams::m_projection ];
const Mat4 model = inFramePar [ & GFrameParams::m_model ];
}
// Another version, assuming that m_framePar holds an entire array
// and inFramePar is a vpp::UniformArray accessor.
void fVertexShader ( vpp::VertexShader* pShader )
{
using namespace vpp;
Int idx = ...; // compute array index somehow
const Mat4 proj = inFramePar [ idx ][ & GFrameParams::m_projection ];
const Mat4 model = inFramePar [ idx ][ & GFrameParams::m_model ];
}

Accessing storage buffers

Storage buffers are similar to uniform buffers, but they are read-write. Their binding point types are vpp::ioBuffer and vpp::ioBufferDyn. Other details are the same. An example (showing the simple array variant this time):

vpp::ioBuffer m_target;
void fComputeShader ( vpp::ComputeShader* pShader )
{
using namespace vpp;
// outTarget is an accessor created for the m_target binding point.
outTarget [ 0 ] = outTarget [ 0 ] * 2.0f;
}

Accessing texel buffers

Texel buffers are a hybrid of images and buffers. They are one-dimensional, can hold only arrays of simple data and are accessed via image functions (e.g. ImageStore or TexelFetch). VPP also provides the TexelArray accessor, which allows these buffers to be used like regular buffers, with the indexing operator instead of function calls. This is similar to the vpp::UniformSimpleArray accessor. The corresponding binding points are vpp::inTextureBuffer and vpp::ioImageBuffer.

typedef vpp::format< float, float, float, float > Fmt4xF32;
void fComputeShader ( vpp::ComputeShader* pShader )
{
using namespace vpp;
const IVec3 workgroupId = pShader->inWorkgroupId;
const IVec3 localId = pShader->inLocalInvocationId;
const Int g = workgroupId [ X ];
const Int l = localId [ X ];
const Int index = ( g << 5 ) | l;
TexelArray< decltype ( m_inTexBuf ) > inTexBuf ( m_inTexBuf );
TexelArray< decltype ( m_ioImgBuf ) > ioImgBuf ( m_ioImgBuf );
const Int bufferLength = inTexBuf.Size();
If ( index < bufferLength );
{
const Vec4 va = inTexBuf [ index ];
ioImgBuf [ index ] = va;
}
Fi();
}

Caution: unlike uniform buffers, texel buffers require vpp::TexelBufferView objects to be explicitly constructed and stored along with the corresponding buffers. Forgetting to do so results in undefined behavior of texel buffers (they sometimes work and sometimes do not), and currently the validation layer does not detect this. To make this error harder to commit, the constructors of vpp::TexelBufferView are explicit.

Accessing push constants

Push constants are accessed in shaders in the same way as single-structure uniform buffers. They are actually small read-only uniform buffers with implicitly allocated memory.

template< vpp::ETag TAG >
struct TConstantData : public vpp::UniformStruct< TAG, TConstantData >
{
// Field definitions (m_field1, m_field2, m_field3, m_field4, m_matrixField) go here.
};
typedef TConstantData< vpp::GPU > GConstantData;
typedef TConstantData< vpp::CPU > CConstantData;
class MyPipeline : public vpp::ComputePipelineConfig
{
public:
// ...
inline CConstantData& data() { return m_sourceData.data(); }
inline void setField1 ( int v ) { m_sourceData.data().m_field1 = v; }
// ...
inline void push() { m_sourceData.cmdPush(); }
void fComputeShader ( vpp::ComputeShader* pShader );
public:
// ... the vpp::inPushConstant binding point m_sourceData is declared here.
};
void MyPipeline :: fComputeShader ( vpp::ComputeShader* pShader )
{
using namespace vpp;
// inSourceData is a vpp::UniformVar accessor for m_sourceData.
// Use the same syntax as for uniform buffers.
const Int f1 = inSourceData [ & GConstantData::m_field1 ];
const Int f2 = inSourceData [ & GConstantData::m_field2 ];
const Float f3 = inSourceData [ & GConstantData::m_field3 ];
const UInt f4 = inSourceData [ & GConstantData::m_field4 ];
const Mat2 fm = inSourceData [ & GConstantData::m_matrixField ];
}

Push constants are written directly on the CPU side. In the example above, use the data() method to obtain a reference to the CConstantData structure and write the values directly (as in the setField1 method). To transfer the structure to the GPU, call the cmdPush method on the vpp::inPushConstant object.

This should be done from a command sequence (lambda function). cmdPush makes a local copy of the entire structure and schedules a command for the GPU to update its own copy of the structure, using the local one as the source. These values will be visible in shaders launched by subsequent draw or computation commands.

This way you can schedule multiple structure updates from a single command sequence. Each cmdPush captures whatever values were set in the vpp::inPushConstant::data() object at the time of the call and generates a command, executed later, that sets exactly these values. So the following pattern is possible:

m_sourceData.data().m_field1 = ...; // value11
m_sourceData.data().m_field2 = ...; // value21
// ...
m_sourceData.cmdPush();
cmdDraw ( ... ); // shaders will use value11, value21 ...
m_sourceData.data().m_field1 = ...; // value12
m_sourceData.data().m_field2 = ...; // value22
// ...
m_sourceData.cmdPush();
cmdDraw ( ... ); // shaders will use value12, value22 ...
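
The snapshot behaviour described above can be modelled in plain C++, independently of VPP. The sketch below is not VPP code: Params, CommandList and PushConstantModel are hypothetical names, introduced only to illustrate how each recorded command carries a copy of the data taken at recording time, while execution happens later.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for the push constant structure.
struct Params
{
    int field1 = 0;
    int field2 = 0;
};

// Hypothetical model of a command sequence: commands are recorded first, executed later.
struct CommandList
{
    std::vector< std::function< void() > > commands;

    void execute() const
    {
        for ( const auto& cmd : commands )
            cmd();
    }
};

// Hypothetical model of a push constant object with a CPU-side and a "GPU-side" copy.
struct PushConstantModel
{
    Params cpuData;   // written directly, like data() in the text
    Params gpuData;   // what the shaders would see

    // Records a command carrying a snapshot of cpuData taken right now.
    void cmdPush ( CommandList& list )
    {
        const Params snapshot = cpuData;
        list.commands.push_back ( [ this, snapshot ]() { gpuData = snapshot; } );
    }

    // Records a "draw" which logs the field1 value visible at execution time.
    void cmdDraw ( CommandList& list, std::vector< int >& seen )
    {
        list.commands.push_back ( [ this, &seen ]() { seen.push_back ( gpuData.field1 ); } );
    }
};

// Records two push/draw pairs, runs them, and returns what each draw observed.
std::vector< int > recordAndRun()
{
    PushConstantModel pc;
    CommandList list;
    std::vector< int > seen;

    pc.cpuData.field1 = 11;
    pc.cmdPush ( list );
    pc.cmdDraw ( list, seen );

    pc.cpuData.field1 = 12;
    pc.cmdPush ( list );
    pc.cmdDraw ( list, seen );

    list.execute();
    assert ( seen.size() == 2 );
    return seen;
}
```

In this model the first draw observes 11 and the second observes 12, even though both values were written before any command was executed.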

Accessing inter-shader variables

Inter-shader variables pass data from one shader to the next shader in the pipeline. The target must be the immediately following shader; skipping shaders is not allowed.

The binding point is vpp::ioVariable or vpp::ioStructure. Provide the data type as well as the source and target shaders as template arguments.

These binding points need an accessor, to tell whether we are writing to the variable or reading from it. These accessors are named vpp::Output and vpp::Input respectively.

Passing the data to the fragment shader (from any other shader type) involves automatic interpolation, with the exception of integer data.

Example:

void fVertexShader ( vpp::VertexShader* pShader )
{
using namespace vpp;
outUV = m_vertices [ & GVertexAttr::m_uv ][ XY ];
}
void fFragmentShader ( vpp::FragmentShader* pShader )
{
using namespace vpp;
const Vec4 color = Texture ( m_colorMap, inUV );
// ...
}

Accessing textures (sampled images)

Textures are read-only images working together with a sampler. Binding points vpp::inConstSampledTexture and vpp::inSampledTexture expose textures already associated with a sampler, so there is no need to worry about it in the shader.

In order to read from a texture, use the vpp::Texture function or any other function from the Texture family (there are a lot of them). All these functions require coordinates, and sometimes other arguments. Textures do not need accessors. An example:

void fFragmentShader ( vpp::FragmentShader* pShader )
{
using namespace vpp;
const Vec4 texelColor = Texture ( m_colorMap, inUV );
// ...
}

Accessing storage images

Storage images are images accessed without sampling. Individual pixels are read or written directly. The binding point type associated with storage images is vpp::ioImage. There are a number of functions which take this binding point as an argument. No accessors are needed. Example:

void fComputeShader ( vpp::ComputeShader* pShader )
{
using namespace vpp;
const IVec3 workgroupId = pShader->inWorkgroupId;
const IVec3 localId = pShader->inLocalInvocationId;
const Int g = workgroupId [ X ];
const Int l = localId [ X ];
const Int index = ( g << 5 ) | l;
const IVec2 imgSize = ImageSize ( m_img1 );
const Int width = imgSize [ X ];
const Int height = imgSize [ Y ];
const Int x = index % width;
const Int y = index / width;
If ( y < height );
{
const IVec2 coords = IVec2 ( x, y );
const Vec4 v = ImageLoad ( m_img1, coords );
ImageStore ( m_img2, coords, v );
}
Fi();
}

The above example copies pixels from one image to another. vpp::ImageSize() can be used to retrieve the size of the image. vpp::ImageLoad() reads a pixel, vpp::ImageStore() writes it. Coordinates are expressed as an integer vector with number of components equal to image dimensionality (plus one, if the image is arrayed).
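
The index arithmetic in the example can be checked on the CPU. The following is a minimal plain C++ sketch, not VPP code; PixelCoords and pixelForThread are hypothetical names, and the workgroup size of 32 is an assumption read off the `g << 5` shift in the shader above.

```cpp
#include <cassert>

// Result of mapping one thread to one pixel.
struct PixelCoords
{
    int x;
    int y;
    bool inside;   // false when the thread falls past the last image row
};

// Maps (workgroup id, local id) to pixel coordinates,
// assuming 32 threads per workgroup as implied by the shift by 5.
PixelCoords pixelForThread ( int workgroupId, int localId, int width, int height )
{
    assert ( localId >= 0 && localId < 32 );
    assert ( width > 0 );

    const int index = ( workgroupId << 5 ) | localId;   // global linear index
    const int x = index % width;                         // column within the row
    const int y = index / width;                         // row number
    return PixelCoords { x, y, y < height };
}
```

For a 10 x 20 image, thread 5 of workgroup 3 has linear index 101 and lands on pixel (1, 10). Thread 0 of workgroup 7 has index 224, which maps to row 22 and is rejected by the `y < height` test, just as the If block in the shader skips it.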

Accessing input attachments

Input attachments receive data from another vpp::Process node in the rendering graph. An input attachment in the target process is simultaneously an output attachment in the source process. VPP and Vulkan automatically maintain a dependency between the processes.

Input attachments are accessed (read-only) via vpp::inAttachment binding point.

Each input attachment requires allocation of an image and an image view. The image should have the vpp::Img::INPUT usage bit set, as well as one of the output attachment usage bits (vpp::Img::COLOR or vpp::Img::DEPTH). The vpp::inAttachment template should be parameterized with the view type, just like a texture or storage image.

Input attachments may be read only in fragment shaders. No other shader type is allowed. Reading is performed by means of the vpp::SubpassLoad() function. Coordinates are relative to the current pixel position in the particular fragment shader call. Usually you call the overload without coordinate arguments, as it supplies a zero offset and simply reads the current pixel.

Reading is fully synchronized by Vulkan. The pixel value you read is guaranteed to be final pixel value generated by preceding process. This is true even if the pixel is being written multiple times or constructed incrementally (via blending or logical operations).

Example:

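typedef vpp::format< vpp::unorm8_t, vpp::unorm8_t, vpp::unorm8_t, vpp::unorm8_t > format_type;
// Image attributes. The usage flags follow the requirement stated above:
// INPUT plus one of the output attachment usage bits.
typedef vpp::ImageAttributes<
format_type, vpp::RENDER, vpp::IMG_TYPE_2D,
vpp::Img::INPUT | vpp::Img::COLOR,
VK_IMAGE_TILING_OPTIMAL, VK_SAMPLE_COUNT_1_BIT,
false, false > AttAttrType;
typedef vpp::Image< AttAttrType > AttImgType;
typedef vpp::ImageViewAttributes< AttImgType > AttVaType;
typedef vpp::ImageView< AttVaType > AttViewType;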
class MyPipeline : public vpp::WholeScreenPatchPipelineConfig
{
public:
MyPipeline (
const vpp::Process& pr,
const vpp::Device& dev
/* , parameters "input" and "output" identifying the attachments */ ) :
vpp::WholeScreenPatchPipelineConfig ( pr, dev ),
m_inColor ( input ),
m_outColor ( output ),
m_fragmentShader ( this, & MyPipeline::fFragmentShader )
{}
void setDataBuffers (
const AttViewType& inputView,
vpp::ShaderDataBlock* pDataBlock )
{
pDataBlock->update (( m_inColor = inputView ));
}
void fFragmentShader ( vpp::FragmentShader* pShader )
{
using namespace vpp;
const Vec4 pix = SubpassLoad ( m_inColor );
m_outColor = pix;
}
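private:
// m_inColor (a vpp::inAttachment binding point) and m_outColor
// (a vpp::outAttachment binding point) are declared here.
vpp::fragmentShader m_fragmentShader;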
};

Accessing samplers

Samplers can be associated with images statically or dynamically. Dynamic samplers are pipeline objects, just like the images themselves. They have binding points and must be bound to shader data blocks.

Binding point type for samplers is vpp::inSampler. This is a template which takes either vpp::NormalizedSampler or vpp::UnnormalizedSampler type.

In shaders, the only thing to do with these samplers is to associate them with a texture image represented by a vpp::inTexture binding point. This is done by calling the vpp::MakeSampledTexture() function, as in the example below. The function returns an opaque type that can be used in vpp::Texture and other texture reading functions.

vpp::MakeSampledTexture() must not be called within conditional blocks.

Example:

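typedef vpp::format< float, float, float, float > FmtType;
typedef typename FmtType::data_type DataType;
// The usage flags below are an assumption made for this example.
typedef vpp::ImageAttributes<
FmtType, vpp::RENDER, vpp::IMG_TYPE_2D,
vpp::Img::SAMPLED,
VK_IMAGE_TILING_OPTIMAL, VK_SAMPLE_COUNT_1_BIT,
false, false > TextureAttr;
typedef vpp::Image< TextureAttr > TextureImage;
typedef vpp::ImageViewAttributes< TextureImage > TextureViewAttr;
typedef vpp::ImageView< TextureViewAttr > TextureView;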
class MyPipeline : public vpp::PipelineConfig
{
public:
// ...
void setBuffers (
const TextureView& tv,
const vpp::NormalizedSampler& nsamp,
const vpp::UnnormalizedSampler& usamp,
vpp::ShaderDataBlock* pDataBlock )
{
pDataBlock->update ((
m_texture = tv,
m_normSampler = nsamp,
m_unnormSampler = usamp
));
}
void fFragmentShader ( vpp::FragmentShader* pShader )
{
using namespace vpp;
const Vec2 coords = pShader->inFragCoord [ XY ];
const auto normSampledTexture = MakeSampledTexture ( m_texture, m_normSampler );
const auto unnormSampledTexture = MakeSampledTexture ( m_texture, m_unnormSampler );
const Vec4 pixValue1 = Texture ( unnormSampledTexture, coords );
// do something with pixValue1...
const Vec2 is = StaticCast< Vec2 >( ImageSize ( m_texture ) );
const Vec2 scaledCoords = coords / is;
const Vec4 pixValue2 = Texture ( normSampledTexture, scaledCoords );
// do something with pixValue2...
}
private:
// ...
vpp::fragmentShader m_fragmentShader;
};

Another variant of samplers, "semi-dynamic", is represented by vpp::inConstSampler binding points. They use the same syntax as vpp::inSampler, but are not bound to shader data blocks. Instead, these binding points require the corresponding samplers to be passed directly to the constructor.

Example:

class MyPipeline : public vpp::PipelineConfig
{
public:
// ...
MyPipeline (
const vpp::Process& pr,
const vpp::Device& dev,
const vpp::NormalizedSampler& nsamp,
const vpp::UnnormalizedSampler& usamp ) :
vpp::PipelineConfig ( pr ),
m_normSampler ( nsamp ),
m_unnormSampler ( usamp ),
m_fragmentShader ( this, & MyPipeline::fFragmentShader )
{
}
// ...
void fFragmentShader ( vpp::FragmentShader* pShader )
{
using namespace vpp;
const Vec2 coords = pShader->inFragCoord [ XY ];
const auto normSampledTexture = MakeSampledTexture ( m_texture, m_normSampler );
const auto unnormSampledTexture = MakeSampledTexture ( m_texture, m_unnormSampler );
const Vec4 pixValue1 = Texture ( unnormSampledTexture, coords );
// do something with pixValue1...
const Vec2 is = StaticCast< Vec2 >( ImageSize ( m_texture ) );
const Vec2 scaledCoords = coords / is;
const Vec4 pixValue2 = Texture ( normSampledTexture, scaledCoords );
// do something with pixValue2...
}
private:
// ...
vpp::fragmentShader m_fragmentShader;
};

Accessing output attachments

Output attachments are accessed through vpp::outAttachment binding points. There is only one thing you can do with them: write the current pixel value. This is accomplished simply by using the assignment (=) operator. This syntax is different from that of most other resources, but simpler.

An example:

void fFragmentShader ( vpp::FragmentShader* pShader )
{
using namespace vpp;
Input< decltype ( m_ioNormal ) > inNormal ( m_ioNormal );
Input< decltype ( m_ioColor ) > inColor ( m_ioColor );
Input< decltype ( m_ioViewVec ) > inViewVec ( m_ioViewVec );
Input< decltype ( m_ioLightVec ) > inLightVec ( m_ioLightVec );
const Vec4 color = Texture ( m_colorMap, inUV );
const Vec3 vecN = Normalize ( inNormal );
const Vec3 vecL = Normalize ( inLightVec );
const Vec3 vecV = Normalize ( inViewVec );
const Vec3 vecR = Reflect ( -vecL, vecN );
const Vec3 diffuse = Max ( Dot ( vecN, vecL ), 0.0f ) * inColor;
const Vec3 specular = Pow ( Max ( Dot ( vecR, vecV ), 0.0f ), 16.0f ) * Vec3 ( 0.75f );
m_outColor = Vec4 ( diffuse * color [ XYZ ] + specular, 1.0f );
}

Accessing binding point arrays

Arrays of binding points are created by means of the vpp::arrayOf template. Individual points are then referenced by using an extra bracket operator [], as shown in section Arrays of buffers.

Declaring and using local variables

Immutable variables

Immutable variables can be declared and initialized, but may not be changed later. On the other hand, they are very easily optimized and contribute to very efficient code. In most algorithms, there is a need for only a few variables that are mutable (e.g. loop control), the rest of them may be immutable.

The example below shows several immutable variables. Using the const specifier is optional; they are immutable regardless of whether you declare them const or not.

void fFragmentShader ( vpp::FragmentShader* pShader )
{
using namespace vpp;
// ...
const Vec4 color = Texture ( m_colorMap, inUV );
const Vec3 vecN = Normalize ( inNormal );
const Vec3 vecL = Normalize ( inLightVec );
const Vec3 vecV = Normalize ( inViewVec );
const Vec3 vecR = Reflect ( -vecL, vecN );
const Vec3 diffuse = Max ( Dot ( vecN, vecL ), 0.0f ) * inColor;
const Vec3 specular = Pow ( Max ( Dot ( vecR, vecV ), 0.0f ), 16.0f ) * Vec3 ( 0.75f );
// ...
}

Mutable variables and arrays

Mutable variables can be changed at will, but the price for that is high. For each mutable variable, the shader compiler must allocate a permanent register on the GPU, and must do so for each concurrent thread. A typical GPU these days can run at least a thousand threads simultaneously, and its register pool holds several thousand registers (later or more expensive GPUs may have more). The register pool can be quickly exhausted by using more than 8-10 simple mutable variables per thread. When there are no more registers, the compiler will allocate regular memory instead, and that can slow down your shader about 10 times.

Therefore use immutable variables by default. Declare mutable ones only if needed, and reuse them throughout your shader. Note that the optimizing C++ compiler will not be able to optimize the usage of these variables, so you must do it yourself.

You can also declare mutable arrays of fixed size. This is done by means of the vpp::VArray template. Specify the item type as the first parameter and the size as the second. All remarks about efficiency apply to arrays as well, so declare only small arrays.

Some trivial examples:

VArray< Float, 6 > v;
VInt i = 0;
VInt j = 0;
i = 1;
j = 2;
v [ j ] = i;
v [ j + 1 ] = v [ j ] + 1;

Shared variables

The GPU runs a large number of threads in parallel (1000 or more). Those threads are logically organized into workgroups. A workgroup is a smaller group of threads (typically 32 or 64) that run on a single Computation Unit of the GPU.

As of this writing (2018), contemporary GPU architectures have become very similar to multicore SIMD CPU architectures, like regular Intel or AMD processors. A regular processor has e.g. 8 cores and 8-way SIMD in a single core (AVX and AVX2). Now imagine a processor with 32-way SIMD and 32 cores: this is roughly an entry-level GPU. Each core also has its own data/instruction caches and GPR register pool. A unique feature of GPUs is that they allow explicit control over which variables are shared across an entire core. These variables are called shared variables.

Shared variables have two distinguishing traits:

To declare a variable as shared inside shader code, use the vpp::Shared() function. Place it before the declaration, like this:

Shared(); VInt vi;
Shared(); VArray< Vec3, 256 > lightPositions;

Shared arrays are also possible, and they are much less performance-sensitive. In fact, shared arrays are a very practical method of exchanging data between threads in a single group, or of storing temporary data when you can ensure that different threads do not access the same fields simultaneously (otherwise consider using atomic variables, described in the next section).

Atomic variables

Accessing built-in variables

Built-in variables are a very important aspect of shader code authoring. These variables are predefined for each shader type and are accessible via the shader object given as the parameter to the shader. For example:

void fFragmentShader ( vpp::FragmentShader* pShader )
{
pShader-> ... // built-in variables are there
}

Each shader type has its own object representation: vpp::VertexShader, vpp::GeometryShader, vpp::TessControlShader, vpp::TessEvalShader, vpp::FragmentShader, and vpp::ComputeShader. Each of these objects defines its own set of built-in variables, specific to that shader type. Shader objects also offer some methods (described elsewhere).

Built-in variables have different types. See the docs of each shader object for the list of variables and their types.

Some examples:

void MyPipeline :: fComputeShader ( vpp::ComputeShader* pShader )
{
using namespace vpp;
const IVec3 workgroupId = pShader->inWorkgroupId;
const IVec3 localId = pShader->inLocalInvocationId;
const Int g = workgroupId [ X ];
const Int l = localId [ X ];
// ...
}
void MyPipeline :: fVertexShader ( vpp::VertexShader* pShader )
{
using namespace vpp;
const Int vertexIndex = pShader->inVertexIndex;
const Int instanceIndex = pShader->inInstanceIndex;
// ...
}
void MyPipeline :: fFragmentShader ( vpp::FragmentShader* pShader )
{
using namespace vpp;
const Vec2 pixelCoords = pShader->inFragCoord [ XY ];
// ...
}

Indexing operators

Vectors and arrays are an important part of programming, on both the CPU and the GPU. VPP offers overloaded indexing operators ([]) which give access to individual elements of various aggregates: local arrays, buffers, arrayed built-in variables, vectors, matrices, structures, etc. Subsequent sections describe them in more detail.

Local arrays

Local arrays are described in section Mutable variables and arrays. The indexing operator for them accepts any integer expression (variable or constant). Examples:

VInt vi;
Int i = 1;
VArray< Float, 6 > v;
v [ vi ] = v [ i ];
v [ i + 1 ] = v [ 0 ];
const int macroIndex = ...; // CPU computed expression
const Float t = v [ macroIndex ]; // v holds Float items

Vectors

Immutable or mutable vectors of types like vpp::Vec2, vpp::Vec3, vpp::Vec4, vpp::IVec2, vpp::IVec3, vpp::IVec4, vpp::VVec2, vpp::VVec3, vpp::VVec4, vpp::VIVec2, vpp::VIVec3, vpp::VIVec4, etc. can be indexed much like local arrays. Any integer expression can be used inside the [] operator. The index is zero based, like in arrays. The value must be less than vector size, otherwise the result is undefined. For mutable vectors, assignment to indexed locations is permitted.

Examples:

IVec2 ivec2 { 11, 13 };
Int v21 = ivec2 [ 1 ];
Int idx2 = ...; // some index
Int v22 = ivec2 [ idx2 ];
VVec4 vvec4;
VInt vidx;
For ( vidx, 0, 4 );
vvec4 [ vidx ] = StaticCast< Float >( vidx );
Rof();
Float v41 = vvec4 [ 1 ];
Int idx4 = ...; // some index
Float v44 = vvec4 [ idx4 ];

Vector swizzles

Swizzles are a special method of indexing vectors, using component names rather than numeric indices. Entire slices of vectors can be easily specified by these names, for example:

- v [ X ] selects the first component,
- v [ XY ] selects the first two components,
- v [ XYZ ] selects the first three components,
- v [ WZYX ] selects all four components in reverse order.

These are just a few examples. Any combination of the letters X, Y, Z, W may be used. The length of the combination must be less than or equal to the size of the original vector. Swizzles can be used for any vector length and component type.

Swizzle names are defined as enumeration types: vpp::ESwizzle1, vpp::ESwizzle2, vpp::ESwizzle3, vpp::ESwizzle4. Because of that, they require either a using namespace vpp; directive, or explicit qualification with vpp:: prefix.

Swizzles can be combined with other indexing operators, but the swizzle either must be the last one of them in chain, or all indices coming after the swizzle must be constants or CPU variables.

Example:

template< vpp::ETag TAG >
struct TVertexAttr : public vpp::VertexStruct< TAG, TVertexAttr >
{
// Field definitions (m_pos, m_normal, m_uv, m_color) go here.
};
template< vpp::ETag TAG >
struct TFrameParams : public vpp::UniformStruct< TAG, TFrameParams >
{
// Field definitions (m_projection, m_model, m_lightPos) go here.
};
void MyPipeline :: fVertexShader ( vpp::VertexShader* pShader )
{
using namespace vpp;
Output< decltype ( m_ioNormal ) > outNormal ( m_ioNormal );
Output< decltype ( m_ioColor ) > outColor ( m_ioColor );
Output< decltype ( m_ioViewVec ) > outViewVec ( m_ioViewVec );
Output< decltype ( m_ioLightVec ) > outLightVec ( m_ioLightVec );
// The following code uses swizzles to truncate vectors.
outColor = m_vertices [ & GVertexAttr::m_color ][ XYZ ];
outUV = m_vertices [ & GVertexAttr::m_uv ][ XY ];
const Mat4 proj = inFramePar [ & GFrameParams::m_projection ];
const Mat4 model = inFramePar [ & GFrameParams::m_model ];
const Mat3 model3 = Mat3 ( model );
const Vec4 pos = model * m_vertices [ & GVertexAttr::m_pos ];
pShader->outVertex.position = proj * pos;
outNormal = model3 * m_vertices [ & GVertexAttr::m_normal ][ XYZ ];
const Vec3 lPos = model3 * inFramePar [ & GFrameParams::m_lightPos ][ XYZ ];
outLightVec = lPos - pos [ XYZ ];
outViewVec = -pos [ XYZ ];
}

Swizzles can also be used to write components, i.e. on the left side of the assignment operator. Examples:

VVec4 vv4;
vv4 [ X ] = ...; // compute the value
vv4 [ Y ] = ...; // compute the value
vv4 [ Z ] = ...; // compute the value
vv4 [ W ] = ...; // compute the value
// this reverses the components
vv4 [ WZYX ] = vv4;
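
To see why the last assignment reverses the components, the write through a swizzle can be modelled in plain C++. This is only a sketch, not VPP code: Vec4Model and swizzleAssign are hypothetical names, and the model assumes the same semantics as a swizzled assignment in GLSL, where the i-th source component is stored into the destination component named by the i-th letter.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a mutable 4-component vector.
using Vec4Model = std::array< float, 4 >;

// Stores src[i] into dst[pattern[i]] for each i.
// The pattern { 3, 2, 1, 0 } plays the role of the WZYX swizzle.
void swizzleAssign (
    Vec4Model& dst,
    const std::array< int, 4 >& pattern,
    const Vec4Model& src )
{
    // Copy the source first, so that dst and src may be the same vector.
    const Vec4Model copy = src;

    for ( std::size_t i = 0; i < 4; ++i )
    {
        assert ( pattern [ i ] >= 0 && pattern [ i ] < 4 );
        dst [ static_cast< std::size_t >( pattern [ i ] ) ] = copy [ i ];
    }
}
```

Applied to a vector holding 1, 2, 3, 4 with the pattern { 3, 2, 1, 0 } and the vector itself as the source, the model leaves 4, 3, 2, 1 in the vector.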

Matrices

Matrix variables can also be mutable or immutable. Some examples of immutable matrix types: vpp::Mat2, vpp::Mat3, vpp::Mat4, vpp::Mat4x2, vpp::Mat3x4, vpp::IMat4x2, vpp::UMat2x3, etc. Mutable versions have prefix V, e.g. vpp::VMat4, vpp::VIMat3x2, etc.

Matrices are indexed similarly to vectors. A matrix is equivalent to a vector of columns (a vector of vectors), therefore the first indexing operator applied to it selects a column, and the second one selects an element within that column.

IMat2 imat2
{
11, 12,
21, 22
};
Int a11 = imat2 [ 0 ][ 0 ]; // equal to 11
Int a12 = imat2 [ 1 ][ 0 ]; // equal to 12
Int a21 = imat2 [ 0 ][ 1 ]; // equal to 21
Int a22 = imat2 [ 1 ][ 1 ]; // equal to 22

The value of the index can be any integer expression (variable or constant).

If a matrix is indexed with only one indexing operator, the result is a column vector. For example:

IMat2 imat2
{
11, 12,
21, 22
};
// A vector of elements: { 11, 21 }
IVec2 imat20 = imat2 [ 0 ];
// A vector of elements: { 12, 22 }
IVec2 imat21 = imat2 [ 1 ];
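
The column-first indexing can be reproduced with a small plain C++ model. This is a sketch for illustration only, not the VPP implementation: IMat2Model is a hypothetical type which accepts its elements row by row, as written in the initializers above, and stores them as two columns.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of a 2x2 integer matrix stored as a vector of columns.
struct IMat2Model
{
    std::array< std::array< int, 2 >, 2 > columns;

    // Elements are given row by row: a11, a12, then a21, a22.
    IMat2Model ( int a11, int a12, int a21, int a22 )
    {
        columns [ 0 ][ 0 ] = a11;   // column 0, row 0
        columns [ 0 ][ 1 ] = a21;   // column 0, row 1
        columns [ 1 ][ 0 ] = a12;   // column 1, row 0
        columns [ 1 ][ 1 ] = a22;   // column 1, row 1
    }

    // The first index selects a column; indexing the result selects a row.
    const std::array< int, 2 >& operator[] ( int col ) const
    {
        assert ( col >= 0 && col < 2 );
        return columns [ static_cast< std::size_t >( col ) ];
    }
};
```

With the values 11, 12, 21, 22, the model returns 12 for m [ 1 ][ 0 ] and 21 for m [ 0 ][ 1 ], and m [ 0 ] is the column { 11, 21 }, matching the comments in the examples above.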

For mutable types like vpp::VMat2 or vpp::VIMat2 indexing operators can be used on the left side of assignments. Examples:

IMat2 imat2 { 11, 12, 21, 22 };
// Sets vimat2 equal to imat2.
VIMat2 vimat2 = imat2;
// Changes elements of vimat2.
vimat2 [ 0 ][ 0 ] = 111;
vimat2 [ 1 ][ 0 ] = 112;
vimat2 [ 0 ][ 1 ] = 121;
vimat2 [ 1 ][ 1 ] = 122;
// Sets columns of vimat2.
vimat2 [ 0 ] = imat2 [ 0 ];
vimat2 [ 1 ] = imat2 [ 1 ];

Buffers containing single structure

This section concerns uniform or storage buffers which hold only a single structure. This can be e.g. the parameters and matrices for an entire rendering frame.

Buffers like these are accessed by means of the vpp::UniformVar accessor object. Use it as in the example below.

template< vpp::ETag TAG >
struct TFrameParams : public vpp::UniformStruct< TAG, TFrameParams >
{
// Field definitions (m_projection, m_model) go here.
};
typedef TFrameParams< vpp::GPU > GFrameParams;
typedef TFrameParams< vpp::CPU > CFrameParams;
class MyPipeline : public vpp::PipelineConfig
{
public:
void fVertexShader ( vpp::VertexShader* pShader )
{
using namespace vpp;
const Mat4 proj = inFramePar [ & GFrameParams::m_projection ];
const Mat4 model = inFramePar [ & GFrameParams::m_model ];
}
// ...
};

The accessor provides an indexing operator which gives access to the fields defined within the vpp::UniformStruct definition. This operator can be used to read fields, and also to write them if the buffer is a vpp::ioBuffer.

The vpp::UniformVar accessor is also used with vpp::inPushConstant binding points, to access push constants. This access is read-only. From this perspective, push constants do not differ from uniform buffers. However, they are written on the CPU side in a different manner and are faster.

Buffers containing multiple structures

If there is an entire array of structures within the buffer, use the vpp::UniformArray accessor instead of vpp::UniformVar. The binding point types are the same. This accessor defines an index operator taking an integer index (any integer expression is allowed, variable or constant), and the result can be indexed with a field name.

Buffers containing multiple simple objects

There can be two major kinds of simple data buffers: regular uniform buffers and texel buffers.

Regular uniform buffers holding simple data are accessed with the vpp::UniformSimpleArray accessor, and texel buffers with the vpp::TexelArray accessor.

Arrays of buffers

Arrays of buffers (and images) are a different thing from buffers containing arrays. This time we have multiple buffers themselves. These buffer arrays are declared with the help of vpp::arrayOf template, like in the following code (shows examples for all supported arrays):

All of these arrays provide additional level of indexing when being accessed in the shader. Accepted indexes are any integer expressions, variable or constant.

The syntax differs slightly for buffers and images, because buffers use accessors and images do not. For buffers, you apply extra indexing to the accessor. Example:

Int idx = ...; // some index of buffer within the array, may be computed dynamically.
// Examples for reading arrayed buffer. The extra index goes first.
IVec4 q1 = varUniformBufferArr [ idx ][ & GFrameSelector::d_frameSelectorParams ];
Int q1x = varUniformBufferArr [ idx ][ & GFrameSelector::d_frameSelectorParams ][ X ];
// For comparison, the same for regular uniform buffer.
IVec4 p1 = varUniformBuffer [ & GFrameSelector::d_frameSelectorParams ];
Int p1x = varUniformBuffer [ & GFrameSelector::d_frameSelectorParams ][ X ];
// Works same way for writes:
varIoBufferArr [ idx ][ & GFrameSelector::d_frameSelectorParams ] = q3;
varIoBufferDynArr [ idx ][ & GFrameSelector::d_frameSelectorParams ] = q4;

For images, apply the indexing operator to the binding point name:

Int idx = ...; // some index of image within the array, may be computed dynamically.
// Reads a texel from one of the images within the array.
Vec4 s103 = Texture ( d_inSampledTextureArr [ idx ], c );
// For comparison, same thing for single image.
Vec4 s101 = Texture ( d_inSampledTexture, c );

There is one thing about images that might be confusing: there are two different methods of arraying them. The one shown above involves an array of separately bound images. This is actually an array of independent binding points. A different vpp::Image or vpp::Img object may be bound to each item in the array. Use the vpp::multi template to make such selective bindings.

The other kind of image arraying is to declare the image itself as arrayed (multilayered), so that it contains multiple layers. In such a case this is a single image, bound to a single point. You specify the layer index as an extra coordinate passed to functions like vpp::Texture.

These two methods work completely independently of each other and may in fact be mixed.

Vertex and instance buffers

Vertex and instance buffers are associated with structures defined by means of vpp::VertexStruct and vpp::InstanceStruct. Although they are physically arrays of structures, the vertex shader has access only to a single, "current" element at any given moment. You apply the indexing operator directly to the vpp::inVertexData binding point in the pipeline and provide a pointer to a structure field, just like for single-element buffers (section Buffers containing single structure). An example:

// The binding point for vertices, m_vertices, is a vpp::inVertexData member of the pipeline.
vpp::vertexShader m_vertexShader;
void MyPipeline :: fVertexShader ( vpp::VertexShader* pShader )
{
using namespace vpp;
Output< decltype ( m_ioNormal ) > outNormal ( m_ioNormal );
Output< decltype ( m_ioColor ) > outColor ( m_ioColor );
// Read some vertex attributes and pass them immediately to the next shader.
// We specify only the field index, because the shader has only access to single
// vertex. Actually for each vertex there is separate vertex shader call.
outColor = m_vertices [ & GVertexAttr::m_color ][ XYZ ];
outUV = m_vertices [ & GVertexAttr::m_uv ][ XY ];
// This reads data from a single-element buffer - note that the syntax is
// the same as for uniform buffers.
const Mat4 proj = inFramePar [ & GFrameParams::m_projection ];
const Mat4 model = inFramePar [ & GFrameParams::m_model ];
const Mat3 model3 = Mat3 ( model );
// Read vertex position, transform and write to the standard output
// built-in variable.
const Vec4 pos = model * m_vertices [ & GVertexAttr::m_pos ];
pShader->outVertex.position = proj * pos;
// Read vertex normal, transform and pass it to the next shader.
outNormal = model3 * m_vertices [ & GVertexAttr::m_normal ][ XYZ ];
// ...
}

Inter-shader variables of structural types

Basic control constructs

The VPP shader language offers many constructs to control the flow of execution. All of them have syntax different from the corresponding C++ constructs, because C++ keywords like if or for are not overloadable.

Nevertheless, VPP and C++ statements may be mixed in code, which opens interesting possibilities. Generally, C++ constructs act as a metalanguage for VPP constructs.
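The metalanguage idea can be illustrated without VPP at all. The toy model below is not VPP code: Recorder, emit and buildUnrolled are hypothetical names invented for this sketch. It assumes only that shader C++ code runs once on the CPU and records operations for the GPU. Under that assumption, a plain C++ for loop never reaches the GPU; it merely repeats the recording step, so the generated program contains the loop body unrolled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a shader code generator (hypothetical, not part of VPP).
// Every "GPU statement" is simply appended to a list when the C++ code runs.
struct Recorder
{
    std::vector< std::string > ops;

    void emit ( const std::string& op ) { ops.push_back ( op ); }
};

// The C++ loop below runs on the CPU while the program is being recorded.
// It acts as a metalanguage: the recorded program holds `count` copies
// of the statement and no loop instruction at all.
inline Recorder buildUnrolled ( int count )
{
    assert ( count >= 0 );
    Recorder rec;

    for ( int i = 0; i < count; ++i )
        rec.emit ( "acc += weight" + std::to_string ( i ) );

    return rec;
}
```

In this model, buildUnrolled ( 3 ) records three separate statements. A C++ if would work the same way, deciding at recording time which statements are emitted at all.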

The sections below give short descriptions and examples of these control constructs. Refer to the individual documentation pages for more information.

'If/Else/Fi' conditionals

vpp::If() and vpp::Else() are counterparts of the C++ if and else statements. Use them as in the following example, and do not forget vpp::Fi() at the end.

const Bool bCanAddCluster = ( activeClusters < 8 );
If ( bCanAddCluster );
{
clusterCenters [ activeClusters ] = dirVector;
thr = threshold;
ImageStore (
d_partitionedSetOutput,
IVec3 ( i, lightIndex, zoneIndex ),
StaticCast< UInt >( j ) );
++activeClusters;
}
Else();
{
clusterCenters [ nearestCluster ] =
( dirVector + clusterCenters [ nearestCluster ] ) * Float ( 0.5f );
thr = halfThreshold;
ImageStore (
d_partitionedSetOutput,
IVec3 ( i, lightIndex, zoneIndex ),
StaticCast< UInt >( nearestCluster ) );
}
Fi();

'Select' conditionals

vpp::Select() implements the conditional operator ( ?: ) from C++. This operator is not overloadable, hence the need for a separate construct. The order of arguments is the same as in the conditional operator.

const Bool bCondition = ( ... );
const Int valueIfTrue = ( ... );
const Int valueIfFalse = ( ... );
const Int j = Select ( bCondition, valueIfTrue, valueIfFalse );
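One practical difference from ?: follows from C++ itself: Select is called like a function, so both value expressions are evaluated before the call, whereas ?: evaluates only the chosen branch. The CPU-only sketch below shows this with a plain C++ analogue. selectValue, countedValue and evaluationsNeeded are hypothetical helpers written for this illustration, and whether the GPU code produced by vpp::Select() also computes both values is an assumption here, not something the text above states.

```cpp
#include <cassert>

// Plain C++ analogue of a select function (hypothetical helper, CPU only).
// Argument order mirrors `condition ? valueIfTrue : valueIfFalse`.
template< typename T >
T selectValue ( bool condition, T valueIfTrue, T valueIfFalse )
{
    return condition ? valueIfTrue : valueIfFalse;
}

// Counts how often it is called, to show that both value
// arguments are evaluated before selectValue runs.
inline int countedValue ( int value, int& callCounter )
{
    ++callCounter;
    return value;
}

// Returns the number of value expressions evaluated for one select call.
inline int evaluationsNeeded()
{
    int calls = 0;
    const int chosen = selectValue (
        true, countedValue ( 10, calls ), countedValue ( 20, calls ) );
    assert ( chosen == 10 );
    return calls;
}
```

Because both arguments are always evaluated in this model, it is safest to keep them free of side effects and valid regardless of the condition.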

'Do/While' loops

vpp::Do() and vpp::While() form the basic looping construct. Always use them together and close the block with vpp::Od().

VArray< Float, 6 > v;
VInt i = 0;
VInt j = 0;
Do(); While ( i < 5 );
{
const Int i1 = i + 1;
const Float vi = v [ i ];
const Float vi1 = v [ i1 ];
const Bool bValid =
vi < vi1
&& evaluatePoly0 ( vi, k5, k4, k3, k2, k1, k0 ) < 0
&& evaluatePoly0 ( vi1, k5, k4, k3, k2, k1, k0 ) > 0;
v [ j ] = Select ( bValid, vi, v [ j ] );
v [ j + 1 ] = Select ( bValid, vi1, v [ j + 1 ] );
i = Select ( bValid, i + 2, i1 );
j = Select ( bValid, j + 2, j );
}
Od();

'For' loops

vpp::For() implements a simplified for loop. It takes 3 or 4 arguments. The first one is the control variable, which must be an already declared mutable variable of type vpp::Int or vpp::UInt. The second argument is the starting value. The third one is the ending limit: the loop repeats as long as the control variable is less than this value. The optional fourth argument is the step added to the control variable on each iteration; it defaults to 1.
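The iteration rule described above can be modelled on the CPU as follows. This is only a sketch of the described semantics in plain C++, not VPP code: forValues is a hypothetical helper, and it assumes that the limit is tested before the first iteration and that the step is positive.

```cpp
#include <cassert>
#include <vector>

// CPU-side model of the iteration rule (hypothetical helper):
// start at `begin`, repeat while the variable is less than `end`,
// add `step` after each iteration.
inline std::vector< int > forValues ( int begin, int end, int step = 1 )
{
    assert ( step > 0 );   // assumption of this sketch; other steps are not covered
    std::vector< int > visited;

    for ( int i = begin; i < end; i += step )
        visited.push_back ( i );

    return visited;
}
```

In this model, forValues ( 0, 3 ) visits 0, 1, 2 and forValues ( 2, 10, 3 ) visits 2, 5, 8. The ending value itself is never visited.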

As with other constructs, vpp::For() has a corresponding block-closing instruction called vpp::Rof().

const Int nr = j >> 1;
For ( i, 0, nr );
{
const Int br = i + i;
const Int er = br + 1;
const Float vbr = v [ br ];
const Float ver = v [ er ];
// Find a root in [vbr, ver]. Initial guess is in v[i].
v [ i ] = ( vbr + ver )*0.5f;
For ( j, 0, 6 );
{
// Schroeder's root finding method.
// v [ i ] is the current parameter value.
const Float x = v [ i ];
const Float f0 = evaluatePoly0 ( x, k5, k4, k3, k2, k1, k0 );
const Float f1 = evaluatePoly1 ( x, k5, k4, k3, k2, k1 );
const Float f2 = evaluatePoly2 ( x, k5, k4, k3, k2 );
const Float r = f0 / f1;
const Float d = r + ( f2/(2.0f*f1) )*r*r;
v [ i ] = x - d;
}
Rof();
}
Rof();
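To check the inner loop's update rule on the CPU, the same arithmetic can be written in plain C++. The sketch below is not the original code: the evaluatePoly helpers are not shown in this text, so the example substitutes a simple polynomial, f(x) = x^2 - 2, with hand-written derivatives. The names poly0, poly1, poly2 and refineRoot are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Example polynomial f(x) = x^2 - 2 and its first and second derivatives
// (chosen for this sketch only).
inline double poly0 ( double x ) { return x * x - 2.0; }
inline double poly1 ( double x ) { return 2.0 * x; }
inline double poly2 ( double ) { return 2.0; }

// Same update rule as the inner GPU loop, executed on the CPU:
// r = f/f', d = r + ( f''/(2 f') ) * r^2, x <- x - d.
inline double refineRoot ( double initialGuess, int iterations = 6 )
{
    assert ( iterations >= 0 );
    double x = initialGuess;

    for ( int k = 0; k < iterations; ++k )
    {
        const double f0 = poly0 ( x );
        const double f1 = poly1 ( x );
        const double f2 = poly2 ( x );
        const double r = f0 / f1;
        const double d = r + ( f2 / ( 2.0 * f1 ) ) * r * r;
        x -= d;
    }

    return x;
}
```

Starting from 1.5, the six iterations bring x to the square root of 2 to within double-precision rounding. A fixed iteration count, as in the GPU version, avoids a data-dependent exit condition.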

'Switch' conditionals

The vpp::Switch() construct is similar to the C++ one. Just as in C++, you need to use vpp::Break() to stop the control flow, otherwise fall-through behavior will occur.

const Int c = ...;
VInt x;
Switch ( c );
Case ( 0 );
x = doSomething0();
Break();
Case ( 1 );
x = doSomething1();
Break();
Case ( 2 );
x = doSomething2();
Break();
Default();
x = doSomethingOther();
Break();
EndSwitch();

Type conversions

vpp::StaticCast() and vpp::ReinterpretCast() are explicit type conversion operators.

vpp::StaticCast() converts data while preserving the value (or its approximation), just like in C++. You must use this operator somewhat more frequently in VPP than in C++, as VPP performs fewer implicit type conversions.

Example:

const Float f = ...;
const Int i = StaticCast< Int >( Ceil ( f ) );

vpp::ReinterpretCast() converts data while preserving binary representation, similar to C++. It is allowed to be used for numeric types only (no pointers). One particular application is to manipulate bits in IEEE-754 floats and doubles, for some fast approximations.

// Will be equal to IEEE-754 binary representation of 1.0
Int i = ReinterpretCast< Int >( Float ( 1.0f ) );
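The difference between the two casts can be checked on the CPU with standard C++. The sketch below uses std::memcpy as the portable host-side way to read a float's bit pattern. floatBits and floatToIntValue are hypothetical helper names, and the code assumes a 32-bit IEEE-754 float, which the static_assert only partly verifies.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// CPU-side counterpart of a bit-preserving cast (hypothetical helper).
// The bytes of the float are copied unchanged into an integer.
inline std::uint32_t floatBits ( float value )
{
    static_assert ( sizeof ( std::uint32_t ) == sizeof ( float ),
                    "float must be 32 bits wide" );
    std::uint32_t bits = 0;
    std::memcpy ( &bits, &value, sizeof ( bits ) );
    return bits;
}

// Value-preserving conversion for comparison (hypothetical helper).
inline int floatToIntValue ( float value )
{
    assert ( value == value );   // NaN has no integer value
    return static_cast< int >( value );
}
```

For 1.0f the value-preserving conversion yields 1, while the bit pattern is 0x3F800000: sign 0, biased exponent 127, mantissa 0.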

GPU-level functions

These constructs let you define functions in GPU code, which can then be called with various arguments. Function definitions should occur outside other constructs, preferably at the beginning of the shader.

To define a function, use the vpp::Function template. The first template argument is the return type. Further, optional template arguments specify the types of the function arguments. The runtime string argument is the name of the function visible in SPIR-V dumps (useful for debugging).

Next come the vpp::Par declarations, one for each function argument. They are needed to access the arguments in the function code, and you can use their names in expressions. Function arguments are read-only. Default arguments and variable-length argument lists are not supported.

vpp::Function and vpp::Par are class templates.

The function body is placed between vpp::Begin() and vpp::End(). Note that vpp::Begin() and vpp::End() do not create a C++ scope, so it is recommended to create one yourself with a pair of curly braces. This is optional, however, and the braces may be omitted if your function is simple and does not create local variables.

The curly braces may also be moved one level higher, enclosing the vpp::Par declarations as well, so that they do not pollute the main shader scope. This is shown in the second (binomial) function in the example below.

Calling functions is straightforward: the syntax is the same as in C++.

Function< Int, Int > factorial ( "factorial" );
Par< Int > factX;
Begin();
{
VInt t = 0;
VInt r = 1;
For ( t, 2, factX+1 );
r *= t;
Rof();
Return ( r );
}
End();
Function< Int, Int, Int > binomial ( "binomial" );
{
Par< Int > n;
Par< Int > k;
Begin();
{
Return ( factorial ( n ) / ( factorial ( k )*factorial ( n-k ) ) );
}
End();
}
// ...
Int f = factorial ( 5 );
Int b = binomial ( 5, 2 );
// ...
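When debugging GPU functions like these, a CPU reference with the same loop bounds is handy for comparing results. The plain C++ sketch below mirrors the two functions above. factorialRef and binomialRef are hypothetical names, and the range checks are additions of this sketch, not something the GPU versions perform.

```cpp
#include <cassert>

// CPU reference for the GPU factorial: same loop bounds, t = 2 .. x.
inline int factorialRef ( int x )
{
    assert ( x >= 0 && x <= 12 );   // 13! overflows a 32-bit int
    int r = 1;

    for ( int t = 2; t < x + 1; ++t )
        r *= t;

    return r;
}

// CPU reference for the GPU binomial, built on factorialRef.
inline int binomialRef ( int n, int k )
{
    assert ( k >= 0 && k <= n );
    return factorialRef ( n ) / ( factorialRef ( k ) * factorialRef ( n - k ) );
}
```

Here factorialRef ( 5 ) is 120 and binomialRef ( 5, 2 ) is 10, which are the values the GPU calls factorial ( 5 ) and binomial ( 5, 2 ) would be expected to produce.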