In this chapter, we show how to create trails on the GPU. The sample for this chapter is "GPU Based Trail" from the https://github.com/IndieVisualLab/UnityGraphicsProgramming2 repository.
The trajectory left by a moving object is called a trail. In a broad sense this includes wheel ruts, ship wakes, and ski tracks, but the most striking use in CG is a trail of light that traces a curve, such as a car's tail lamps or a homing laser in a shooting game.
Unity provides two kinds of trails as standard: the TrailRenderer component [*1] and the Trails module of the Particle System [*2].
[*1] https://docs.unity3d.com/ja/current/Manual/class-TrailRenderer.html
[*2] https://docs.unity3d.com/Manual/PartSysTrailsModule.html
Since this chapter focuses on how to build a trail from scratch, we do not use these features. Implementing the trail on the GPU also makes it possible to go beyond what the Trails module can express, for example in sheer quantity.
Figure 2.1: The sample code running, showing 10,000 trails
Now let's create a Trail.
Three main structures are used.
GPUTrails.cs
public struct Trail { public int currentNodeIdx; }
Each Trail structure corresponds to one trail. currentNodeIdx stores the index of the Node that was written last in the Node buffer.
GPUTrails.cs
public struct Node { public float time; public Vector3 pos; }
A Node structure is a control point in a trail. It stores the position of the Node and the time at which it was updated.
GPUTrails.cs
public struct Input { public Vector3 pos; }
The Input structure is one frame's worth of input from the emitter (the object that leaves the trajectory). Here it is just a position, but I think it would be interesting to add color and other attributes.
The buffers are initialized in GPUTrails.Start().
GPUTrails.cs
trailBuffer = new ComputeBuffer(trailNum, Marshal.SizeOf(typeof(Trail)));
nodeBuffer = new ComputeBuffer(totalNodeNum, Marshal.SizeOf(typeof(Node)));
inputBuffer = new ComputeBuffer(trailNum, Marshal.SizeOf(typeof(Input)));
trailBuffer is allocated with trailNum elements. In other words, this program processes multiple trails at once. nodeBuffer holds the Nodes of all trails together in a single buffer: indices 0 to nodeNum-1 belong to the first trail, nodeNum to 2 * nodeNum-1 to the second, and so on. inputBuffer also has trailNum elements and holds the input for every trail.
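The layout described above can be sketched as a small index calculation. This is only an illustration: the NodeLayout class and ToBufferIndex name are hypothetical, and it assumes that totalNodeNum equals trailNum * nodeNum, which the chapter implies but does not state.

```csharp
public static class NodeLayout
{
    // Hypothetical helper: flat index of a Node inside the shared node buffer.
    // Assumes each trail owns a contiguous block of nodeNum entries.
    public static int ToBufferIndex(int trailIdx, int nodeIdx, int nodeNum)
    {
        return trailIdx * nodeNum + nodeIdx;
    }
}
```

With nodeNum = 4, for example, the second trail (trailIdx = 1) occupies flat indices 4 to 7.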
GPUTrails.cs
var initTrail = new Trail() { currentNodeIdx = -1 };
var initNode = new Node() { time = -1 };
trailBuffer.SetData(Enumerable.Repeat(initTrail, trailNum).ToArray());
nodeBuffer.SetData(Enumerable.Repeat(initNode, totalNodeNum).ToArray());
Each buffer is filled with initial values. Trail.currentNodeIdx and Node.time are set to negative numbers, which are used later to tell whether an entry is unused. Every value in inputBuffer is written during the first update, so it needs no initialization and is left untouched.
Here's how to use the Node buffer.
Figure 2.2: Initial state
Nothing has been written yet.
Figure 2.3: Input
Input is written one Node at a time. Unused Nodes still remain.
Figure 2.4: Loop
When all the Nodes have been used, writing wraps around to the beginning and overwrites the oldest Nodes. The buffer is used like a ring buffer.
The processing from here on is called every frame. It takes the emitter positions as input and adds and updates Nodes.
First, inputBuffer is updated from outside. Any process can do this. At first, it may be easier to compute the positions on the CPU and upload them with ComputeBuffer.SetData(). The sample code moves particles with a simple GPU implementation and treats them as emitters.
The particles in the sample code move according to forces obtained from Curl Noise. As you can see, Curl Noise is very convenient because it makes pseudo-fluid-like motion easy to create. @sakope explains Curl Noise for pseudo-fluids in detail in Chapter 6 of this book, so please take a look.
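The ring-buffer behaviour in Figures 2.2 to 2.4 comes down to one line of index arithmetic. The following is a minimal CPU-side sketch. RingCursor and Next are hypothetical names, and the starting value of -1 matches the "unused" marker set during initialization.

```csharp
public static class RingCursor
{
    // Hypothetical helper: index of the Node to write next.
    // Starting from -1 (unused) the first write lands on 0, and after
    // nodeNum - 1 the index wraps back to 0 and overwrites the oldest Node.
    public static int Next(int currentNodeIdx, int nodeNum)
    {
        return (currentNodeIdx + 1) % nodeNum;
    }
}
```

With nodeNum = 4, repeated calls starting from -1 visit 0, 1, 2, 3, 0, 1, and so on.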
GPUTrailParticles.cs
void Update()
{
    cs.SetInt(CSPARAM.PARTICLE_NUM, particleNum);
    cs.SetFloat(CSPARAM.TIME, Time.time);
    cs.SetFloat(CSPARAM.TIME_SCALE, _timeScale);
    cs.SetFloat(CSPARAM.POSITION_SCALE, _positionScale);
    cs.SetFloat(CSPARAM.NOISE_SCALE, _noiseScale);

    var kernelUpdate = cs.FindKernel(CSPARAM.UPDATE);
    cs.SetBuffer(kernelUpdate, CSPARAM.PARTICLE_BUFFER_WRITE, _particleBuffer);
    var updateThureadNum = new Vector3(particleNum, 1f, 1f);
    ComputeShaderUtil.Dispatch(cs, kernelUpdate, updateThureadNum);

    var kernelInput = cs.FindKernel(CSPARAM.WRITE_TO_INPUT);
    cs.SetBuffer(kernelInput, CSPARAM.PARTICLE_BUFFER_READ, _particleBuffer);
    cs.SetBuffer(kernelInput, CSPARAM.INPUT_BUFFER, trails.inputBuffer);
    var inputThreadNum = new Vector3(particleNum, 1f, 1f);
    ComputeShaderUtil.Dispatch(cs, kernelInput, inputThreadNum);
}
Two kernels are run: the first updates the particles, and the second writes the particle positions to inputBuffer.
Now, let's update nodeBuffer by referring to inputBuffer.
GPUTrails.cs
void LateUpdate()
{
    cs.SetFloat(CSPARAM.TIME, Time.time);
    cs.SetFloat(CSPARAM.UPDATE_DISTANCE_MIN, updateDistaceMin);
    cs.SetInt(CSPARAM.TRAIL_NUM, trailNum);
    cs.SetInt(CSPARAM.NODE_NUM_PER_TRAIL, nodeNum);

    var kernel = cs.FindKernel(CSPARAM.CALC_INPUT);
    cs.SetBuffer(kernel, CSPARAM.TRAIL_BUFFER, trailBuffer);
    cs.SetBuffer(kernel, CSPARAM.NODE_BUFFER, nodeBuffer);
    cs.SetBuffer(kernel, CSPARAM.INPUT_BUFFER, inputBuffer);

    ComputeShaderUtil.Dispatch(cs, kernel, new Vector3(trailNum, 1f, 1f));
}
On the CPU side, all we do is pass the required parameters and Dispatch() the ComputeShader. The main processing, on the ComputeShader side, is as follows.
GPUTrail.compute
[numthreads(256,1,1)]
void CalcInput (uint3 id : SV_DispatchThreadID)
{
    uint trailIdx = id.x;
    if ( trailIdx < _TrailNum)
    {
        Trail trail = _TrailBuffer[trailIdx];
        Input input = _InputBuffer[trailIdx];
        int currentNodeIdx = trail.currentNodeIdx;

        bool update = true;
        if ( trail.currentNodeIdx >= 0 )
        {
            Node node = GetNode(trailIdx, currentNodeIdx);
            float dist = distance(input.position, node.position);
            update = dist > _UpdateDistanceMin;
        }

        if ( update )
        {
            Node node;
            node.time = _Time;
            node.position = input.position;

            currentNodeIdx++;
            currentNodeIdx %= _NodeNumPerTrail;

            // write new node
            SetNode(node, trailIdx, currentNodeIdx);

            // update trail
            trail.currentNodeIdx = currentNodeIdx;
            _TrailBuffer[trailIdx] = trail;
        }
    }
}
Let's take a closer look.
uint trailIdx = id.x; if ( trailIdx < _TrailNum)
First, the argument id is used as the trail index. Because threads are launched in groups of a fixed size, the kernel may be called with ids equal to or greater than the number of trails, so out-of-range ids are rejected with an if statement.
int currentNodeIdx = trail.currentNodeIdx;
bool update = true;
if ( trail.currentNodeIdx >= 0 )
{
    Node node = GetNode(trailIdx, currentNodeIdx);
    update = distance(input.position, node.position) > _UpdateDistanceMin;
}
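To see why out-of-range ids occur, it helps to work out how many thread groups are needed. The chapter does not show the inside of ComputeShaderUtil.Dispatch, so the sketch below is only an assumption about the kind of rounding such a helper performs. ThreadGroups and its methods are hypothetical names.

```csharp
public static class ThreadGroups
{
    // Hypothetical helper: number of groups needed to cover `count` items
    // when each group runs `groupSize` threads (integer ceiling division).
    public static int Count(int count, int groupSize)
    {
        return (count + groupSize - 1) / groupSize;
    }

    // Threads that are launched but have no item to process.
    public static int Surplus(int count, int groupSize)
    {
        return Count(count, groupSize) * groupSize - count;
    }
}
```

For 10,000 trails and numthreads(256,1,1), this gives 40 groups, or 10,240 threads. The 240 surplus threads are the ones that the if statement filters out.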
Next, Trail.currentNodeIdx is checked. If it is negative, the trail is unused.
GetNode() is a function that fetches the specified Node from _NodeBuffer. Because index calculation is a common source of mistakes, it is wrapped in a function.
For a trail that is already in use, the distance between the latest Node and the input position is compared with _UpdateDistanceMin: the trail is updated if the distance is larger and left as it is if the distance is smaller. Although it depends on the behavior of the emitter, an input at almost the same position as the previous Node usually means that the emitter is nearly stationary and moving only by small errors. If such inputs were faithfully turned into Nodes, the direction between consecutive Nodes would vary wildly and the generated trail would often look quite messy. Therefore, when the distance is very short, the Node is deliberately skipped.
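The decision described above can be written as a CPU-side reference, which is handy when checking the compute shader's behavior. This is a sketch rather than the sample's code. NodeFilter and ShouldAddNode are hypothetical names, and System.Numerics.Vector3 stands in for the shader's float3.

```csharp
using System.Numerics;

public static class NodeFilter
{
    // Hypothetical CPU mirror of the update test in CalcInput.
    public static bool ShouldAddNode(int currentNodeIdx, Vector3 lastNodePos,
                                     Vector3 inputPos, float updateDistanceMin)
    {
        // A negative index marks an unused trail: always accept the first input.
        if (currentNodeIdx < 0) return true;

        // Otherwise accept only inputs that moved far enough from the last Node.
        return Vector3.Distance(inputPos, lastNodePos) > updateDistanceMin;
    }
}
```

A larger updateDistanceMin gives fewer, more widely spaced Nodes, while a value of 0 accepts every input that is not exactly on top of the previous Node.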
GPUTrail.compute
if ( update )
{
    Node node;
    node.time = _Time;
    node.position = input.position;

    currentNodeIdx++;
    currentNodeIdx %= _NodeNumPerTrail;

    // write new node
    SetNode(node, trailIdx, currentNodeIdx);

    // update trail
    trail.currentNodeIdx = currentNodeIdx;
    _TrailBuffer[trailIdx] = trail;
}
Finally, _NodeBuffer and _TrailBuffer are updated. The trail stores the index of the newly written Node as currentNodeIdx. When the index reaches the number of Nodes per trail, it wraps back to zero, so the buffer behaves as a ring buffer. The Node stores the time and position of the input.
This completes the logic side of the trail. Next, let's look at how this information is drawn.
Drawing a trail is basically a matter of connecting Nodes with a line. Here, each individual trail is kept as simple as possible and the focus is on quantity. We therefore want as few polygons as possible, so the line is generated as flat polygons facing the camera.
The flat, camera-facing polygons are generated as follows.
Figure 2.5: Sequence of Nodes
Starting from a sequence of Nodes like this,
Figure 2.6: Vertices generated from Node
vertices are computed by offsetting each Node by the specified width in the direction perpendicular to the line of sight.
Figure 2.7: Polygonization
The generated vertices are connected to form polygons. Let's take a look at the actual code.
On the CPU side, the process simply passes the parameters to the material and calls DrawProcedural().
GPUTrailRenderer.cs
void OnRenderObject()
{
    _material.SetInt(GPUTrails.CSPARAM.NODE_NUM_PER_TRAIL, trails.nodeNum);
    _material.SetFloat(GPUTrails.CSPARAM.LIFE, trails._life);
    _material.SetBuffer(GPUTrails.CSPARAM.TRAIL_BUFFER, trails.trailBuffer);
    _material.SetBuffer(GPUTrails.CSPARAM.NODE_BUFFER, trails.nodeBuffer);
    _material.SetPass(0);

    var nodeNum = trails.nodeNum;
    var trailNum = trails.trailNum;
    Graphics.DrawProcedural(MeshTopology.Points, nodeNum, trailNum);
}
A parameter that has not appeared so far, trails._life, shows up here. It is the lifetime of a Node: it is compared with the write time that each Node holds, and the Node is made transparent once this much time has passed. This lets the tail end of the trail fade out smoothly.
Since there is no mesh or polygon to feed in, we use Graphics.DrawProcedural() to issue a command that draws a model with trails.nodeNum vertices as trails.trailNum instances.
GPUTrails.shader
vs_out vert (uint id : SV_VertexID, uint instanceId : SV_InstanceID)
{
    vs_out Out;
    Trail trail = _TrailBuffer[instanceId];
    int currentNodeIdx = trail.currentNodeIdx;

    Node node0 = GetNode(instanceId, id-1);
    Node node1 = GetNode(instanceId, id); // current
    Node node2 = GetNode(instanceId, id+1);
    Node node3 = GetNode(instanceId, id+2);

    bool isLastNode = (currentNodeIdx == (int)id);

    if ( isLastNode || !IsValid(node1))
    {
        node0 = node1 = node2 = node3 = GetNode(instanceId, currentNodeIdx);
    }

    float3 pos1 = node1.position;
    float3 pos0 = IsValid(node0) ? node0.position : pos1;
    float3 pos2 = IsValid(node2) ? node2.position : pos1;
    float3 pos3 = IsValid(node3) ? node3.position : pos2;

    Out.pos = float4(pos1, 1);
    Out.posNext = float4(pos2, 1);

    Out.dir = normalize(pos2 - pos0);
    Out.dirNext = normalize(pos3 - pos1);

    float ageRate = saturate((_Time.y - node1.time) / _Life);
    float ageRateNext = saturate((_Time.y - node2.time) / _Life);
    Out.col = lerp(_StartColor, _EndColor, ageRate);
    Out.colNext = lerp(_StartColor, _EndColor, ageRateNext);

    return Out;
}
First, the vertex shader. It outputs information about the current Node and the next Node for this vertex.
GPUTrails.shader
Node node0 = GetNode(instanceId, id-1);
Node node1 = GetNode(instanceId, id); // current
Node node2 = GetNode(instanceId, id+1);
Node node3 = GetNode(instanceId, id+2);
The current Node is node1, and four Nodes are referenced in total: the previous one (node0), the next one (node2), and the one after that (node3).
GPUTrails.shader
bool isLastNode = (currentNodeIdx == (int)id);
if ( isLastNode || !IsValid(node1))
{
    node0 = node1 = node2 = node3 = GetNode(instanceId, currentNodeIdx);
}
If the current Node is the last one or has not been written yet, node0 to node3 are all treated as copies of the last Node. In other words, every Node beyond the end that holds no information yet is "folded" onto the end. This way, the data can be passed unchanged to the polygon generation that follows.
GPUTrails.shader
float3 pos1 = node1.position;
float3 pos0 = IsValid(node0) ? node0.position : pos1;
float3 pos2 = IsValid(node2) ? node2.position : pos1;
float3 pos3 = IsValid(node3) ? node3.position : pos2;
Out.pos = float4(pos1, 1);
Out.posNext = float4(pos2, 1);
Now the positions are extracted from the four Nodes. Note that any Node other than the current one (node1) may be unwritten. It may be a little surprising that node0 can be unwritten, but this happens because, when the current Node's index is 0, node0 wraps backwards around the ring buffer and refers to the last Node in the buffer. In this case too, the position of node1 is copied so that node0 folds onto the same location. The same applies to node2 and node3. Of these, pos1 and pos2 are output to the geometry shader.
GPUTrails.shader
Out.dir = normalize(pos2 - pos0);
Out.dirNext = normalize(pos3 - pos1);
In addition, the direction vector from pos0 to pos2 is output as the tangent at pos1, and the direction vector from pos1 to pos3 is output as the tangent at pos2.
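The tangent calculation is a central difference over the neighbouring Nodes, and it can be reproduced on the CPU as a sketch. TrailTangent is a hypothetical name and System.Numerics.Vector3 stands in for float3. The zero-length guard is an addition for this sketch only: the shader calls normalize() directly, and when Nodes are folded onto the same position the difference is a zero vector.

```csharp
using System.Numerics;

public static class TrailTangent
{
    // Hypothetical helper: tangent at a Node, estimated from the
    // positions of its previous and next neighbours.
    public static Vector3 At(Vector3 prevPos, Vector3 nextPos)
    {
        Vector3 d = nextPos - prevPos;

        // Guard added for this sketch: avoid normalizing a zero vector.
        if (d.LengthSquared() <= 0f) return Vector3.Zero;

        return Vector3.Normalize(d);
    }
}
```

Using the neighbours on both sides, rather than only the next Node, means the two quads that share a Node get the same tangent there, so their edges line up.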
GPUTrails.shader
float ageRate = saturate((_Time.y - node1.time) / _Life);
float ageRateNext = saturate((_Time.y - node2.time) / _Life);
Out.col = lerp(_StartColor, _EndColor, ageRate);
Out.colNext = lerp(_StartColor, _EndColor, ageRateNext);
Finally, the color is calculated by comparing the write time of node1 and node2 with the current time.
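As a rough CPU-side illustration of this fade, the sketch below mirrors saturate() and lerp() in plain C#. TrailFade and its methods are hypothetical names, and Vector4 stands in for the shader's color type.

```csharp
using System;
using System.Numerics;

public static class TrailFade
{
    // Hypothetical mirror of saturate((now - nodeTime) / life):
    // 0 for a freshly written Node, 1 once its lifetime has elapsed.
    public static float AgeRate(float now, float nodeTime, float life)
    {
        float rate = (now - nodeTime) / life;
        return Math.Min(1f, Math.Max(0f, rate));
    }

    // Mirror of lerp(_StartColor, _EndColor, ageRate).
    public static Vector4 Color(Vector4 startColor, Vector4 endColor, float ageRate)
    {
        return Vector4.Lerp(startColor, endColor, ageRate);
    }
}
```

If the end color has an alpha of 0, a Node written one second ago with a lifetime of two seconds gets an age rate of 0.5 and is drawn half transparent.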
GPUTrails.shader
[maxvertexcount(4)]
void geom (point vs_out input[1], inout TriangleStream<gs_out> outStream)
{
    gs_out output0, output1, output2, output3;
    float3 pos = input[0].pos;
    float3 dir = input[0].dir;
    float3 posNext = input[0].posNext;
    float3 dirNext = input[0].dirNext;

    float3 camPos = _WorldSpaceCameraPos;
    float3 toCamDir = normalize(camPos - pos);
    float3 sideDir = normalize(cross(toCamDir, dir));

    float3 toCamDirNext = normalize(camPos - posNext);
    float3 sideDirNext = normalize(cross(toCamDirNext, dirNext));
    float width = _Width * 0.5;

    output0.pos = UnityWorldToClipPos(pos + (sideDir * width));
    output1.pos = UnityWorldToClipPos(pos - (sideDir * width));
    output2.pos = UnityWorldToClipPos(posNext + (sideDirNext * width));
    output3.pos = UnityWorldToClipPos(posNext - (sideDirNext * width));

    output0.col = output1.col = input[0].col;
    output2.col = output3.col = input[0].colNext;

    outStream.Append(output0);
    outStream.Append(output1);
    outStream.Append(output2);
    outStream.Append(output3);

    outStream.RestartStrip();
}
Next, the geometry shader. This is where the polygons are finally generated from the information for the two Nodes passed from the vertex shader. From the two pos and dir values, four positions (a quadrilateral) are computed and output as a TriangleStream.
GPUTrails.shader
float3 camPos = _WorldSpaceCameraPos;
float3 toCamDir = normalize(camPos - pos);
float3 sideDir = normalize(cross(toCamDir, dir));
The cross product of the direction vector from pos to the camera (toCamDir) and the tangent vector (dir) is computed, and the result is used as the width direction of the line (sideDir).
GPUTrails.shader
output0.pos = UnityWorldToClipPos(pos + (sideDir * width));
output1.pos = UnityWorldToClipPos(pos - (sideDir * width));
The vertices offset in the positive and negative sideDir directions are computed. The coordinates are also transformed to clip space here so that they can be passed on to the fragment shader. Doing the same for posNext gives four vertices in total.
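The geometry up to this point can be checked on the CPU with a small sketch. TrailQuad and its methods are hypothetical names, System.Numerics.Vector3 stands in for float3, and the clip-space transform (UnityWorldToClipPos) is deliberately left out, so the results stay in world space.

```csharp
using System.Numerics;

public static class TrailQuad
{
    // Unit vector perpendicular to both the view direction and the tangent.
    public static Vector3 SideDir(Vector3 pos, Vector3 dir, Vector3 camPos)
    {
        Vector3 toCamDir = Vector3.Normalize(camPos - pos);
        return Vector3.Normalize(Vector3.Cross(toCamDir, dir));
    }

    // The two world-space vertices generated for one Node.
    // `width` is the full line width, so each side moves by half of it.
    public static (Vector3 plus, Vector3 minus) Edge(
        Vector3 pos, Vector3 dir, Vector3 camPos, float width)
    {
        Vector3 offset = SideDir(pos, dir, camPos) * (width * 0.5f);
        return (pos + offset, pos - offset);
    }
}
```

Calling Edge once for pos and dir and once for posNext and dirNext yields the four corners of one quad. Because sideDir is perpendicular to the view direction, the quad always turns its face toward the camera.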
GPUTrails.shader
output0.col = output1.col = input[0].col;
output2.col = output3.col = input[0].colNext;
Assigning a color to each vertex completes the output.
GPUTrails.shader
fixed4 frag (gs_out In) : COLOR { return In.col; }
Finally, the fragment shader. It could not be simpler: it just outputs the color (laughs)
With that, the trails should now be generated. This time only color was processed, but I think the technique can be extended in various ways, such as adding textures or varying the width. Also, because the source code is split into GPUTrails.cs and GPUTrailRenderer.cs, the GPUTrails.shader side does nothing more than draw by reading the buffers. If you prepare _TrailBuffer and _NodeBuffer, it can be used to display line-like shapes in general, not only trails. This time Nodes were simply appended to _NodeBuffer, but I think that updating every Node each frame would make it possible to express something like tentacles.
This chapter has presented the simplest possible example of a GPU implementation of trails. Debugging becomes harder on the GPU, but in exchange it enables an overwhelming sheer quantity of expression that the CPU cannot achieve. I hope that as many people as possible can experience that "Uhyo!" feeling through this book. I also think trails are an interesting area of expression with a wide range of applications, touching on both "displaying a model" and "drawing with an algorithm in screen space". I believe the understanding gained along the way will be useful when programming all kinds of visual expression, not only trails.