Unity: Profiler Overview
Profiler overview
The official Unity documentation, quoted below:
The Unity Profiler Window helps you to optimize your game. It reports how much time is spent in the various areas of your game. For example, it can report the percentage of time spent rendering, animating or in your game logic.
You can analyze the performance of the GPU, CPU, memory, rendering, and audio.
To see the profiling data, play your game in the Editor with profiling enabled, and the Profiler records performance data. The Profiler window then displays the data in a timeline, so you can see the frames or areas that spike (take more time than others). When you click anywhere in the timeline, the bottom section of the Profiler window displays detailed information for the selected frame.
Note that profiling has to instrument your code (that is, add some instructions to facilitate the measurement). While this has a small impact on the performance of your game, the overhead is small enough not to noticeably affect the game's frame rate.
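The Profiler timeline shows spikes at a glance. To illustrate what a "spike" means in numbers, the sketch below flags frames whose time exceeds a multiple of the median frame time. It assumes you already have a list of per-frame times in milliseconds, for example noted down from the timeline. The function name and the 2x threshold are arbitrary choices for illustration, not anything the Profiler itself uses.

```python
from statistics import median

def find_spike_frames(frame_times_ms, factor=2.0):
    """Return indices of frames taking more than `factor` times the median frame time."""
    if not frame_times_ms:
        return []
    baseline = median(frame_times_ms)
    return [i for i, t in enumerate(frame_times_ms) if t > factor * baseline]

# Hypothetical per-frame CPU times in milliseconds.
frames = [16.4, 16.7, 16.5, 41.2, 16.6, 16.8, 35.0, 16.5]
print(find_spike_frames(frames))  # [3, 6]
```

Using the median rather than the mean as the baseline keeps the spikes themselves from inflating the threshold.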
Tips on using the Tool
When using the profiling tool, focus on the parts of the game that consume the most time. Compare profiling results before and after code changes to measure the actual improvement. Sometimes changes you make to improve performance might have a negative effect on frame rate; there may be unexpected consequences of your code optimization.
See Profiler window documentation for details of the Profiler window.
See also: Optimizing Graphics Performance.
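Here is a minimal sketch of such a before/after comparison. It compares two runs by both average and worst frame time. Checking the worst frame as well as the average helps catch the unexpected consequences mentioned above, since an optimization can lower the average while introducing a new spike. The function names and the sample numbers are hypothetical.

```python
from statistics import mean

def summarize(frame_times_ms):
    """Average and worst frame time of one profiling run, in milliseconds."""
    return {"avg_ms": mean(frame_times_ms), "worst_ms": max(frame_times_ms)}

def compare_runs(before_ms, after_ms):
    """Percentage change per metric; negative values mean the 'after' run is faster."""
    before, after = summarize(before_ms), summarize(after_ms)
    return {key: (after[key] - before[key]) / before[key] * 100.0 for key in before}

# Hypothetical frame times (ms) for the same scene before and after a code change.
result = compare_runs([20.0, 22.0, 21.0], [15.0, 15.0, 30.0])
for metric, pct in result.items():
    print(f"{metric}: {pct:+.1f}%")  # avg_ms: -4.8%, worst_ms: +36.4%
```

In this example the average improved slightly, but the worst frame became much slower. A single average would have hidden that regression.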
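Profiler window
Open the Profiler window in the Unity Editor from the menu: Window > Profiler (default shortcut: Ctrl+7).
[Figure: the Profiler window]
The Profiler controls are in the toolbar at the top of the window. Use them to turn profiling on and off and to navigate through profiled frames. The transport controls are at the far right end of the toolbar. Note that while the game is running and the Profiler is collecting data, clicking any of the transport controls pauses the game.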
Optimizing graphics performance
Good performance is critical to the success of many games. Below are some simple guidelines for maximizing the speed of your game’s rendering.
Locate high graphics impact
The graphical parts of your game can primarily impact two systems of the computer: the GPU and the CPU. The first rule of any optimization is to find where the performance problem is, because strategies for optimizing for GPU vs. CPU are quite different (and can even be opposite: for example, it's quite common to make the GPU do more work while optimizing for CPU, and vice versa).
Common bottlenecks and ways to check for them:
GPU is often limited by fillrate or memory bandwidth.
Lower the display resolution and run the game. If a lower display resolution makes the game run faster, you may be limited by fillrate on the GPU.
CPU is often limited by the number of batches that need to be rendered.
Check “batches” in the Rendering Statistics window. The more batches are being rendered, the higher the cost to the CPU.
Less-common bottlenecks:
The GPU has too many vertices to process. The number of vertices that is acceptable to ensure good performance depends on the GPU and the complexity of vertex shaders. Generally speaking, aim for no more than 100,000 vertices on mobile. A PC manages well even with several million vertices, but it is still good practice to keep this number as low as possible through optimization.
The CPU has too many vertices to process. This could be in skinned meshes, cloth simulation, particles, or other game objects and meshes. As above, it is generally good practice to keep this number as low as possible without compromising game quality. See the section on CPU optimization below for guidance on how to do this.
If rendering is not a problem on the GPU or the CPU, there may be an issue elsewhere - for example, in your script or physics. Use the Unity Profiler to locate the problem.
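The checks above can be summarized as a small decision helper. This is a hedged sketch. The 100,000-vertex mobile budget comes from the text. The PC vertex budget, the batch budget and the 15% speedup threshold are assumptions that you would tune for your own target hardware.

```python
# Vertex budgets per frame. The mobile figure comes from the text; the PC figure is an assumption.
VERTEX_BUDGET = {"mobile": 100_000, "pc": 3_000_000}

def guess_bottlenecks(fps_full_res, fps_low_res, batches, vertices,
                      platform="mobile", batch_budget=200, speedup_threshold=1.15):
    """Rough first guess at where a rendering bottleneck lies.

    batch_budget and speedup_threshold are hypothetical values to tune per project.
    """
    suspects = []
    # Lowering the resolution helped noticeably: likely GPU fillrate or memory bandwidth.
    if fps_low_res >= fps_full_res * speedup_threshold:
        suspects.append("GPU fillrate/bandwidth")
    # Many batches mean a lot of per-object CPU work.
    if batches > batch_budget:
        suspects.append("CPU: too many batches")
    if vertices > VERTEX_BUDGET[platform]:
        suspects.append("too many vertices")
    # Nothing rendering-related stands out: look at scripts, physics, etc. in the Profiler.
    return suspects or ["not rendering: check scripts/physics in the Profiler"]

print(guess_bottlenecks(fps_full_res=30, fps_low_res=45, batches=80, vertices=50_000))
# ['GPU fillrate/bandwidth']
```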
CPU optimization
To render objects on the screen, the CPU has a lot of processing work to do: working out which lights affect that object, setting up the shader and shader parameters, and sending drawing commands to the graphics driver, which then prepares the commands to be sent off to the graphics card.
All this “per object” CPU usage is resource-intensive, so if you have lots of visible objects, it can add up. For example, if you have a thousand triangles, it is much easier on the CPU if they are all in one mesh, rather than in one mesh per triangle (adding up to 1000 meshes). The cost of both scenarios on the GPU is very similar, but the work done by the CPU to render a thousand objects (instead of one) is significantly higher.
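A toy cost model shows why per-object overhead dominates in the thousand-triangle example. The per-object and per-triangle costs below are invented placeholders in nanoseconds. They only illustrate the shape of the cost and are not measured Unity figures.

```python
# Hypothetical CPU costs in nanoseconds, for illustration only (not measured Unity numbers).
PER_OBJECT_NS = 20_000   # light lookup, shader setup and draw command, paid once per object
PER_TRIANGLE_NS = 1      # CPU-side cost that scales with the amount of geometry

def cpu_render_cost_ns(num_objects, total_triangles):
    return num_objects * PER_OBJECT_NS + total_triangles * PER_TRIANGLE_NS

one_mesh = cpu_render_cost_ns(num_objects=1, total_triangles=1000)
mesh_per_triangle = cpu_render_cost_ns(num_objects=1000, total_triangles=1000)
print(one_mesh, mesh_per_triangle)  # 21000 20001000
```

The same 1000 triangles cost roughly a thousand times more CPU time when each one is a separate object.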
Reduce the visible object count. To reduce the amount of work the CPU needs to do:
Combine close objects together, either manually or using Unity’s draw call batching.
Use fewer materials in your objects by putting separate textures into a larger texture atlas.
Use fewer things that cause objects to be rendered multiple times (such as reflections, shadows and per-pixel lights).
Combine objects together so that each mesh has at least several hundred triangles and uses only one Material for the entire mesh. Note that combining two objects which don’t share a material does not give you any performance increase at all. The most common reason for requiring multiple materials is that two meshes don’t share the same textures; to optimize CPU performance, ensure that any objects you combine share the same textures.
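Here is a simplified model of the point about materials. After combining, each distinct material still needs its own draw call. Merging therefore only helps objects that share a material, and packing textures into an atlas is what lets different objects end up on the same material. The sketch ignores lights, shadows and other extra passes. The object and material names are hypothetical.

```python
def draw_calls_after_combining(object_materials, atlas=None):
    """Draw calls left after merging objects: one per distinct material.

    `atlas` optionally maps original material names to a shared atlas material.
    """
    atlas = atlas or {}
    return len({atlas.get(m, m) for m in object_materials})

scene = ["Rock", "Rock", "Rock", "Crate", "Barrel"]
print(len(scene))                         # 5 draw calls with no combining
print(draw_calls_after_combining(scene))  # 3: rocks merge, crate and barrel stay separate
print(draw_calls_after_combining(scene, {"Crate": "PropsAtlas", "Barrel": "PropsAtlas"}))  # 2
```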
When using many pixel lights in the Forward rendering path, there are situations where combining objects may not make sense. See the Lighting performance section below to learn how to manage this.
GPU: Optimizing model geometry
There are two basic rules for optimizing the geometry of a model:
Don’t use any more triangles than necessary
Try to keep the number of UV mapping seams and hard edges (doubled-up vertices) as low as possible
Note that the actual number of vertices that graphics hardware has to process is usually not the same as the number reported by a 3D application. Modeling applications usually display the number of distinct corner points that make up a model (known as the geometric vertex count). For a graphics card, however, some geometric vertices need to be split into two or more logical vertices for rendering purposes. A vertex must be split if it has multiple normals, UV coordinates or vertex colors. Consequently, the vertex count in Unity is usually higher than the count given by the 3D application.
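The vertex-splitting rule can be checked with a short sketch. A GPU vertex is a unique combination of attributes, so counting unique (position, normal, UV) tuples over all triangle corners gives the logical vertex count. For a cube with hard edges, the 8 corner positions become 24 vertices, because each face needs its own normal and UVs. The cube construction here is only an illustration, not how Unity imports meshes.

```python
from itertools import product

def cube_corners():
    """Triangle corners of a hard-edged cube as (position, normal, uv) tuples."""
    corners = []
    for axis, sign in product(range(3), (-1, 1)):
        normal = tuple(sign if i == axis else 0 for i in range(3))  # flat normal per face
        u_axis, v_axis = [i for i in range(3) if i != axis]
        quad = []
        for a, b in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
            pos = [0, 0, 0]
            pos[axis], pos[u_axis], pos[v_axis] = sign, a, b
            quad.append((tuple(pos), ((a + 1) // 2, (b + 1) // 2)))  # UVs 0..1 per face
        for i in (0, 1, 2, 0, 2, 3):  # two triangles per face
            pos, uv = quad[i]
            corners.append((pos, normal, uv))
    return corners

def geometric_vertex_count(corners):
    # What a modeling application typically reports: distinct corner positions.
    return len({pos for pos, _, _ in corners})

def logical_vertex_count(corners):
    # What the GPU processes: one vertex per unique attribute combination.
    return len(set(corners))

corners = cube_corners()
print(geometric_vertex_count(corners), logical_vertex_count(corners))  # 8 24
```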
While the amount of geometry in the models is mostly relevant for the GPU, some features in Unity also process models on the CPU (for example, mesh skinning).
Lighting performance
The fastest option is always to create lighting that doesn’t need to be computed at all. To do this, use Lightmapping to “bake” static lighting just once, instead of computing it each frame. The process of generating a lightmapped environment takes only a little longer than just placing a light in the scene in Unity, but:
It runs a lot faster (2 to 3 times faster for two per-pixel lights)
It looks a lot better, as you can bake global illumination and the lightmapper can smooth the results
In many cases you can apply simple tricks instead of adding multiple extra lights. For example, instead of adding a light that shines straight into the camera to give a Rim Lighting effect, add a dedicated Rim Lighting computation directly into your shaders (see Surface Shader Examples to learn how to do this).
Lights in forward rendering
Also see: Forward rendering
Per-pixel dynamic lighting adds significant rendering work to every affected pixel, and can lead to objects being rendered in multiple passes. Avoid having more than one Pixel Light illuminating any single object on less powerful devices, like mobile or low-end PC GPUs, and use lightmaps to light static objects instead of calculating their lighting every frame. Per-vertex dynamic lighting can add significant work to vertex transformations, so try to avoid situations where multiple lights illuminate a single object.
Avoid combining meshes that are far enough apart to be affected by different sets of pixel lights. When you use pixel lighting, each mesh has to be rendered as many times as there are pixel lights illuminating it. If you combine two meshes that are very far apart, it increases the effective size of the combined object. All pixel lights that illuminate any part of this combined object are taken into account during rendering, so the number of rendering passes that need to be made could be increased. Generally, the number of passes that must be made to render the combined object is the sum of the number of passes for each of the separate objects, so nothing is gained by combining meshes.
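The sketch below models this argument under a deliberately simple assumption: a mesh is drawn once per pixel light that touches it, and every pass processes all of the mesh's triangles. Combining two distant meshes keeps the draw-call count the same but doubles the triangles processed. Combining two nearby meshes lit by the same light halves the draw calls. The mesh sizes and light names are made up.

```python
def render_cost(meshes):
    """meshes: list of (triangle_count, set_of_pixel_lights).

    Returns (draw_calls, triangles_processed) under a simplified forward-rendering model:
    each mesh is drawn once per pixel light touching it, and at least once.
    """
    calls = tris = 0
    for tri_count, lights in meshes:
        passes = max(1, len(lights))
        calls += passes
        tris += passes * tri_count
    return calls, tris

def combine(meshes):
    """Merge meshes into one object lit by every light that touched any part of it."""
    total_tris = sum(t for t, _ in meshes)
    all_lights = set().union(*(lights for _, lights in meshes))
    return [(total_tris, all_lights)]

far_apart = [(500, {"lampA", "lampB"}), (500, {"lampC", "lampD"})]
print(render_cost(far_apart), render_cost(combine(far_apart)))  # (4, 2000) (4, 4000)

close_together = [(500, {"lampA"}), (500, {"lampA"})]
print(render_cost(close_together), render_cost(combine(close_together)))  # (2, 1000) (1, 1000)
```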
During rendering, Unity finds all lights surrounding a mesh and calculates which of those lights affect it most. The Quality Settings are used to modify how many of the lights end up as pixel lights, and how many as vertex lights. Each light calculates its importance based on how far away it is from the mesh and how intense its illumination is - and some lights are more important than others purely from the game context. For this reason, every light has a Render Mode setting which can be set to Important or Not Important; lights marked as Not Important have a lower rendering overhead.
Example: Consider a driving game in which the player’s car is driving in the dark with headlights switched on. The headlights are probably the most visually significant light source in the game, so their Render Mode should be set to Important. There may be other lights in the game that are less important, like other cars’ rear lights or distant lampposts, and which don’t improve the visual effect much by being pixel lights. The Render Mode for such lights can safely be set to Not Important to avoid wasting rendering capacity in places where it has little benefit.
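The following rough sketch shows how Render Mode interacts with the pixel light limit from the Quality Settings. Unity's actual ranking is internal, so several parts are assumptions: the score used for the remaining lights (intensity divided by 1 + distance squared), treating Important lights as always per-pixel, and treating Not Important lights as never per-pixel. Unity also offers an Auto mode, which is modeled here as "ranked by score". The light names follow the driving-game example.

```python
from dataclasses import dataclass

@dataclass
class Light:
    name: str
    intensity: float
    distance: float             # distance from the mesh being lit
    render_mode: str = "Auto"   # "Important", "Not Important" or "Auto"

def assign_pixel_lights(lights, pixel_light_count):
    """Split lights into (pixel, vertex) name lists for one mesh. Simplified, hypothetical ranking."""
    important = [l for l in lights if l.render_mode == "Important"]
    auto = [l for l in lights if l.render_mode == "Auto"]
    # Assumed score: brighter and closer lights matter more.
    auto.sort(key=lambda l: l.intensity / (1.0 + l.distance ** 2), reverse=True)
    free_slots = max(0, pixel_light_count - len(important))
    pixel = [l.name for l in important + auto[:free_slots]]
    vertex = [l.name for l in lights if l.name not in pixel]
    return pixel, vertex

lights = [
    Light("headlights", 3.0, 1.0, "Important"),
    Light("other_car_rear_light", 1.0, 10.0, "Not Important"),
    Light("distant_lamppost", 2.0, 40.0, "Not Important"),
    Light("nearby_lamppost", 2.0, 3.0),
]
print(assign_pixel_lights(lights, pixel_light_count=2))
# (['headlights', 'nearby_lamppost'], ['other_car_rear_light', 'distant_lamppost'])
```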
Optimizing per-pixel lighting saves both the CPU and GPU work: the CPU has fewer draw calls to do, and the GPU has fewer vertices to process and pixels to rasterize for all the additional object renders.
GPU: Texture compression and mipmaps
Use Compressed textures to decrease the size of your textures. This can result in faster load times, a smaller memory footprint, and dramatically increased rendering performance. Compressed textures only use a fraction of the memory bandwidth needed for uncompressed 32-bit RGBA textures.
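A quick calculation shows the size difference. The bits-per-pixel values below are the standard figures for these formats: DXT1 (BC1) uses 4 bits per pixel and DXT5 (BC3) uses 8. Which compressed formats are actually available depends on the target platform, and this calculation covers only the top mip level.

```python
BITS_PER_PIXEL = {
    "RGBA32": 32,  # uncompressed, 8 bits per channel
    "RGBA16": 16,  # 16-bit formats such as RGBA4444 or RGB565
    "DXT5": 8,     # BC3: compressed, with alpha
    "DXT1": 4,     # BC1: compressed, no or 1-bit alpha
}

def texture_bytes(width, height, fmt):
    """Size in bytes of the top mip level only."""
    return width * height * BITS_PER_PIXEL[fmt] // 8

for fmt in BITS_PER_PIXEL:
    print(fmt, texture_bytes(1024, 1024, fmt) / 2**20, "MiB")
# RGBA32 4.0 MiB, RGBA16 2.0 MiB, DXT5 1.0 MiB, DXT1 0.5 MiB
```

A 1024x1024 DXT1 texture uses one eighth of the memory (and of the bandwidth per fetch) of the same texture in uncompressed RGBA32.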
Texture mipmaps
Always enable Generate mipmaps for textures used in a 3D scene. A mipmap texture enables the GPU to use a lower resolution texture for smaller triangles. This is similar to how texture compression can help limit the amount of texture data transferred when the GPU is rendering.
The only exception to this rule is when a texel (texture pixel) is known to map 1:1 to the rendered screen pixel, as with UI elements or in a 2D game.
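The sketch below illustrates two facts behind this advice. First, a full mip chain adds only about one third to a texture's memory. Second, the mip level the GPU picks grows with the ratio of texels to screen pixels. The level selection here is simplified, because real GPUs use screen-space derivatives and filtering. The 1:1 case returns level 0, which is why mipmaps bring no benefit for UI or pixel-exact 2D sprites.

```python
import math

def mip_chain(width, height):
    """Sizes of every mip level, halving each dimension down to 1x1."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(1, width // 2), max(1, height // 2)
        levels.append((width, height))
    return levels

def mip_memory_overhead(width, height):
    """Extra memory of the full chain relative to the base level (about 1/3)."""
    texels = [w * h for w, h in mip_chain(width, height)]
    return sum(texels) / texels[0] - 1.0

def approx_mip_level(texels_across, pixels_across):
    """Simplified mip selection: log2 of texels per screen pixel, never below level 0."""
    ratio = texels_across / pixels_across
    return 0 if ratio <= 1 else math.floor(math.log2(ratio))

print(len(mip_chain(256, 256)), round(mip_memory_overhead(256, 256), 3))  # 9 0.333
print(approx_mip_level(1024, 256), approx_mip_level(512, 512))            # 2 0
```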
LOD and per-layer cull distances
Culling objects involves making objects invisible. This is an effective way to reduce both the CPU and GPU load.
In many games, a quick and effective way to do this without compromising the player experience is to cull small objects more aggressively than large ones. For example, small rocks and debris could be made invisible at long distances, while large buildings would still be visible.
There are a number of ways you can achieve this:
Use the Level Of Detail system
Manually set per-layer culling distances on the camera
Put small objects into a separate layer and set up per-layer cull distances using the Camera.layerCullDistances script property
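Here is a minimal sketch of per-layer culling logic. It assumes the behavior of Camera.layerCullDistances: one distance per layer (32 entries), where 0 means "use the camera's far clip plane". In Unity itself you would assign a float array of 32 distances to camera.layerCullDistances in C#. The layer index and distances below are hypothetical.

```python
NUM_LAYERS = 32

def is_culled(distance, layer, layer_cull_distances, far_clip):
    """True if an object on `layer` at `distance` from the camera is culled.

    A cull distance of 0 means "use the camera's far clip plane" for that layer;
    nothing beyond the far clip plane is drawn anyway.
    """
    limit = min(layer_cull_distances[layer] or far_clip, far_clip)
    return distance > limit

SMALL_PROPS = 8  # hypothetical layer index for rocks and debris
cull = [0.0] * NUM_LAYERS
cull[SMALL_PROPS] = 50.0

print(is_culled(80.0, SMALL_PROPS, cull, far_clip=1000.0))  # True: small rock is hidden
print(is_culled(80.0, 0, cull, far_clip=1000.0))            # False: building still visible
```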
Realtime shadows
Realtime shadows are nice, but they can have a high impact on performance, both in terms of extra draw calls for the CPU and extra processing on the GPU. For further details, see the Light Performance page.
GPU: Tips for writing high-performance shaders
Different platforms have vastly different performance capabilities; a high-end PC GPU can handle much more in terms of graphics and shaders than a low-end mobile GPU. The same is true even on a single platform; a fast GPU is dozens of times faster than a slow integrated GPU.
GPU performance on mobile platforms and low-end PCs is likely to be much lower than on your development machine. It’s recommended that you manually optimize your shaders to reduce calculations and texture reads, in order to get good performance across low-end GPU machines. For example, some built-in Unity shaders have “mobile” equivalents that are much faster, but have some limitations or approximations.
Below are some guidelines for mobile and low-end PC graphics cards:
Complex mathematical operations
Transcendental mathematical functions (such as pow, exp, log, cos, sin, tan) are quite resource-intensive, so avoid using them where possible. Consider using lookup textures as an alternative to complex math calculations if applicable.
Avoid writing your own versions of standard operations (such as normalize, dot, inversesqrt); using the built-in functions lets the driver generate much better code. Remember that the Alpha Test (discard) operation often makes your fragment shader slower.
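Lookup textures trade math for a texture fetch. The sketch below emulates a 1D lookup texture in Python. It precomputes `x ** 16` at 256 points and reads the values back with linear interpolation, roughly as a filtered texture sample would. The table size and the exponent are arbitrary choices. Whether a lookup actually beats `pow` depends on the GPU, so profile both.

```python
def build_lut(fn, size=256):
    """Precompute fn(x) for x in [0, 1] at `size` evenly spaced points (like a 1D lookup texture)."""
    return [fn(i / (size - 1)) for i in range(size)]

def sample_lut(lut, x):
    """Linearly interpolate the table at x in [0, 1], like a filtered 1D texture fetch."""
    x = min(max(x, 0.0), 1.0)  # clamp, like a clamped texture address mode
    pos = x * (len(lut) - 1)
    i = int(pos)
    if i >= len(lut) - 1:
        return lut[-1]
    frac = pos - i
    return lut[i] * (1.0 - frac) + lut[i + 1] * frac

specular = build_lut(lambda x: x ** 16)  # e.g. a fixed specular exponent
print(abs(sample_lut(specular, 0.9) - 0.9 ** 16) < 1e-3)  # True: close to the exact value
```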
Floating point precision
While the precision (float vs half vs fixed) of floating point variables is largely ignored on desktop GPUs, it is quite important for good performance on mobile GPUs. See the Shader Data Types and Precision page for details.
For further details about shader performance, see the Shader Performance page.
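To see why precision matters, the sketch below rounds values to IEEE 754 half precision using Python's `struct` module (format code `e`). On mobile GPUs, `half` is typically 16 bits. That is usually fine for colors and normalized directions, but large values such as world-space positions lose precision quickly. Whether a given GPU really computes at 16 bits is platform-dependent.

```python
import struct

def to_half(x):
    """Round a float to IEEE 754 half precision (16 bits) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(to_half(0.1))     # 0.0999755859375
print(to_half(0.5))     # 0.5 (exactly representable)
print(to_half(4097.0))  # 4096.0: adjacent half values are 4 apart at this magnitude
```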
Simple checklist to make your game faster
Keep the vertex count below 200K to 3M per frame when building for PC (depending on the target GPU).
If you’re using built-in shaders, pick ones from the Mobile or Unlit categories. They work on non-mobile platforms as well, but are simplified and approximated versions of the more complex shaders.
Keep the number of different materials per scene low, and share as many materials between different objects as possible.
Set the Static property on a non-moving object to allow internal optimizations like static batching.
Only have a single (preferably directional) pixel light affecting your geometry, rather than multiples.
Bake lighting rather than using dynamic lighting.
Use compressed texture formats when possible, and use 16-bit textures over 32-bit textures.
Avoid using fog where possible.
Use Occlusion Culling to reduce the amount of visible geometry and draw-calls in cases of complex static scenes with lots of occlusion. Design your levels with occlusion culling in mind.
Use skyboxes to “fake” distant geometry.
Use pixel shaders or texture combiners to mix several textures instead of a multi-pass approach.
Use half precision variables where possible.
Minimize use of complex mathematical operations such as pow, sin and cos in pixel shaders.
Use fewer textures per fragment.
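Several checklist items can be turned into automated warnings for a scene audit. This is a hypothetical helper. The stats dictionary and its keys are assumptions, and you would fill them in yourself, for example from the Rendering Statistics window. The default vertex budget uses the low end of the 200K to 3M PC range.

```python
def checklist_warnings(stats, vertex_budget=200_000):
    """Flag checklist items that a scene violates. `stats` keys are hypothetical."""
    warnings = []
    if stats["vertices"] > vertex_budget:
        warnings.append(f"vertex count {stats['vertices']:,} exceeds budget {vertex_budget:,}")
    if stats["max_pixel_lights_per_object"] > 1:
        warnings.append("more than one pixel light affects some geometry")
    if stats["uncompressed_textures"]:
        warnings.append(f"{stats['uncompressed_textures']} uncompressed textures")
    if stats["fog"]:
        warnings.append("fog is enabled")
    if stats["non_static_props"]:
        warnings.append(f"{stats['non_static_props']} non-moving objects not marked Static")
    return warnings

scene = {"vertices": 350_000, "max_pixel_lights_per_object": 1,
         "uncompressed_textures": 3, "fog": False, "non_static_props": 0}
for warning in checklist_warnings(scene):
    print(warning)
# vertex count 350,000 exceeds budget 200,000
# 3 uncompressed textures
```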
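The above is Unity's official documentation on optimizing graphics performance.
When Deep Profile is enabled, all script code is profiled; that is, all function calls are recorded. This is useful for knowing exactly where time is spent in your game code.
Note: deep profiling incurs a very large overhead and uses a lot of memory, so the game runs much more slowly while it is being profiled. If your code is very complex, deep profiling may not be possible at all. For games with simple scripts, deep profiling should be fast enough. If deep profiling the whole game drops the frame rate so far that the game barely runs, consider not using this approach. In large games, deep profiling may cause Unity to run out of memory, so it is not an option there either. In that case, use another approach, such as manually profiling blocks of script code, which has less overhead than deep profiling: use the Profiler.BeginSample and Profiler.EndSample script functions to enable and disable profiling for sections of code.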
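In Unity, these calls are written in C# as `Profiler.BeginSample("SomeName")` ... `Profiler.EndSample()`. In recent versions the `Profiler` class lives in `UnityEngine.Profiling`, and the named sample appears in the Profiler's CPU Usage area. To keep all examples in one language, the sketch below is only a Python analogue of that begin/end pairing, not the Unity API. Samples nest like a stack, and each end call closes the most recently opened sample. The class and method names are hypothetical.

```python
import time
from collections import defaultdict

class ManualProfiler:
    """Python analogue of Unity's Profiler.BeginSample / Profiler.EndSample (not the Unity API)."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.open_samples = []            # stack of (name, start_time)
        self.totals = defaultdict(float)  # name -> accumulated seconds

    def begin_sample(self, name):
        self.open_samples.append((name, self.clock()))

    def end_sample(self):
        if not self.open_samples:
            raise RuntimeError("end_sample called without a matching begin_sample")
        name, start = self.open_samples.pop()  # closes the most recently opened sample
        self.totals[name] += self.clock() - start

profiler = ManualProfiler()
profiler.begin_sample("UpdateEnemies")
sum(i * i for i in range(100_000))  # stand-in for the code block being measured
profiler.end_sample()
print(dict(profiler.totals))
```

Unlike deep profiling, only the code between each begin/end pair is timed, which is why the overhead stays small.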
Source: czhenya.blog.csdn.net, author: 陈言必行. Copyright belongs to the original author; please contact the author before reposting.
Original link: czhenya.blog.csdn.net/article/details/84961553