Zhao Lin, Yeo Chai Kiat, Khan Arijit, Luo Robby, Jin Ling Peng
College of Computing and Data Science, Nanyang Technological University, Singapore, 639798, Singapore.
Department of Computer Science, Aalborg University, 9220, Aalborg, Denmark.
Sci Rep. 2024 Oct 14;14(1):24036. doi: 10.1038/s41598-024-68974-8.
GPUs play an increasingly vital role in the modern technology industry. Improving GPU performance involves optimizing the architectural design and fine-tuning the software that runs on it. To do so, engineers must examine code from as many GPU applications as possible to identify the portions that need fine-tuning. This effort demands strong domain knowledge and is made more arduous because application source code is normally confidential. To address this, we introduce ShaderAnalyzer, a solution that leverages graph mining and machine learning to analyze the low-level machine code executed on GPUs and identify fine-tuning opportunities. Our approach represents machine code as a graph structure and then identifies frequently occurring substructures within the code. Optimizing the execution of these substructures can enhance the overall performance of the GPU. In addition, our model uses these frequent patterns to further ease engineers' work by selecting representative patterns to predict and investigate low-efficiency ones. We conduct comprehensive experiments to evaluate our solution, and the results have been validated by our industry partners. ShaderAnalyzer is an end-to-end framework that helps engineers identify the code segments with the highest potential for performance gains from fine-tuning and offers valuable insights to hardware architects for future product design.
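The abstract describes representing machine code as a graph and then finding substructures that recur across programs. The sketch below illustrates that general idea only; it is not ShaderAnalyzer's implementation, which the abstract does not detail. The instruction format, the opcode names, and the helpers `dependency_edges` and `frequent_edge_patterns` are hypothetical, and the mining step is deliberately reduced to single-edge patterns (producer opcode to consumer opcode), whereas a real frequent-subgraph miner would grow larger patterns.

```python
from collections import Counter

# Hypothetical toy instruction format (an assumption, not a real GPU ISA):
# (opcode, destination register, tuple of source registers)
SHADER_A = [
    ("load", "r0", ()),
    ("mul", "r1", ("r0", "r0")),
    ("add", "r2", ("r1", "r0")),
]
SHADER_B = [
    ("load", "r0", ()),
    ("mul", "r1", ("r0", "r0")),
    ("add", "r2", ("r1", "r1")),
    ("store", "r3", ("r2",)),
]
SHADER_C = [
    ("load", "r0", ()),
    ("add", "r1", ("r0", "r0")),
]


def dependency_edges(shader):
    """Build data-dependency edges labeled (producer opcode, consumer opcode)."""
    last_writer = {}  # register name -> index of the instruction that last wrote it
    edges = []
    for idx, (opcode, dest, srcs) in enumerate(shader):
        for reg in srcs:
            if reg in last_writer:
                producer_opcode = shader[last_writer[reg]][0]
                edges.append((producer_opcode, opcode))
        last_writer[dest] = idx
    return edges


def frequent_edge_patterns(shaders, min_support):
    """Count in how many shaders each edge pattern occurs; keep the frequent ones."""
    support = Counter()
    for shader in shaders:
        # Use a set so a pattern counts at most once per shader.
        support.update(set(dependency_edges(shader)))
    return {pattern: n for pattern, n in support.items() if n >= min_support}


if __name__ == "__main__":
    patterns = frequent_edge_patterns([SHADER_A, SHADER_B, SHADER_C], min_support=2)
    for pattern, n in sorted(patterns.items()):
        print(pattern, n)
```

Support is counted per shader rather than per occurrence, so a pattern repeated many times inside one program does not outweigh a pattern that appears across many programs. Whether the paper uses the same support definition is not stated in the abstract.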