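Over the Dragon Boat Festival holiday I installed Visual Studio 2022 with C++ and wrote a simple assembly program to prove that MASM really does run. Today I need a real, working example of mixed C++ and ASM programming, because the era when Qiu Bojun wrote WPS in pure assembly is gone for good. Writing a few critical functions in assembly to take full advantage of CPU-specific features is still needed now and then.
Yesterday I found the book's companion code on GitHub, in the repository Apress/modern-x86-assembly-language-programming-3e (Source Code for Modern X86 Assembly Language Programming by Daniel Kusswurm).
This is the third and latest edition of the book. I also downloaded the English edition from z-library and imported it into WeChat Reading, which translates it into Chinese automatically. z-library plus WeChat Reading has given me reading freedom; financial freedom is still a long way off.
Appendix A of the book shows how to build a C++ plus ASM example in the VS2022 environment, so today let's follow the book through that example step by step.
First, create the project. The overall steps are: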
• Create a C++ project
• Enable MASM support
• Add an assembly language file
• Set project properties
• Edit the source code
• Build and run the project
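Start VS -> New Project -> select Console App -> Project name: Example1 -> Solution name: TestSolution -> Create -> Build -> Configuration Manager, choose Edit..., select x86, Remove (in my environment this platform is listed as Win32).
Next come the steps for configuring the ASM environment: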
View -> Solution Explorer -> right-click Example1 and select Build Dependencies -> Build Customizations -> check masm -> Add -> New Item -> select .cpp for the file type -> name the file Example1_fasm.asm -> Add.
The third step is to set the project properties:
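Right-click Example1 and select Properties -> All Configurations, All Platforms.
C/C++ -> Code Generation -> set Enable Enhanced Instruction Set to Advanced Vector Extensions (/arch:AVX), or AVX2 or AVX512.
C/C++ -> Output Files -> change Assembler Output to Assembly, Machine Code and Source (/FAcs).
Microsoft Macro Assembler -> Listing File -> set Enable Assembly Generated Code Listing to Yes (/Sg).
Change the Assembled Code Listing File text field to $(IntDir)\%(filename).lst, then click OK.
I wondered whether the extension in $(IntDir)\%(filename).lst contains a 1 or an L. It is a lowercase L: .lst is short for listing.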
The last step is to write the source code. The two files are:
AppendixA\TestSolution\Example1\Example1.cpp
AppendixA\TestSolution\Example1\Example1_fasm.asm
Example1.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <cmath>

// Implemented in Example1_fasm.asm; extern "C" disables C++ name mangling
extern "C" void CalcZ_avx(float* z, const float* x, const float* y, size_t n);

static void CalcZ_cpp(float* z, const float* x, const float* y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}

int main(void)
{
    constexpr size_t n = 20;
    float x[n], y[n], z1[n], z2[n];

    // Initialize the data arrays
    for (size_t i = 0; i < n; i++)
    {
        x[i] = i * 10.0f + 10.0f;
        y[i] = i * 1000.0f + 1000.0f;
        z1[i] = z2[i] = 0.0f;
    }

    // Exercise the calculating functions
    CalcZ_cpp(z1, x, y, n);
    CalcZ_avx(z2, x, y, n);

    // Display the results
    constexpr char nl = '\n';
    constexpr size_t w = 10;
    constexpr float eps = 1.0e-6f;

    std::cout << std::fixed << std::setprecision(1);

    std::cout << std::setw(w) << "i";
    std::cout << std::setw(w) << "x";
    std::cout << std::setw(w) << "y";
    std::cout << std::setw(w) << "z1";
    std::cout << std::setw(w) << "z2" << nl;
    std::cout << std::string(50, '-') << nl;

    for (size_t i = 0; i < n; i++)
    {
        std::cout << std::setw(w) << i;
        std::cout << std::setw(w) << x[i];
        std::cout << std::setw(w) << y[i];
        std::cout << std::setw(w) << z1[i];
        std::cout << std::setw(w) << z2[i] << nl;

        if (fabs(z1[i] - z2[i]) > eps)
        {
            std::cout << "Compare error!\n";
            break;
        }
    }
}
Example1_fasm.asm
;------------------------------------------------------------------------------
; Example1_fasm.asm
;------------------------------------------------------------------------------

;------------------------------------------------------------------------------
; void CalcZ_avx(float* z, const float* x, const float* y, size_t n);
;------------------------------------------------------------------------------

NSE     equ 8                                   ;num_simd_elements
SF      equ 4                                   ;scale factor for F32

        .code
CalcZ_avx proc

; Validate arguments
        test r9,r9                              ;n == 0?
        jz Done                                 ;jump if yes

; Initialize
        mov rax,-SF                             ;rax = array offset (Loop2)
        cmp r9,NSE                              ;n < NSE?
        jb Loop2                                ;jump if yes
        mov rax,-NSE*SF                         ;rax = array offset (Loop1)

; Calculate z[i:i+7] = x[i:i+7] + y[i:i+7]
Loop1:  add rax,NSE*SF                          ;update array offset
        vmovups ymm0,ymmword ptr [rdx+rax]      ;ymm0 = x[i:i+7]
        vmovups ymm1,ymmword ptr [r8+rax]       ;ymm1 = y[i:i+7]
        vaddps ymm2,ymm0,ymm1                   ;z[i:i+7] = x[i:i+7] + y[i:i+7]
        vmovups ymmword ptr [rcx+rax],ymm2      ;save z[i:i+7]

        sub r9,NSE                              ;n -= NSE
        cmp r9,NSE                              ;n >= NSE?
        jae Loop1                               ;jump if yes

        test r9,r9                              ;n == 0?
        jz Done                                 ;jump if yes
        add rax,NSE*SF-SF                       ;adjust array offset for Loop2

; Calculate z[i] = x[i] + y[i] for remaining elements
Loop2:  add rax,SF                              ;update array offset
        vmovss xmm0,real4 ptr [rdx+rax]         ;xmm0 = x[i]
        vmovss xmm1,real4 ptr [r8+rax]          ;xmm1 = y[i]
        vaddss xmm2,xmm0,xmm1                   ;z[i] = x[i] + y[i]
        vmovss real4 ptr [rcx+rax],xmm2         ;save z[i]

        sub r9,1                                ;n -= 1
        jnz Loop2                               ;repeat until done

Done:   vzeroupper
        ret                                     ;return to caller
CalcZ_avx endp
        end

Finally, build and run the project.
The code looks rather sophisticated. It uses AVX, and as far as I can tell the two loops do not run at the same time: Loop1 adds eight floats per iteration, and Loop2 then handles the leftover elements one at a time. I'll keep reading the book to work out the details; it is fairly involved.
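To make the control flow of the assembly easier to follow, here is a plain C++ sketch of the same two-loop structure. It is a model for reading purposes only, not code from the book: CalcZ_model is a hypothetical name, and the inner loop over k stands in for the single vaddps instruction. Under the Microsoft x64 calling convention the first four integer or pointer arguments arrive in rcx, rdx, r8 and r9, which is why the assembly reads z, x, y and n from those registers.

```cpp
#include <cstddef>

// Hypothetical model of CalcZ_avx (not part of the book's source).
// NSE mirrors the asm constant: 8 floats fit in one 256-bit YMM register.
constexpr std::size_t NSE = 8;

void CalcZ_model(float* z, const float* x, const float* y, std::size_t n)
{
    std::size_t i = 0;

    // Loop1: process full blocks of NSE elements.
    // In the asm, each block is one vmovups/vmovups/vaddps/vmovups sequence.
    for (; i + NSE <= n; i += NSE)
    {
        for (std::size_t k = 0; k < NSE; k++)
            z[i + k] = x[i + k] + y[i + k];
    }

    // Loop2: process the remaining 0 to NSE-1 elements one at a time
    // (vmovss/vaddss in the asm). If n == 0, neither loop runs.
    for (; i < n; i++)
        z[i] = x[i] + y[i];
}
```

With n = 20 as in Example1.cpp, the first loop runs twice (elements 0 to 15) and the second loop runs four times (elements 16 to 19).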
This example is too advanced, so here is a simpler one: printing an array in reverse order.