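RTL Design of the Data Permutation Instruction Subset of a 128-bit Vector ALU
Published: 2018-01-28 05:28
Keywords: SIMD, vector permute unit, PowerPC. Source: Xidian University, 2016 master's thesis. Document type: degree thesis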
[Abstract]: In pursuit of higher data processing capability, the operand precision of processors keeps increasing, yet many operations do not actually need operands of such high precision. Operating this way leaves the potential of the data processing resources underused and wastes them, which motivated the subword-parallel optimization strategy based on single instruction, multiple data (SIMD). However, SIMD registers are storage units accessed with word alignment, whereas actual memory is accessed with byte alignment, so an additional data permutation unit (Permute) is needed before and after parallel operations to rearrange the data into operands that conform to the SIMD register format. To implement this data permutation function in the SIMD parallel processing mechanism, this thesis studies a vector ALU data permutation unit based on the PowerPC architecture and the AltiVec mechanism (the multimedia instruction set extension for general-purpose processors of the PowerPC architecture). Through an analysis of the PowerPC architecture and its instruction set, the thesis designs and implements the vector permute (Permute) module responsible for data handling in SIMD vector-parallel data processing. The module processes 128-bit operands, uses a two-stage pipeline, and implements 53 vector permutation instructions of the PowerPC ISA in eight categories: vector pack, vector unpack, vector merge, vector copy-and-splice, vector select, vector permute, vector shift, and vector gather instructions. The module addresses the problem that the mismatch in data format between SIMD vector registers and actual memory accesses limits the performance gain obtained from parallel operation. The circuit is described in the Verilog hardware description language, and functional simulation of the completed design is then carried out to confirm that it behaves correctly. In the second pipeline stage, the design introduces a crossbar switch structure to perform the data selection that follows data processing. This crossbar module performs the data selection for all instructions of the permute module, and its non-blocking internal structure maintains data transfer speed. By analyzing the operational similarities between instructions, the design shares modules among instructions, which simplifies the circuit structure, eliminates redundant circuitry, reduces module area, and increases processing speed.
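The idea of sharing one crossbar across several permute-class instructions can be illustrated with a small behavioral model. The sketch below is not the thesis's Verilog design; it is a Python approximation under the assumption that each of the 16 output byte lanes independently selects one of the 32 source bytes (vA followed by vB, with byte 0 as the most significant byte). The function names `crossbar`, `vperm`, `vmrghb`, and `vsldoi` are illustrative; the three instruction behaviors follow the AltiVec definitions, and only pure byte-routing instructions are shown.

```python
# Behavioral sketch only: a byte-level crossbar shared by several
# AltiVec permute-class instructions. Names and structure are assumptions
# made for illustration, not taken from the thesis.

def crossbar(va, vb, select):
    """Route 16 output bytes from the 32-byte concatenation va || vb.

    select[i] is the source byte index (0..31) for output lane i.
    """
    assert len(va) == 16 and len(vb) == 16 and len(select) == 16
    src = bytes(va) + bytes(vb)
    # Each lane chooses on its own, so no two lanes ever block each other.
    return bytes(src[s & 0x1F] for s in select)

def vperm(va, vb, vc):
    # vperm: the low 5 bits of each control byte pick the source byte.
    return crossbar(va, vb, list(vc))

def vmrghb(va, vb):
    # Merge high bytes: interleave the first 8 bytes of va and vb.
    select = []
    for i in range(8):
        select += [i, 16 + i]
    return crossbar(va, vb, select)

def vsldoi(va, vb, sh):
    # Shift left double by octet immediate: bytes sh..sh+15 of va || vb.
    assert 0 <= sh <= 15
    return crossbar(va, vb, [sh + i for i in range(16)])

if __name__ == "__main__":
    a = bytes(range(16))
    b = bytes(range(16, 32))
    print(vmrghb(a, b).hex())
    print(vsldoi(a, b, 3).hex())
```

In this model each instruction differs only in how it generates the 16 select values, which is one way to picture the module sharing described in the abstract. Instructions that modify data before routing, such as saturating packs, would need an additional processing step ahead of the selection.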
[Degree-granting institution]: Xidian University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP332
Article ID: 1469898
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/1469898.html