Abstract: Building on the conventional parallel scheme, we further exploit the computing potential of the CPU through instruction-level optimization, using the vector arithmetic logic unit (VALU) and the SSE instruction set to perform four floating-point operations in one instruction cycle. The following conclusions are obtained: ① For finite-difference wave-equation forward modeling, the SSE instruction set delivers good acceleration and provides a second level of acceleration on the CPU; ② The SSE acceleration ratio increases slowly as the size of the forward-modeling data grows, but because SSE operates on four floating-point values per cycle, the ratio cannot exceed 4; ③ SSE acceleration improves efficiency without additional hardware, so it is low-cost and widely applicable; ④ Three-level parallelism on a single machine achieves the best efficiency, whereas efficiency across multiple machines depends on network speed. Numerical simulation experiments show that the new parallel scheme runs substantially faster than the conventional parallel scheme.
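To illustrate the idea of processing four floating-point values per SSE instruction, the following is a minimal sketch and not the implementation used in this work. It assumes a 1-D, second-order-in-time and second-order-in-space acoustic scheme with a constant coefficient; the function names `step_scalar` and `step_sse` and the parameter `r2` (standing for (v·dt/dx)²) are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_mul_ps, ... */

/* Scalar reference: one time step of an assumed 1-D finite-difference scheme.
   r2 = (v*dt/dx)^2 is a hypothetical constant coefficient. */
void step_scalar(const float *prev, const float *cur, float *next,
                 size_t n, float r2)
{
    assert(n >= 3);
    for (size_t i = 1; i + 1 < n; ++i) {
        float lap = cur[i - 1] - 2.0f * cur[i] + cur[i + 1];
        next[i] = 2.0f * cur[i] - prev[i] + r2 * lap;
    }
    next[0] = 0.0f;       /* fixed (zero) boundaries, assumed for simplicity */
    next[n - 1] = 0.0f;
}

/* SSE version: the same update, four interior grid points per iteration. */
void step_sse(const float *prev, const float *cur, float *next,
              size_t n, float r2)
{
    assert(n >= 3);
    const __m128 two = _mm_set1_ps(2.0f);
    const __m128 vr2 = _mm_set1_ps(r2);
    size_t i = 1;

    /* Vector loop: handles points i..i+3 while i+3 is still an interior point. */
    for (; i + 4 < n; i += 4) {
        __m128 c = _mm_loadu_ps(cur + i);       /* cur[i..i+3]   */
        __m128 l = _mm_loadu_ps(cur + i - 1);   /* cur[i-1..i+2] */
        __m128 r = _mm_loadu_ps(cur + i + 1);   /* cur[i+1..i+4] */
        __m128 p = _mm_loadu_ps(prev + i);      /* prev[i..i+3]  */

        __m128 twoc = _mm_mul_ps(two, c);
        __m128 lap  = _mm_add_ps(_mm_sub_ps(l, twoc), r);
        __m128 out  = _mm_add_ps(_mm_sub_ps(twoc, p), _mm_mul_ps(vr2, lap));
        _mm_storeu_ps(next + i, out);
    }

    /* Scalar tail: remaining interior points that do not fill a full vector. */
    for (; i + 1 < n; ++i) {
        float lap = cur[i - 1] - 2.0f * cur[i] + cur[i + 1];
        next[i] = 2.0f * cur[i] - prev[i] + r2 * lap;
    }
    next[0] = 0.0f;
    next[n - 1] = 0.0f;
}
```

Each `__m128` register holds four single-precision values, so every arithmetic intrinsic in the vector loop does the work of four scalar operations. This is why the attainable speedup is bounded by 4. Loads, stores, and the scalar tail keep the measured ratio below that bound in practice. Unaligned loads (`_mm_loadu_ps`) are used here because the shifted stencil accesses cannot all be 16-byte aligned.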