Luna::Simd::load_f4x4

float4x4 load_f4x4(f32 const *mem_addr)

Loads 16 packed single-precision (32-bit) floating-point elements from mem_addr to dst. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.

FOR r := 0 to 3
       i := 128 * r
       dst[r].x := MEM[mem_addr + i : mem_addr + i + 31]
       dst[r].y := MEM[mem_addr + i + 32 : mem_addr + i + 63]
       dst[r].z := MEM[mem_addr + i + 64 : mem_addr + i + 95]
       dst[r].w := MEM[mem_addr + i + 96 : mem_addr + i + 127]
ENDFOR