Optimize the xn resamplers, and clarify that `inPtr` is not strip-mined,
by moving the declaration of the pointer out of the loop. This
may or may not remove an additional store or hint to the compiler
that it should keep the pointer value handy. More importantly, it makes
clear that the input data in this resampler is not itself
strip-mined: it is only indexed into, while the offsets buffer
and the output are the things being strip-mined.
Also experimentally optimize the xn resampler by doing the remainder
and the conversion (to an address offset) in vector calculations instead
of in a for loop.
Finally, optimize the xn resampler functions by moving the
wrapping of the index and its conversion to a raw
address offset from a basic for loop into vector
computation.
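As a rough scalar model of the per-element work described above (a
sketch only: `wrap_to_byte_offset` and its parameters are hypothetical
names, not taken from the kernels). Each step would map onto one vector
operation instead of one loop iteration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: wrap a possibly negative sample index into
 * [0, code_length) and scale it to a byte offset, which is the form an
 * RVV indexed load takes its indices in. */
static uint32_t wrap_to_byte_offset(int32_t index, int32_t code_length,
                                    size_t elem_size)
{
    assert(code_length > 0);

    int32_t wrapped = index % code_length; /* sign follows the dividend */
    if (wrapped < 0) {
        wrapped += code_length; /* bring negative remainders into range */
    }
    return (uint32_t)wrapped * (uint32_t)elem_size;
}
```

For example, index -1 with a code length of 4 and 4-byte elements wraps
to element 3, i.e. byte offset 12.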
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Implement 32-bit floating-point complex number resampler with
RVV intrinsics, using the same trick that worked with 16-bit
integer complex numbers: interpreting the two
32-bit components of each complex number as a single
64-bit number for the purposes of moving it around.
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Implement 16-bit integer complex number resampler
using RVV intrinsics by interpreting each complex number
(with two 16-bit components) as a single 32-bit integer
for the purposes of moving them around.
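A portable scalar sketch of the idea, assuming an interleaved
real/imaginary layout (`complex16_t` and `gather_complex16` are
hypothetical stand-ins, not the library's own names). Each sample is
moved as one opaque 32-bit word:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 16-bit integer complex type. */
typedef struct {
    int16_t re;
    int16_t im;
} complex16_t;

/* The trick only works if one complex sample is exactly one 32-bit word. */
static_assert(sizeof(complex16_t) == sizeof(uint32_t),
              "complex16_t must be exactly 32 bits wide");

/* Gather complex samples by index. Nothing here looks at the real or
 * imaginary part separately; memcpy keeps the reinterpretation free of
 * aliasing problems. */
static void gather_complex16(complex16_t *out, const complex16_t *in,
                             const uint32_t *indices, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t word;
        memcpy(&word, &in[indices[i]], sizeof word); /* one 32-bit read */
        memcpy(&out[i], &word, sizeof word);         /* one 32-bit write */
    }
}
```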
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Implement 16-bit integer xn resampler with a slightly
modified copy of the 32f xn resampler logic, along with the
16i resamplerxnpuppet to allow it to be tested.
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Implement 32-bit floating-point xn resampler
utilizing RVV intrinsics. It essentially uses the generic
logic except for the actual transfer part. Instead, it
saves the indices to sample from in a temporary buffer
that it then uses in RVV to do a simple unordered
indexed load into a unit-stride store.
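The two-phase structure can be sketched in plain C roughly as follows.
This is an illustration under assumptions, not the kernel itself: the
function name, the parameters, and the non-negative phase are all
hypothetical. In the RVV version the second loop is where the indexed
load and unit-stride store would go:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Phase 1 fills a temporary index buffer; phase 2 gathers through it.
 * Assumes phase_start and phase_step are non-negative, so truncation
 * toward zero is the same as floor. */
static void resample_two_phase(float *out, const float *in, uint32_t in_len,
                               float phase_start, float phase_step,
                               uint32_t *index_buf, size_t out_len)
{
    assert(in_len > 0);

    /* Phase 1: compute which input sample each output sample comes from. */
    for (size_t n = 0; n < out_len; n++) {
        float phase = phase_start + phase_step * (float)n;
        index_buf[n] = (uint32_t)phase % in_len; /* wrap past the end */
    }

    /* Phase 2: a pure data move, reading at index_buf[n] and writing
     * contiguously. */
    for (size_t n = 0; n < out_len; n++) {
        out[n] = in[index_buf[n]];
    }
}
```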
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Optimize 32-bit floating-point complex number to 8-bit
integer complex number conversion using RVV. Instead of
segment loads/stores, simply convert the raw
numbers stored within the complex numbers.
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Optimize 32-bit floating-point complex number to 16-bit
integer complex number conversion using RVV intrinsics
by dropping the segment loads/stores and instead converting
the raw numbers.
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Optimize conversion from 16-bit integer complex numbers to
32-bit floating-point complex numbers using RVV intrinsics
by skipping any need to distinguish between real and imaginary
parts and instead just loading a contiguous selection
of components.
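In scalar terms the idea looks roughly like this, assuming the complex
samples are stored interleaved as real, imaginary, real, imaginary
(`convert_16ic_to_32fc_flat` is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Treat N complex samples as a flat run of 2*N components. The
 * conversion is the same for real and imaginary parts, so there is no
 * need to separate them. */
static void convert_16ic_to_32fc_flat(float *out, const int16_t *in,
                                      size_t num_complex)
{
    assert(num_complex <= SIZE_MAX / 2); /* 2 * num_complex must not wrap */

    size_t num_components = 2 * num_complex;
    for (size_t i = 0; i < num_components; i++) {
        out[i] = (float)in[i];
    }
}
```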
Timing results:
Old - generic: 39736.8 ms, rvv: 10406.7 ms
New - generic: 40479.9 ms, rvv: 8245.38 ms
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Make 32-bit floating-point complex number to 8-bit integer
complex number conversion consistent with the generic
implementation. Specifically, each number is now saturated
to 8 bits after being multiplied by `INT8_MAX` but before
narrowing.
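A scalar sketch of that order of operations for a single component.
The clamp bounds (`INT8_MIN` and `INT8_MAX`) and the rounding mode
(half away from zero) are assumptions made for illustration, and
`scale_saturate_narrow` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Scale, then saturate, then narrow. */
static int8_t scale_saturate_narrow(float x)
{
    assert(x == x); /* NaN has no meaningful integer value */

    float scaled = x * (float)INT8_MAX; /* 1. scale */

    if (scaled > (float)INT8_MAX) { /* 2. saturate to the 8-bit range */
        scaled = (float)INT8_MAX;
    } else if (scaled < (float)INT8_MIN) {
        scaled = (float)INT8_MIN;
    }

    /* 3. narrow; rounding half away from zero is an assumption here */
    return (int8_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}
```

Without step 2, an input such as 2.0f would scale to 254, which does
not fit in 8 bits.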
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Make 32-bit floating-point complex number to 16-bit integer
complex number conversion using RVV intrinsics consistent
with the generic implementation and documentation
by saturating the value to 16 bits before converting.
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Implement converting a vector of 32-bit floating-point
complex numbers into a vector of 8-bit integer complex numbers
using RVV C intrinsics in `volk_gnsssdr_32fc_convert_8i_rvv`.
Required a lot of debugging, with attempts ranging from
saturation to not doing a narrowing conversion to using
the specification-recommended "round-to-odd" conversion.
In the end, the problem was that the generic
implementation, with no notice in the function documentation
at all, actually multiplies the floating-point number by `INT8_MAX`
before converting.
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Since the RVV 32fc to 8i conversion is not working, try implementing
a smaller scale conversion of a vector of 32-bit floating-point
complex numbers to 16-bit integer complex numbers.
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Implement converting a vector of 16-bit integer complex numbers
to 32-bit floating-point complex numbers using RVV C intrinsics
in `volk_gnsssdr_16ic_convert_32fc_rvv`.
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Implement computing the dot products of multiple vectors
with a common vector, where every element is a 16-bit complex
number (both the real and imaginary parts being 16 bits),
using RVV C intrinsics in `volk_gnsssdr_16ic_x2_dot_prod_16ic_xn_rvv`.
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Implement computing the dot product of two vectors with
complex numbers (each part being 16 bits) using RVV C intrinsics
in `volk_gnsssdr_16ic_x2_dot_prod_16ic_rvv`.
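For reference, a plain C model of a (non-conjugated) complex dot
product with 16-bit parts. This is a sketch under assumptions: the type
and function names are hypothetical, accumulation is done in 32 bits,
and the final result is saturated to 16 bits, none of which is
confirmed against the actual kernel:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit integer complex type. */
typedef struct {
    int16_t re;
    int16_t im;
} complex16_t;

/* Clamp a 32-bit value into the 16-bit range (assumed overflow policy). */
static int16_t saturate_to_16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* sum over i of a[i] * b[i], using (a+bi)(c+di) = (ac-bd) + (ad+bc)i.
 * The 32-bit accumulators can still overflow for long inputs; that case
 * is not handled in this sketch. */
static complex16_t dot_prod_16ic(const complex16_t *a, const complex16_t *b,
                                 size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));

    int32_t acc_re = 0;
    int32_t acc_im = 0;
    for (size_t i = 0; i < n; i++) {
        acc_re += (int32_t)a[i].re * b[i].re - (int32_t)a[i].im * b[i].im;
        acc_im += (int32_t)a[i].re * b[i].im + (int32_t)a[i].im * b[i].re;
    }

    complex16_t result = {saturate_to_16(acc_re), saturate_to_16(acc_im)};
    return result;
}
```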
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Implement multiplying two vectors of 16-bit complex numbers (where
each part, real and imaginary, is represented by 16 bits) by element
using RVV C intrinsics in `volk_gnsssdr_16ic_x2_multiply_16ic_rvv`.
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Implement computing the conjugate on each element in a complex
vector using RVV C intrinsics in
`volk_gnsssdr_16ic_conjugate_16ic_rvv`.
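As a scalar reference for what the kernel computes, assuming an
interleaved real/imaginary layout (`conjugate_16ic` is a hypothetical
name): the real parts are copied and the imaginary parts are negated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Conjugate num_complex interleaved samples: (re, im) -> (re, -im). */
static void conjugate_16ic(int16_t *out, const int16_t *in,
                           size_t num_complex)
{
    assert(num_complex <= SIZE_MAX / 2); /* 2 * i + 1 must not wrap */

    for (size_t i = 0; i < num_complex; i++) {
        out[2 * i] = in[2 * i];
        /* Negating INT16_MIN does not fit in 16 bits; the narrowing
         * cast is implementation-defined in that one case. */
        out[2 * i + 1] = (int16_t)-in[2 * i + 1];
    }
}
```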
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Implement calculating the conjugate for every element in a
vector of complex numbers using RVV C intrinsics in
`volk_gnsssdr_8ic_conjugate_8ic_rvv`.
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Implement multiplying two vectors of complex numbers together,
point by point, into a new vector, using RVV C intrinsics.
This is in `volk_gnsssdr_8ic_x2_multiply_8ic_rvv`.
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Implement calculating the magnitude squared of each element in
a complex vector using RVV C intrinsics in the function
`volk_gnsssdr_8ic_magnitude_squared_8i_rvv`.
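A scalar sketch of the computation, assuming interleaved 8-bit real
and imaginary parts (`magnitude_squared_8ic` is a hypothetical name).
How out-of-range results are narrowed to 8 bits is an assumption here,
not something checked against the generic kernel:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* out[i] = re*re + im*im for each interleaved sample (re, im). */
static void magnitude_squared_8ic(int8_t *out, const int8_t *in,
                                  size_t num_complex)
{
    assert(num_complex == 0 || (out != NULL && in != NULL));

    for (size_t i = 0; i < num_complex; i++) {
        int re = in[2 * i];
        int im = in[2 * i + 1];
        /* The sum is computed in int; it only fits in int8_t for small
         * inputs, and the cast is implementation-defined otherwise. */
        out[i] = (int8_t)(re * re + im * im);
    }
}
```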
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Implement computing the dot product of two complex vectors
utilizing RVV C intrinsics. Was surprisingly satisfying to
puzzle out, though there are no doubt typos and whatnot to
be caught.
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Fix comments in RVV-utilizing functions that incorrectly
said that `__riscv_vsetvl_e8m8` sets up state. Setting
up the correct state in the `vtype` register is handled by
the compiler; `__riscv_vsetvl_e8m8` simply returns
the number of elements to process for the AVL given as its
argument under the SEW and LMUL fixed by the intrinsic's name.
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Implement `volk_gnsssdr_8i_accumulator_s8i_rvv` that accumulates
a vector of 1-byte numbers and stores the sum in `result`.
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Implement multiplying two byte vectors utilizing RVV
through C intrinsics. In other words, implement
`volk_gnsssdr_8u_x2_multiply_8u_rvv`.
Signed-off-by: Marcus Alagar <mvala079@gmail.com>
Specifically, implemented `volk_gnsssdr_8i_x2_add_8i_rvv`
through strip mining. Interestingly, strip mining is
built into RVV's design, making the code actually
very straightforward.
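A minimal sketch of such a strip-mining loop (not the committed code;
`add_8i` is a hypothetical name). The RVV path is only compiled when
the compiler defines `__riscv_vector`; otherwise a scalar fallback is
used so the sketch builds anywhere:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

/* out[i] = a[i] + b[i], wrapping on overflow. */
static void add_8i(int8_t *out, const int8_t *a, const int8_t *b, size_t n)
{
    assert(n == 0 || (out != NULL && a != NULL && b != NULL));

#if defined(__riscv_vector)
    /* Each pass asks how many elements fit (vl), processes that many,
     * and advances. No separate tail loop is needed. */
    for (size_t vl; n > 0; n -= vl, a += vl, b += vl, out += vl) {
        vl = __riscv_vsetvl_e8m8(n);
        vint8m8_t va = __riscv_vle8_v_i8m8(a, vl);
        vint8m8_t vb = __riscv_vle8_v_i8m8(b, vl);
        __riscv_vse8_v_i8m8(out, __riscv_vadd_vv_i8m8(va, vb, vl), vl);
    }
#else
    /* Scalar fallback for non-RVV builds. */
    for (size_t i = 0; i < n; i++) {
        out[i] = (int8_t)(a[i] + b[i]);
    }
#endif
}
```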
Had to fix after realizing that `char` != `int8_t`, as explained
by this Stack Overflow question:
https://stackoverflow.com/questions/451375/what-does-it-mean-for-a-char-to-be-signed
Basically, whether `char` is signed or unsigned is
implementation-defined. As such, when the C intrinsics specify
`int8_t`, not `char`, they expect `signed char`.
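The distinction can be checked with a small C11 sketch (`CHAR_KIND`
and `plain_char_is_signed` are hypothetical helpers). Plain `char` is
its own type, distinct from both `signed char` and `unsigned char`,
and its signedness has to be queried rather than assumed. On common
toolchains `int8_t` is a typedef for `signed char`, though that too is
an assumption about the platform:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

static_assert(sizeof(int8_t) == sizeof(char), "int8_t is one byte wide");

/* 0 for plain char, 1 for signed char, 2 for unsigned char, -1 otherwise. */
#define CHAR_KIND(x) \
    _Generic((x), char: 0, signed char: 1, unsigned char: 2, default: -1)

/* Nonzero when this implementation's plain char is signed. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

`CHAR_KIND((char)0)` selects the plain `char` branch even on platforms
where `char` happens to be signed, which is why a `char *` buffer
cannot be passed where the intrinsics expect `int8_t *` without a cast.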
Signed-off-by: Marcus Alagar <mvala079@gmail.com>