Mirror of https://github.com/gnss-sdr/gnss-sdr (synced 2024-12-12 19:20:32 +00:00)
Improving documentation
Adding Doxygen documentation to VOLK_GNSSSDR kernels
Parent: 1f78183dd4
Commit: d9c333c85f
@@ -42,6 +42,6 @@ From now on, GNSS-SDR (and any other program of your own that makes use of VOLK_
 ___

-VOLK_GNSSSDR was originally created by Andres Cecilia Luque in the framework of the [Summer Of Code In Space (SOCIS 2014)](http://sophia.estec.esa.int/socis2014/?q=about "SOCIS 2014 webpage") program organized by the European Space Agency, and evolved since then by other authors (see each file header for main authorship). This software is released under the GNU General Public License version 3, see the file COPYING.
+VOLK_GNSSSDR was originally created by Andres Cecilia Luque in the framework of the [Summer Of Code In Space (SOCIS 2014)](http://sophia.estec.esa.int/socis2014/?q=about "SOCIS 2014 webpage") program organized by the European Space Agency, and then evolved and maintained by Carles Fernandez-Prades and Javier Arribas. This software is released under the GNU General Public License version 3, see the file COPYING.

 This project is managed by [Centre Tecnologic de Telecomunicacions de Catalunya](http://www.cttc.es "CTTC webpage").
@@ -13,39 +13,39 @@ if other users find them useful.
 ### Adding kernels

-Adding kernels refers to introducing a new function to the VOLK API that is
+Adding kernels refers to introducing a new function to the VOLK_GNSSSDR API that is
 presumably a useful math function/operation. The first step is to create
-the file in volk/kernels/volk. Follow the naming scheme provided in the
-VOLK terms and techniques page. First create the generic protokernel.
+the file in volk_gnsssdr/kernels/volk_gnsssdr. Follow the naming scheme provided in the
+VOLK_GNSSSDR terms and techniques page. First create the generic protokernel.

 The generic protokernel should be written in plain C using explicitly sized
-types from stdint.h or volk_complex.h when appropriate. volk_complex.h
+types from stdint.h or volk_gnsssdr_complex.h when appropriate. volk_gnsssdr_complex.h
 includes explicitly sized complex types for floats and ints. The name of
-the generic kernel should be volk_signature_from_file_generic. If multiple
+the generic kernel should be volk_gnsssdr_signature_from_file_generic. If multiple
 versions of the generic kernel exist then a description can be appended to
 generic_, but it is not required to use alignment flags in the generic
 protokernel name. It is required to surround the entire generic function
 with preprocessor ifdef fences on the symbol LV_HAVE_GENERIC.

-Finally, add the kernel to the list of test cases in volk/lib/kernel_tests.h.
+Finally, add the kernel to the list of test cases in volk_gnsssdr/lib/kernel_tests.h.
 Many kernels should be able to use the default test parameters, but if yours
 requires a lower tolerance, specific vector length, or other test parameters
-just create a new instance of volk_test_params_t for your kernel.
+just create a new instance of volk_gnsssdr_test_params_t for your kernel.

 ### Adding protokernels

-The primary purpose of VOLK is to have multiple implementations of an operation
+The primary purpose of VOLK_GNSSSDR is to have multiple implementations of an operation
 tuned for a specific CPU architecture. Ideally there is at least one
-protokernel of each kernel for every architecture that VOLK supports.
-The pattern for protokernel naming is volk_kernel_signature_architecture_nick.
-The architecture should be one of the supported VOLK architectures. The nick is
+protokernel of each kernel for every architecture that VOLK_GNSSSDR supports.
+The pattern for protokernel naming is volk_gnsssdr_kernel_signature_architecture_nick.
+The architecture should be one of the supported VOLK_GNSSSDR architectures. The nick is
 an optional name to distinguish between multiple implementations for a
 particular architecture.

 Architecture specific protokernels can be written in one of three ways.
 The first approach should always be to use compiler intrinsic functions.
 The second and third approaches are using either in-line assembly or
-assembly with .S files. Both methods of writing assembly exist in VOLK and
+assembly with .S files. Both methods of writing assembly exist in VOLK_GNSSSDR and
 should yield equivalent performance; which method you might choose is a
 matter of opinion. Regardless of the actual method the public function should
 be declared in the kernel header surrounded by ifdef fences on the symbol that
@@ -54,7 +54,7 @@ fits the architecture implementation.
 #### Compiler Intrinsics

 Compiler intrinsics should be treated as functions that map to a specific
-assembly instruction. Most VOLK kernels take the form of a loop that iterates
+assembly instruction. Most VOLK_GNSSSDR kernels take the form of a loop that iterates
 through a vector. Form a loop that iterates on a number of items that is natural
 for the architecture and then use compiler intrinsics to do the math for your
 operation or algorithm. Include the appropriate header inside the ifdef fences,
@@ -72,7 +72,7 @@ based on intrinsics.
 To write pure assembly protokernels, first declare the function name in the
 kernel header file the same way as any other protokernel, but include the extern
 keyword. Second, create a file (one for each protokernel) in
-volk/kernels/volk/asm/$arch. Disassemble another protokernel and copy the
+volk_gnsssdr/kernels/volk_gnsssdr/asm/$arch. Disassemble another protokernel and copy the
 disassembled code in to this file to bootstrap a working implementation. Often
 the disassembled code can be hand-tuned to improve performance.
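
The kernel-authoring rules above can be made concrete with a small sketch of a kernel header that holds a generic protokernel and one SSE2 intrinsics protokernel. The kernel name `volk_gnsssdr_8i_x2_sub_8i` is hypothetical, chosen only to follow the `volk_gnsssdr_signature` naming scheme. `LV_HAVE_GENERIC` and `LV_HAVE_SSE2` are defined locally only so the snippet builds on its own; in a real tree the build system provides them. The `_u_` infix marks the unaligned variant. An `_a_` variant would use `_mm_load_si128`/`_mm_store_si128` and require 16-byte-aligned buffers.

```c
/* Hypothetical kernel header volk_gnsssdr_8i_x2_sub_8i.h (sketch only). */
#ifndef INCLUDED_volk_gnsssdr_8i_x2_sub_8i_H
#define INCLUDED_volk_gnsssdr_8i_x2_sub_8i_H

#include <stdint.h>

/* Normally provided by the build system; defined here so the sketch is standalone. */
#ifndef LV_HAVE_GENERIC
#define LV_HAVE_GENERIC
#endif
#if defined(__SSE2__) && !defined(LV_HAVE_SSE2)
#define LV_HAVE_SSE2
#endif

#ifdef LV_HAVE_GENERIC
/* Generic protokernel: plain C with explicitly sized types, cVector = aVector - bVector. */
static inline void volk_gnsssdr_8i_x2_sub_8i_generic(int8_t* cVector, const int8_t* aVector,
    const int8_t* bVector, unsigned int num_points)
{
    unsigned int i;
    for (i = 0; i < num_points; i++)
        {
            /* The narrowing to int8_t is implementation-defined if the
               difference leaves [-128, 127]. */
            cVector[i] = (int8_t)(aVector[i] - bVector[i]);
        }
}
#endif /* LV_HAVE_GENERIC */

#ifdef LV_HAVE_SSE2
#include <emmintrin.h> /* the intrinsics header goes inside the fence */

/* Unaligned SSE2 protokernel: 16 int8 lanes per iteration, then a scalar tail. */
static inline void volk_gnsssdr_8i_x2_sub_8i_u_sse2(int8_t* cVector, const int8_t* aVector,
    const int8_t* bVector, unsigned int num_points)
{
    const unsigned int sse_iters = num_points / 16;
    unsigned int i;
    for (i = 0; i < sse_iters; i++)
        {
            __m128i a = _mm_loadu_si128((const __m128i*)(aVector + 16 * i));
            __m128i b = _mm_loadu_si128((const __m128i*)(bVector + 16 * i));
            _mm_storeu_si128((__m128i*)(cVector + 16 * i), _mm_sub_epi8(a, b)); /* wrapping subtract */
        }
    /* The remaining num_points % 16 items are handled in scalar code. */
    for (i = sse_iters * 16; i < num_points; i++)
        {
            cVector[i] = (int8_t)(aVector[i] - bVector[i]);
        }
}
#endif /* LV_HAVE_SSE2 */

#endif /* INCLUDED_volk_gnsssdr_8i_x2_sub_8i_H */
```

In VOLK-style QA the generic protokernel usually serves as the reference that the other protokernels are checked against. A real kernel would also need its entry in volk_gnsssdr/lib/kernel_tests.h, whose exact form is not reproduced here.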
@@ -10,15 +10,15 @@
 \li \subpage volk_gnsssdr_16ic_x2_dot_prod_16ic
 \li \subpage volk_gnsssdr_16ic_x2_dot_prod_16ic_xn
 \li \subpage volk_gnsssdr_16ic_x2_rotator_dot_prod_16ic_xn
-\li \subpage volk_gnsssdr_8i_accumulator_s8i
-\li \subpage volk_gnsssdr_8i_index_max_16u
-\li \subpage volk_gnsssdr_8i_max_s8i
-\li \subpage volk_gnsssdr_8i_x2_add_8i
 \li \subpage volk_gnsssdr_8ic_conjugate_8ic
 \li \subpage volk_gnsssdr_8ic_magnitude_squared_8i
 \li \subpage volk_gnsssdr_8ic_x2_dot_prod_8ic
 \li \subpage volk_gnsssdr_8ic_x2_multiply_8ic
 \li \subpage volk_gnsssdr_8ic_s8ic_multiply_8ic
+\li \subpage volk_gnsssdr_8i_accumulator_s8i
+\li \subpage volk_gnsssdr_8i_index_max_16u
+\li \subpage volk_gnsssdr_8i_max_s8i
+\li \subpage volk_gnsssdr_8i_x2_add_8i
 \li \subpage volk_gnsssdr_64f_accumulator_64f

 */
@@ -1,6 +1,6 @@
 /*!
 * \file volk_gnsssdr_16ic_convert_32fc.h
-* \brief Volk protokernel: converts 16 bit integer complex complex values to 32 bits float complex values
+* \brief VOLK_GNSSSDR kernel: converts 16 bit integer complex complex values to 32 bits float complex values.
 * \authors <ul>
 * <li> Javier Arribas, 2015. jarribas(at)cttc.es
 * </ul>
@@ -30,6 +30,28 @@
 * -------------------------------------------------------------------------
 */

+/*!
+* \page volk_gnsssdr_16ic_convert_32fc
+*
+* \b Overview
+*
+* Converts a complex vector of 16-bits integer each component
+* into a complex vector of 32-bits float each component.
+*
+* <b>Dispatcher Prototype</b>
+* \code
+* void volk_gnsssdr_16ic_convert_32fc(lv_32fc_t* outputVector, const lv_16sc_t* inputVector, unsigned int num_points)
+* \endcode
+*
+* \b Inputs
+* \li inputVector: The complex 16-bit integer input data buffer.
+* \li num_points: The number of data values to be converted.
+*
+* \b Outputs
+* \li outputVector: pointer to a vector holding the converted vector.
+*
+*/


 #ifndef INCLUDED_volk_gnsssdr_16ic_convert_32fc_H
 #define INCLUDED_volk_gnsssdr_16ic_convert_32fc_H
@@ -38,12 +60,6 @@
 #ifdef LV_HAVE_GENERIC

-/*!
-\brief Converts a complex vector of 16-bits integer each component into a complex vector of 32-bits float each component.
-\param[out] outputVector The complex 32-bit float output data buffer
-\param[in] inputVector The complex 16-bit integer input data buffer
-\param[in] num_points The number of data values to be converted
-*/
 static inline void volk_gnsssdr_16ic_convert_32fc_generic(lv_32fc_t* outputVector, const lv_16sc_t* inputVector, unsigned int num_points)
 {
 for(unsigned int i = 0; i < num_points; i++)
@@ -53,15 +69,10 @@ static inline void volk_gnsssdr_16ic_convert_32fc_generic(lv_32fc_t* outputVecto
 }
 #endif /* LV_HAVE_GENERIC */


 #ifdef LV_HAVE_SSE2
 #include <emmintrin.h>

-/*!
-\brief Converts a complex vector of 16-bits integer each component into a complex vector of 32-bits float each component.
-\param[out] outputVector The complex 32-bit float output data buffer
-\param[in] inputVector The complex 16-bit integer input data buffer
-\param[in] num_points The number of data values to be converted
-*/
 static inline void volk_gnsssdr_16ic_convert_32fc_a_sse2(lv_32fc_t* outputVector, const lv_16sc_t* inputVector, unsigned int num_points)
 {
 const unsigned int sse_iters = num_points / 2;
@@ -89,15 +100,10 @@ static inline void volk_gnsssdr_16ic_convert_32fc_a_sse2(lv_32fc_t* outputVector
 }
 #endif /* LV_HAVE_SSE2 */


 #ifdef LV_HAVE_SSE2
 #include <emmintrin.h>

-/*!
-\brief Converts a complex vector of 16-bits integer each component into a complex vector of 32-bits float each component.
-\param[out] outputVector The complex 32-bit float output data buffer
-\param[in] inputVector The complex 16-bit integer input data buffer
-\param[in] num_points The number of data values to be converted
-*/
 static inline void volk_gnsssdr_16ic_convert_32fc_u_sse2(lv_32fc_t* outputVector, const lv_16sc_t* inputVector, unsigned int num_points)
 {
 const unsigned int sse_iters = num_points / 2;
@@ -128,12 +134,6 @@ static inline void volk_gnsssdr_16ic_convert_32fc_u_sse2(lv_32fc_t* outputVector
 #ifdef LV_HAVE_NEON
 #include <arm_neon.h>

-/*!
-\brief Converts a complex vector of 16-bits integer each component into a complex vector of 32-bits float each component.
-\param[out] outputVector The complex 32-bit float output data buffer
-\param[in] inputVector The complex 16-bit integer input data buffer
-\param[in] num_points The number of data values to be converted
-*/
 static inline void volk_gnsssdr_16ic_convert_32fc_neon(lv_32fc_t* outputVector, const lv_16sc_t* inputVector, unsigned int num_points)
 {
 const unsigned int sse_iters = num_points / 2;
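
What `volk_gnsssdr_16ic_convert_32fc` computes can be sketched in portable C. This is not the library's generic protokernel. A hypothetical `cshort_t` struct stands in for `lv_16sc_t`, C99 `float complex` stands in for `lv_32fc_t`, and the sketch assumes a plain value-preserving cast with no scaling. Every 16-bit integer is exactly representable in a 32-bit float, so the conversion loses nothing.

```c
#include <complex.h>
#include <stdint.h>

/* Hypothetical stand-in for lv_16sc_t: interleaved 16-bit real/imaginary parts. */
typedef struct
{
    int16_t re;
    int16_t im;
} cshort_t;

/* Sketch of the 16ic -> 32fc conversion: each component is cast to float, no scaling. */
static inline void convert_16ic_32fc_sketch(float complex* outputVector,
    const cshort_t* inputVector, unsigned int num_points)
{
    unsigned int i;
    for (i = 0; i < num_points; i++)
        {
            outputVector[i] = (float)inputVector[i].re + (float)inputVector[i].im * I;
        }
}
```

The SSE2 and NEON protokernels in the diff set `sse_iters = num_points / 2`, i.e. two complex samples per 128-bit iteration, so an odd `num_points` leaves one sample for scalar code such as the loop above.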
@@ -1,11 +1,11 @@
 /*!
 * \file volk_gnsssdr_16ic_resampler_16ic.h
-* \brief Volk protokernel: resample a 16 bits complex vector
+* \brief VOLK_GNSSSDR kernel: resamples a 16 bits complex vector.
 * \authors <ul>
 * <li> Javier Arribas, 2015. jarribas(at)cttc.es
 * </ul>
 *
-* Volk protokernel that multiplies two 16 bits vectors (8 bits the real part
+* VOLK_GNSSSDR kernel that multiplies two 16 bits vectors (8 bits the real part
 * and 8 bits the imaginary part) and accumulates them
 *
 * -------------------------------------------------------------------------
@@ -33,6 +33,30 @@
 * -------------------------------------------------------------------------
 */

+/*!
+* \page volk_gnsssdr_16ic_resampler_16ic
+*
+* \b Overview
+*
+* Resamples a complex vector (16-bit integer each component).
+*
+* <b>Dispatcher Prototype</b>
+* \code
+* void volk_gnsssdr_16ic_resampler_16ic(lv_16sc_t* result, const lv_16sc_t* local_code, float rem_code_phase_chips, float code_phase_step_chips, int code_length_chips, unsigned int num_output_samples)
+* \endcode
+*
+* \b Inputs
+* \li local_code: One of the vectors to be multiplied.
+* \li rem_code_phase_chips: Remnant code phase [chips]
+* \li code_phase_step_chips: Phase increment per sample [chips/sample]
+* \li code_length_chips: Code length in chips.
+* \li num_points: The number of data values to be in the resampled vector.
+*
+* \b Outputs
+* \li result: Pointer to the resampled vector.
+*
+*/

 #ifndef INCLUDED_volk_gnsssdr_16ic_resampler_16ic_H
 #define INCLUDED_volk_gnsssdr_16ic_resampler_16ic_H
@@ -48,15 +72,6 @@
 // return (r > 0.0) ? (r + 0.5) : (r - 0.5);
 //}

-/*!
-\brief Resamples a complex vector (16-bit integer each component)
-\param[out] result The vector where the result will be stored
-\param[in] local_code One of the vectors to be multiplied
-\param[in] rem_code_phase_chips Remnant code phase [chips]
-\param[in] code_phase_step_chips Phase increment per sample [chips/sample]
-\param[in] code_length_chips Code length in chips
-\param[in] num_output_samples Number of samples to be processed
-*/
 static inline void volk_gnsssdr_16ic_resampler_16ic_generic(lv_16sc_t* result, const lv_16sc_t* local_code, float rem_code_phase_chips, float code_phase_step_chips, int code_length_chips, unsigned int num_output_samples)
 {
 int local_code_chip_index;
@@ -77,15 +92,6 @@ static inline void volk_gnsssdr_16ic_resampler_16ic_generic(lv_16sc_t* result, c
 #ifdef LV_HAVE_SSE2
 #include <emmintrin.h>

-/*!
-\brief Resamples a complex vector (16-bit integer each component)
-\param[out] result The vector where the result will be stored
-\param[in] local_code One of the vectors to be multiplied
-\param[in] rem_code_phase_chips Remnant code phase [chips]
-\param[in] code_phase_step_chips Phase increment per sample [chips/sample]
-\param[in] code_length_chips Code length in chips
-\param[in] num_output_samples Number of samples to be processed
-*/
 static inline void volk_gnsssdr_16ic_resampler_16ic_a_sse2(lv_16sc_t* result, const lv_16sc_t* local_code, float rem_code_phase_chips, float code_phase_step_chips, int code_length_chips, unsigned int num_output_samples)//, int* scratch_buffer, float* scratch_buffer_float)
 {
 _MM_SET_ROUNDING_MODE (_MM_ROUND_NEAREST);//_MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO
@@ -165,18 +171,10 @@ static inline void volk_gnsssdr_16ic_resampler_16ic_a_sse2(lv_16sc_t* result, co
 #endif /* LV_HAVE_SSE2 */


 #ifdef LV_HAVE_SSE2
 #include <emmintrin.h>

-/*!
-\brief Resamples a complex vector (16-bit integer each component)
-\param[out] result The vector where the result will be stored
-\param[in] local_code One of the vectors to be multiplied
-\param[in] rem_code_phase_chips Remnant code phase [chips]
-\param[in] code_phase_step_chips Phase increment per sample [chips/sample]
-\param[in] code_length_chips Code length in chips
-\param[in] num_output_samples Number of samples to be processed
-*/
 static inline void volk_gnsssdr_16ic_resampler_16ic_u_sse2(lv_16sc_t* result, const lv_16sc_t* local_code, float rem_code_phase_chips, float code_phase_step_chips, int code_length_chips, unsigned int num_output_samples)//, int* scratch_buffer, float* scratch_buffer_float)
 {
 _MM_SET_ROUNDING_MODE (_MM_ROUND_NEAREST);//_MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO
@@ -255,18 +253,10 @@ static inline void volk_gnsssdr_16ic_resampler_16ic_u_sse2(lv_16sc_t* result, co
 #endif /* LV_HAVE_SSE2 */


 #ifdef LV_HAVE_NEON
 #include <arm_neon.h>

-/*!
-\brief Resamples a complex vector (16-bit integer each component)
-\param[out] result The vector where the result will be stored
-\param[in] local_code One of the vectors to be multiplied
-\param[in] rem_code_phase_chips Remnant code phase [chips]
-\param[in] code_phase_step_chips Phase increment per sample [chips/sample]
-\param[in] code_length_chips Code length in chips
-\param[in] num_output_samples Number of samples to be processed
-*/
 static inline void volk_gnsssdr_16ic_resampler_16ic_neon(lv_16sc_t* result, const lv_16sc_t* local_code, float rem_code_phase_chips, float code_phase_step_chips, int code_length_chips, unsigned int num_output_samples)//, int* scratch_buffer, float* scratch_buffer_float)
 {
 unsigned int number;
@@ -308,7 +298,6 @@ static inline void volk_gnsssdr_16ic_resampler_16ic_neon(lv_16sc_t* result, cons
 __attribute__((aligned(16))) float init_4constant_float[4] = { 4.0f, 4.0f, 4.0f, 4.0f };
 float32x4_t _4constant_float = vld1q_f32(init_4constant_float);

-
 for(number = 0; number < quarterPoints; number++)
 {
 _code_phase_out = vmulq_f32(_code_phase_step_chips, _4output_index); //compute the code phase point with the phase step
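
The resampler's parameters are easier to follow with a nearest-chip sketch. Here `local_code` is read as the code replica being resampled (one chip per entry). Output sample n takes the chip that contains code phase `rem_code_phase_chips + n * code_phase_step_chips`, wrapped modulo `code_length_chips`. The floor convention is an assumption: the header carries a commented-out rounding helper, so the library's protokernels may round differently. `cshort_t` is a hypothetical stand-in for `lv_16sc_t`.

```c
#include <assert.h>
#include <stdint.h>

typedef struct
{
    int16_t re;
    int16_t im;
} cshort_t; /* hypothetical stand-in for lv_16sc_t */

/* Nearest-chip resampling sketch (floor convention assumed, not taken from the library). */
static inline void resampler_16ic_sketch(cshort_t* result, const cshort_t* local_code,
    float rem_code_phase_chips, float code_phase_step_chips,
    int code_length_chips, unsigned int num_output_samples)
{
    unsigned int n;
    assert(code_length_chips > 0);
    for (n = 0; n < num_output_samples; n++)
        {
            /* Code phase of output sample n, in chips. */
            const float phase = rem_code_phase_chips + code_phase_step_chips * (float)n;
            /* floor() without libm; valid while phase fits in an int. */
            int chip = (int)phase;
            if ((float)chip > phase) chip--;
            /* Wrap into [0, code_length_chips). */
            chip %= code_length_chips;
            if (chip < 0) chip += code_length_chips;
            result[n] = local_code[chip];
        }
}
```

With a step of 0.5 chips/sample each chip is repeated twice, and a negative remnant phase wraps to the last chip of the code.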
@@ -1,11 +1,11 @@
 /*!
 * \file volk_gnsssdr_16ic_resamplerpuppet_16ic.h
-* \brief Volk puppet for the 16-bit complex vector resampler kernel
+* \brief VOLK_GNSSSDR puppet for the 16-bit complex vector resampler kernel.
 * \authors <ul>
 * <li> Carles Fernandez Prades 2016 cfernandez at cttc dot cat
 * </ul>
 *
-* Volk puppet for integrating the resampler into volk's test system
+* VOLK_GNSSSDR puppet for integrating the resampler into the test system
 *
 * -------------------------------------------------------------------------
 *
@@ -1,11 +1,11 @@
 /*!
 * \file volk_gnsssdr_16ic_resamplerxnpuppet_16ic.h
-* \brief Volk puppet for the multiple 16-bit complex vector resampler kernel
+* \brief VOLK_GNSSSDR puppet for the multiple 16-bit complex vector resampler kernel.
 * \authors <ul>
 * <li> Carles Fernandez Prades 2016 cfernandez at cttc dot cat
 * </ul>
 *
-* Volk puppet for integrating the multiple resampler into volk's test system
+* VOLK_GNSSSDR puppet for integrating the multiple resampler into the test system
 *
 * -------------------------------------------------------------------------
 *
@@ -1,11 +1,11 @@
 /*!
 * \file volk_gnsssdr_16ic_rotatorpuppet_16ic.h
-* \brief Volk puppet for the 16-bit complex rotator kernel
+* \brief VOLK_GNSSSDR puppet for the 16-bit complex rotator kernel.
 * \authors <ul>
 * <li> Carles Fernandez Prades 2016 cfernandez at cttc dot cat
 * </ul>
 *
-* Volk puppet for integrating the resampler into volk's test system
+* VOLK_GNSSSDR puppet for integrating the rotator into the test system
 *
 * -------------------------------------------------------------------------
 *
@@ -1,11 +1,11 @@
 /*!
-* \file volk_16ic_s32fc_x2_rotator_16ic.h
-* \brief Volk protokernel: rotates a 16 bits complex vector
+* \file volk_gnsssdr_16ic_s32fc_x2_rotator_16ic.h
+* \brief VOLK_GNSSSDR kernel: rotates a 16 bits complex vector.
 * \authors <ul>
 * <li> Carles Fernandez-Prades, 2015 cfernandez at cttc.es
 * </ul>
 *
-* Volk protokernel that rotates a 16-bit complex vector
+* VOLK_GNSSSDR kernel that rotates a 16-bit complex vector
 *
 * -------------------------------------------------------------------------
 *
@@ -32,6 +32,29 @@
 * -------------------------------------------------------------------------
 */

+/*!
+* \page volk_gnsssdr_16ic_s32fc_x2_rotator_16ic
+*
+* \b Overview
+*
+* Rotates a complex vector (16-bit integer samples each component).
+*
+* <b>Dispatcher Prototype</b>
+* \code
+* void volk_gnsssdr_16ic_s32fc_x2_rotator_16ic(lv_16sc_t* outVector, const lv_16sc_t* inVector, const lv_32fc_t phase_inc, lv_32fc_t* phase, unsigned int num_points);
+* \endcode
+*
+* \b Inputs
+* \li inVector: Vector to be rotated.
+* \li phase_inc: Phase increment in each sample = lv_cmake(cos(phase_step_rad), sin(phase_step_rad))
+* \li phase: Initial phase = lv_cmake(cos(initial_phase_rad), sin(initial_phase_rad))
+* \li num_points: Number of complex values to be rotated and stored into \p outVector
+*
+* \b Outputs
+* \li phase: Final phase.
+* \li outVector: The resampled vector.
+*
+*/

 #ifndef INCLUDED_volk_gnsssdr_16ic_s32fc_x2_rotator_16ic_H
 #define INCLUDED_volk_gnsssdr_16ic_s32fc_x2_rotator_16ic_H
@@ -42,14 +65,6 @@
 #ifdef LV_HAVE_GENERIC

-/*!
-\brief Rotates a complex vector (16-bit integer samples each component)
-\param[out] outVector Rotated vector
-\param[in] inVector Vector to be rotated
-\param[in] phase_inc Phase increment = lv_cmake(cos(phase_step_rad), -sin(phase_step_rad))
-\param[in,out] phase Initial / final phase
-\param[in] num_points The Number of complex values to be multiplied together, accumulated and stored into result
-*/
 static inline void volk_gnsssdr_16ic_s32fc_x2_rotator_16ic_generic(lv_16sc_t* outVector, const lv_16sc_t* inVector, const lv_32fc_t phase_inc, lv_32fc_t* phase, unsigned int num_points)
 {
 unsigned int i = 0;
@@ -70,14 +85,6 @@ static inline void volk_gnsssdr_16ic_s32fc_x2_rotator_16ic_generic(lv_16sc_t* ou
 #ifdef LV_HAVE_SSE3
 #include <pmmintrin.h>

-/*!
-\brief Rotates a complex vector (16-bit integer samples each component)
-\param[out] outVector Rotated vector
-\param[in] inVector Vector to be rotated
-\param[in] phase_inc Phase increment = lv_cmake(cos(phase_step_rad), -sin(phase_step_rad))
-\param[in,out] phase Initial / final phase
-\param[in] num_points The Number of complex values to be multiplied together, accumulated and stored into result
-*/
 static inline void volk_gnsssdr_16ic_s32fc_x2_rotator_16ic_a_sse3(lv_16sc_t* outVector, const lv_16sc_t* inVector, const lv_32fc_t phase_inc, lv_32fc_t* phase, unsigned int num_points)
 {
 const unsigned int sse_iters = num_points / 4;
@@ -163,17 +170,10 @@ static inline void volk_gnsssdr_16ic_s32fc_x2_rotator_16ic_a_sse3(lv_16sc_t* out
 #endif /* LV_HAVE_SSE3 */


 #ifdef LV_HAVE_SSE3
 #include <pmmintrin.h>

-/*!
-\brief Rotates a complex vector (16-bit integer samples each component)
-\param[out] outVector Rotated vector
-\param[in] inVector Vector to be rotated
-\param[in] phase_inc Phase increment = lv_cmake(cos(phase_step_rad), -sin(phase_step_rad))
-\param[in,out] phase Initial / final phase
-\param[in] num_points The Number of complex values to be multiplied together, accumulated and stored into result
-*/
 static inline void volk_gnsssdr_16ic_s32fc_x2_rotator_16ic_u_sse3(lv_16sc_t* outVector, const lv_16sc_t* inVector, const lv_32fc_t phase_inc, lv_32fc_t* phase, unsigned int num_points)
 {
 const unsigned int sse_iters = num_points / 4;
@@ -260,17 +260,10 @@ static inline void volk_gnsssdr_16ic_s32fc_x2_rotator_16ic_u_sse3(lv_16sc_t* out
 #endif /* LV_HAVE_SSE3 */


 #ifdef LV_HAVE_NEON
 #include <arm_neon.h>

-/*!
-\brief Rotates a complex vector (16-bit integer samples each component)
-\param[out] outVector Rotated vector
-\param[in] inVector Vector to be rotated
-\param[in] phase_inc Phase increment = lv_cmake(cos(phase_step_rad), -sin(phase_step_rad))
-\param[in,out] phase Initial / final phase
-\param[in] num_points The Number of complex values to be multiplied together, accumulated and stored into result
-*/
 static inline void volk_gnsssdr_16ic_s32fc_x2_rotator_16ic_neon(lv_16sc_t* outVector, const lv_16sc_t* inVector, const lv_32fc_t phase_inc, lv_32fc_t* phase, unsigned int num_points)
 {
 unsigned int i = 0;
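
The rotator's contract, multiplying each sample by a running phasor and advancing that phasor by `phase_inc` after every sample, can be sketched with C99 complex floats. The following are assumptions, not taken from the library: `cshort_t` stands in for `lv_16sc_t`, products are rounded to nearest and clamped to the int16 range, and the phasor is never renormalized, although long vectors may need that to keep its magnitude at 1. The sign of the sine term in `phase_inc` only selects the rotation direction; the new `\page` text and the removed `\param` text in this diff write it with opposite signs.

```c
#include <complex.h>
#include <stdint.h>

typedef struct
{
    int16_t re;
    int16_t im;
} cshort_t; /* hypothetical stand-in for lv_16sc_t */

/* Round to nearest and clamp to the int16 range (assumed output policy). */
static inline int16_t round_sat_16(float x)
{
    if (x >= 32767.0f) return INT16_MAX;
    if (x <= -32768.0f) return INT16_MIN;
    return (int16_t)(x >= 0.0f ? (int)(x + 0.5f) : (int)(x - 0.5f));
}

/* Rotator sketch: out[i] = in[i] * phase, then phase *= phase_inc. */
static inline void rotator_16ic_sketch(cshort_t* outVector, const cshort_t* inVector,
    const float complex phase_inc, float complex* phase, unsigned int num_points)
{
    unsigned int i;
    for (i = 0; i < num_points; i++)
        {
            const float complex x = (float)inVector[i].re + (float)inVector[i].im * I;
            const float complex y = x * (*phase);
            outVector[i].re = round_sat_16(crealf(y));
            outVector[i].im = round_sat_16(cimagf(y));
            *phase *= phase_inc; /* no renormalization in this sketch */
        }
}
```

With `phase_inc = I` (a quarter turn per sample) and an initial phase of 1, a constant input of (100, 0) comes out as (100, 0), (0, 100), (-100, 0), (0, -100), and `*phase` returns to 1 after four samples.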
@@ -1,11 +1,11 @@
 /*!
 * \file volk_gnsssdr_16ic_x2_dot_prod_16ic.h
-* \brief Volk protokernel: multiplies two 16 bits vectors and accumulates them
+* \brief VOLK_GNSSSDR kernel: multiplies two 16 bits vectors and accumulates them.
 * \authors <ul>
 * <li> Javier Arribas, 2015. jarribas(at)cttc.es
 * </ul>
 *
-* Volk protokernel that multiplies two 16 bits vectors (8 bits the real part
+* VOLK_GNSSSDR kernel that multiplies two 16 bits vectors (8 bits the real part
 * and 8 bits the imaginary part) and accumulates them
 *
 * -------------------------------------------------------------------------
@@ -33,6 +33,29 @@
 * -------------------------------------------------------------------------
 */

+/*!
+* \page volk_gnsssdr_16ic_x2_dot_prod_16ic
+*
+* \b Overview
+*
+* Multiplies two input complex vectors (16-bit integer each component) and accumulates them,
+* storing the result. Results are saturated so never go beyond the limits of the data type.
+*
+* <b>Dispatcher Prototype</b>
+* \code
+* void volk_gnsssdr_16ic_x2_dot_prod_16ic(lv_16sc_t* result, const lv_16sc_t* in_a, const lv_16sc_t* in_b, unsigned int num_points);
+* \endcode
+*
+* \b Inputs
+* \li in_a: One of the vectors to be multiplied and accumulated.
+* \li in_b: The other vector to be multiplied and accumulated.
+* \li num_points: Number of complex values to be multiplied together, accumulated and stored into \p result
+*
+* \b Outputs
+* \li result: Value of the accumulated result.
+*
+*/

 #ifndef INCLUDED_volk_gnsssdr_16ic_x2_dot_prod_16ic_H
 #define INCLUDED_volk_gnsssdr_16ic_x2_dot_prod_16ic_H
@@ -43,13 +66,6 @@
 #ifdef LV_HAVE_GENERIC

-/*!
-\brief Multiplies the two input complex vectors (16-bit integer each component) and accumulates them, storing the result. Results are saturated so never go beyond the limits of the data type.
-\param[out] result Value of the accumulated result
-\param[in] in_a One of the vectors to be multiplied and accumulated
-\param[in] in_b One of the vectors to be multiplied and accumulated
-\param[in] num_points The number of complex values in aVector and bVector to be multiplied together, accumulated and stored into cVector
-*/
 static inline void volk_gnsssdr_16ic_x2_dot_prod_16ic_generic(lv_16sc_t* result, const lv_16sc_t* in_a, const lv_16sc_t* in_b, unsigned int num_points)
 {
 result[0] = lv_cmake((int16_t)0, (int16_t)0);
@@ -66,13 +82,6 @@ static inline void volk_gnsssdr_16ic_x2_dot_prod_16ic_generic(lv_16sc_t* result,
 #ifdef LV_HAVE_SSE2
 #include <emmintrin.h>

-/*!
-\brief Multiplies the two input complex vectors (16-bit integer each component) and accumulates them, storing the result. Results are saturated so never go beyond the limits of the data type.
-\param[out] result Value of the accumulated result
-\param[in] in_a One of the vectors to be multiplied and accumulated
-\param[in] in_b One of the vectors to be multiplied and accumulated
-\param[in] num_points The number of complex values in aVector and bVector to be multiplied together, accumulated and stored into cVector
-*/
 static inline void volk_gnsssdr_16ic_x2_dot_prod_16ic_a_sse2(lv_16sc_t* out, const lv_16sc_t* in_a, const lv_16sc_t* in_b, unsigned int num_points)
 {
 lv_16sc_t dotProduct = lv_cmake((int16_t)0, (int16_t)0);
@@ -149,13 +158,6 @@ static inline void volk_gnsssdr_16ic_x2_dot_prod_16ic_a_sse2(lv_16sc_t* out, con
 #ifdef LV_HAVE_SSE2
 #include <emmintrin.h>

-/*!
-\brief Multiplies the two input complex vectors (16-bit integer each component) and accumulates them, storing the result. Results are saturated so never go beyond the limits of the data type.
-\param[out] result Value of the accumulated result
-\param[in] in_a One of the vectors to be multiplied and accumulated
-\param[in] in_b One of the vectors to be multiplied and accumulated
-\param[in] num_points The number of complex values in aVector and bVector to be multiplied together, accumulated and stored into cVector
-*/
 static inline void volk_gnsssdr_16ic_x2_dot_prod_16ic_u_sse2(lv_16sc_t* out, const lv_16sc_t* in_a, const lv_16sc_t* in_b, unsigned int num_points)
 {
 lv_16sc_t dotProduct = lv_cmake((int16_t)0, (int16_t)0);
@@ -232,13 +234,6 @@ static inline void volk_gnsssdr_16ic_x2_dot_prod_16ic_u_sse2(lv_16sc_t* out, con
 #ifdef LV_HAVE_NEON
 #include <arm_neon.h>

-/*!
-\brief Multiplies the two input complex vectors (16-bit integer each component) and accumulates them, storing the result. Results are saturated so never go beyond the limits of the data type.
-\param[out] result Value of the accumulated result
-\param[in] in_a One of the vectors to be multiplied and accumulated
-\param[in] in_b One of the vectors to be multiplied and accumulated
-\param[in] num_points The number of complex values in aVector and bVector to be multiplied together, accumulated and stored into cVector
-*/
 static inline void volk_gnsssdr_16ic_x2_dot_prod_16ic_neon(lv_16sc_t* out, const lv_16sc_t* in_a, const lv_16sc_t* in_b, unsigned int num_points)
 {
 unsigned int quarter_points = num_points / 4;
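
The saturating behaviour documented for `volk_gnsssdr_16ic_x2_dot_prod_16ic` can be illustrated with a scalar sketch. Where saturation happens is an assumption here: each complex product is formed in 64-bit integers, clamped to int16, and added to the running sum with a saturating add. The library's own helpers live in `volk_gnsssdr/saturation_arithmetic.h` and may clamp at different points. `cshort_t` is a hypothetical stand-in for `lv_16sc_t`, and no conjugation is applied.

```c
#include <stdint.h>

typedef struct
{
    int16_t re;
    int16_t im;
} cshort_t; /* hypothetical stand-in for lv_16sc_t */

/* Clamp a wide integer to the int16 range. */
static inline int16_t sat_16(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Saturated dot product sketch: result = sum(in_a[i] * in_b[i]). */
static inline void dot_prod_16ic_sketch(cshort_t* result, const cshort_t* in_a,
    const cshort_t* in_b, unsigned int num_points)
{
    cshort_t acc = {0, 0};
    unsigned int i;
    for (i = 0; i < num_points; i++)
        {
            /* Full-precision complex product: (a + jb)(c + jd) = (ac - bd) + j(ad + bc). */
            const int64_t re = (int64_t)in_a[i].re * in_b[i].re - (int64_t)in_a[i].im * in_b[i].im;
            const int64_t im = (int64_t)in_a[i].re * in_b[i].im + (int64_t)in_a[i].im * in_b[i].re;
            /* Saturating accumulation, so the sum never wraps around. */
            acc.re = sat_16((int64_t)acc.re + sat_16(re));
            acc.im = sat_16((int64_t)acc.im + sat_16(im));
        }
    *result = acc;
}
```

The `_xn` variant documented in this same commit applies this accumulation once per vector in `in_a`, each against the common vector `in_common`.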
@@ -1,11 +1,11 @@
 /*!
 * \file volk_gnsssdr_16ic_x2_dot_prod_16ic_xn.h
-* \brief Volk protokernel: multiplies N 16 bits vectors by a common vector and accumulates the results in N 16 bits short complex outputs.
+* \brief VOLK_GNSSSDR kernel: multiplies N 16 bits vectors by a common vector and accumulates the results in N 16 bits short complex outputs.
 * \authors <ul>
 * <li> Javier Arribas, 2015. jarribas(at)cttc.es
 * </ul>
 *
-* Volk protokernel that multiplies N 16 bits vectors by a common vector and accumulates the results in N 16 bits short complex outputs.
+* VOLK_GNSSSDR kernel that multiplies N 16 bits vectors by a common vector and accumulates the results in N 16 bits short complex outputs.
 * It is optimized to perform the N tap correlation process in GNSS receivers.
 *
 * -------------------------------------------------------------------------
@ -33,6 +33,30 @@
* -------------------------------------------------------------------------
*/

/*!
* \page volk_gnsssdr_16ic_x2_dot_prod_16ic_xn
*
* \b Overview
*
* Multiplies a reference complex vector by an arbitrary number of other complex vectors, accumulates the results and stores them in the output vector.
* This function can be used as a multiple correlator.
*
* <b>Dispatcher Prototype</b>
* \code
* void volk_gnsssdr_16ic_x2_dot_prod_16ic_xn(lv_16sc_t* result, const lv_16sc_t* in_common, const lv_16sc_t** in_a, int num_a_vectors, unsigned int num_points);
* \endcode
*
* \b Inputs
* \li in_common: Pointer to one of the vectors to be multiplied and accumulated (reference vector).
* \li in_a: Pointer to an array of pointers to other vectors to be multiplied by \p in_common and accumulated.
* \li num_a_vectors: Number of vectors to be multiplied by the reference vector \p in_common and accumulated.
* \li num_points: Number of complex values to be multiplied together, accumulated and stored into \p result.
*
* \b Outputs
* \li result: Vector of \p num_a_vectors components with vector \p in_common multiplied by the vectors in \p in_a and accumulated.
*
*/

#ifndef INCLUDED_volk_gnsssdr_16ic_xn_dot_prod_16ic_xn_H
#define INCLUDED_volk_gnsssdr_16ic_xn_dot_prod_16ic_xn_H

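What the multiple correlator computes can be sketched in scalar C as `result[k] = sum over n of in_common[n] * in_a[k][n]`, with the saturating accumulation that the SSE2 and NEON code of this file performs through `sat_adds16i()`. This is a minimal sketch under assumptions: `cplx16` (standing in for `lv_16sc_t`) and `multi_correlator` are hypothetical names, and each product is clamped to 16 bits before it is accumulated.

```c
#include <stdint.h>

/* Hypothetical stand-in for lv_16sc_t */
typedef struct {
    int16_t re;
    int16_t im;
} cplx16;

static int16_t clamp_s16(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* result[k] = sum_n in_common[n] * in_a[k][n], for k = 0 .. num_a_vectors - 1 */
static void multi_correlator(cplx16* result, const cplx16* in_common,
    const cplx16** in_a, int num_a_vectors, unsigned int num_points)
{
    for (int k = 0; k < num_a_vectors; k++) {
        int16_t acc_re = 0;
        int16_t acc_im = 0;
        for (unsigned int n = 0; n < num_points; n++) {
            const cplx16 c = in_common[n];
            const cplx16 x = in_a[k][n];
            int64_t pr = (int64_t)c.re * x.re - (int64_t)c.im * x.im;
            int64_t pi = (int64_t)c.re * x.im + (int64_t)c.im * x.re;
            /* Saturating accumulation (assumed: product clamped first) */
            acc_re = clamp_s16((int64_t)acc_re + clamp_s16(pr));
            acc_im = clamp_s16((int64_t)acc_im + clamp_s16(pi));
        }
        result[k].re = acc_re;
        result[k].im = acc_im;
    }
}
```

In a tracking loop, `in_a` would typically point to the Early, Prompt and Late code replicas, giving one correlation value per replica in `result`.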
@ -41,14 +65,7 @@
#include <volk_gnsssdr/saturation_arithmetic.h>

#ifdef LV_HAVE_GENERIC
/*!
\brief Multiplies the reference complex vector with multiple versions of another complex vector, accumulates the results and stores them in the output vector
\param[out] result Array of num_a_vectors components where the accumulated results will be stored
\param[in] in_common Pointer to one of the vectors to be multiplied and accumulated (reference vector)
\param[in] in_a Pointer to an array of pointers to multiple versions of the other vector to be multiplied and accumulated
\param[in] num_a_vectors Number of vectors to be multiplied by the reference vector and accumulated
\param[in] num_points The number of complex values to be multiplied together, accumulated and stored into result
*/

static inline void volk_gnsssdr_16ic_x2_dot_prod_16ic_xn_generic(lv_16sc_t* result, const lv_16sc_t* in_common, const lv_16sc_t** in_a, int num_a_vectors, unsigned int num_points)
{
for (int n_vec = 0; n_vec < num_a_vectors; n_vec++)
@ -70,14 +87,6 @@ static inline void volk_gnsssdr_16ic_x2_dot_prod_16ic_xn_generic(lv_16sc_t* resu
#ifdef LV_HAVE_SSE2
#include <emmintrin.h>

/*!
\brief Multiplies the reference complex vector with multiple versions of another complex vector, accumulates the results and stores them in the output vector
\param[out] result Array of num_a_vectors components where the accumulated results will be stored
\param[in] in_common Pointer to one of the vectors to be multiplied and accumulated (reference vector)
\param[in] in_a Pointer to an array of pointers to multiple versions of the other vector to be multiplied and accumulated
\param[in] num_a_vectors Number of vectors to be multiplied by the reference vector and accumulated
\param[in] num_points The number of complex values to be multiplied together, accumulated and stored into result
*/
static inline void volk_gnsssdr_16ic_x2_dot_prod_16ic_xn_a_sse2(lv_16sc_t* result, const lv_16sc_t* in_common, const lv_16sc_t** in_a, int num_a_vectors, unsigned int num_points)
{
lv_16sc_t dotProduct = lv_cmake(0,0);
@ -160,22 +169,15 @@ static inline void volk_gnsssdr_16ic_x2_dot_prod_16ic_xn_a_sse2(lv_16sc_t* resul

_out[n_vec] = lv_cmake(sat_adds16i(lv_creal(_out[n_vec]), lv_creal(tmp)),
sat_adds16i(lv_cimag(_out[n_vec]), lv_cimag(tmp)));
}
}
}
}
#endif /* LV_HAVE_SSE2 */


#ifdef LV_HAVE_SSE2
#include <emmintrin.h>

/*!
\brief Multiplies the reference complex vector with multiple versions of another complex vector, accumulates the results and stores them in the output vector
\param[out] result Array of num_a_vectors components where the accumulated results will be stored
\param[in] in_common Pointer to one of the vectors to be multiplied and accumulated (reference vector)
\param[in] in_a Pointer to an array of pointers to multiple versions of the other vector to be multiplied and accumulated
\param[in] num_a_vectors Number of vectors to be multiplied by the reference vector and accumulated
\param[in] num_points The number of complex values to be multiplied together, accumulated and stored into result
*/
static inline void volk_gnsssdr_16ic_x2_dot_prod_16ic_xn_u_sse2(lv_16sc_t* result, const lv_16sc_t* in_common, const lv_16sc_t** in_a, int num_a_vectors, unsigned int num_points)
{
lv_16sc_t dotProduct = lv_cmake(0,0);
@ -258,22 +260,15 @@ static inline void volk_gnsssdr_16ic_x2_dot_prod_16ic_xn_u_sse2(lv_16sc_t* resul

_out[n_vec] = lv_cmake(sat_adds16i(lv_creal(_out[n_vec]), lv_creal(tmp)),
sat_adds16i(lv_cimag(_out[n_vec]), lv_cimag(tmp)));
}
}
}
}
#endif /* LV_HAVE_SSE2 */


#ifdef LV_HAVE_NEON
#include <arm_neon.h>

/*!
\brief Multiplies the reference complex vector with multiple versions of another complex vector, accumulates the results and stores them in the output vector
\param[out] result Array of num_a_vectors components where the accumulated results will be stored
\param[in] in_common Pointer to one of the vectors to be multiplied and accumulated (reference vector)
\param[in] in_a Pointer to an array of pointers to multiple versions of the other vector to be multiplied and accumulated
\param[in] num_a_vectors Number of vectors to be multiplied by the reference vector and accumulated
\param[in] num_points The number of complex values to be multiplied together, accumulated and stored into result
*/
static inline void volk_gnsssdr_16ic_x2_dot_prod_16ic_xn_neon(lv_16sc_t* result, const lv_16sc_t* in_common, const lv_16sc_t** in_a, int num_a_vectors, unsigned int num_points)
{
lv_16sc_t dotProduct = lv_cmake(0,0);
@ -354,9 +349,8 @@ static inline void volk_gnsssdr_16ic_x2_dot_prod_16ic_xn_neon(lv_16sc_t* result,

_out[n_vec] = lv_cmake(sat_adds16i(lv_creal(_out[n_vec]), lv_creal(tmp)),
sat_adds16i(lv_cimag(_out[n_vec]), lv_cimag(tmp)));
}
}
}

}
#endif /* LV_HAVE_NEON */

@ -1,11 +1,11 @@
/*!
* \file volk_gnsssdr_16ic_x2_dotprodxnpuppet_16ic.h
* \brief Volk puppet for the multiple 16-bit complex dot product kernel
* \brief VOLK_GNSSSDR puppet for the multiple 16-bit complex dot product kernel.
* \authors <ul>
* <li> Carles Fernandez Prades 2016 cfernandez at cttc dot cat
* </ul>
*
* Volk puppet for integrating the resampler into volk's test system
* VOLK_GNSSSDR puppet for integrating the resampler into the test system
*
* -------------------------------------------------------------------------
*
@ -1,11 +1,11 @@
/*!
* \file volk_gnsssdr_16ic_x2_dot_prod_16ic.h
* \brief Volk protokernel: multiplies two 16 bits vectors and accumulates them
* \file volk_gnsssdr_16ic_x2_multiply_16ic.h
* \brief VOLK_GNSSSDR kernel: multiplies two 16-bit complex vectors point-by-point.
* \authors <ul>
* <li> Javier Arribas, 2015. jarribas(at)cttc.es
* </ul>
*
* Volk protokernel that multiplies two 16-bit complex vectors (16 bits the real part
* VOLK_GNSSSDR kernel that multiplies two 16-bit complex vectors (16 bits the real part
* and 16 bits the imaginary part) point-by-point.
*
* -------------------------------------------------------------------------
@ -33,6 +33,28 @@
* -------------------------------------------------------------------------
*/

/*!
* \page volk_gnsssdr_16ic_x2_multiply_16ic
*
* \b Overview
*
* Multiplies two input complex vectors, point-by-point, storing the result in the third vector.
*
* <b>Dispatcher Prototype</b>
* \code
* void volk_gnsssdr_16ic_x2_multiply_16ic(lv_16sc_t* result, const lv_16sc_t* in_a, const lv_16sc_t* in_b, unsigned int num_points);
* \endcode
*
* \b Inputs
* \li in_a: One of the vectors to be multiplied.
* \li in_b: The other vector to be multiplied.
* \li num_points: The number of complex data points.
*
* \b Outputs
* \li result: The vector where the result will be stored.
*
*/

#ifndef INCLUDED_volk_gnsssdr_16ic_x2_multiply_16ic_H
#define INCLUDED_volk_gnsssdr_16ic_x2_multiply_16ic_H

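A scalar sketch of the point-by-point product described in the overview follows. The header does not say how results outside the int16_t range are handled, so this sketch assumes a plain 16-bit multiply that wraps; `cplx16` (standing in for `lv_16sc_t`), `wrap_s16` and `multiply_16ic` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical stand-in for lv_16sc_t */
typedef struct {
    int16_t re;
    int16_t im;
} cplx16;

/* Reduce a value to 16 bits by wrapping modulo 2^16 (assumed behaviour).
   The final conversion to int16_t is implementation-defined before C23,
   but wraps on common two's-complement compilers. */
static int16_t wrap_s16(int64_t x)
{
    return (int16_t)(uint16_t)x;
}

/* result[n] = in_a[n] * in_b[n], point by point, no accumulation */
static void multiply_16ic(cplx16* result, const cplx16* in_a, const cplx16* in_b, unsigned int num_points)
{
    for (unsigned int n = 0; n < num_points; n++) {
        int64_t pr = (int64_t)in_a[n].re * in_b[n].re - (int64_t)in_a[n].im * in_b[n].im;
        int64_t pi = (int64_t)in_a[n].re * in_b[n].im + (int64_t)in_a[n].im * in_b[n].re;
        result[n].re = wrap_s16(pr);
        result[n].im = wrap_s16(pi);
    }
}
```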
@ -40,13 +62,7 @@
#include <volk_gnsssdr/volk_gnsssdr_complex.h>

#ifdef LV_HAVE_GENERIC
/*!
\brief Multiplies the two input complex vectors, point-by-point, storing the result in the third vector
\param[out] result The vector where the result will be stored
\param[in] in_a One of the vectors to be multiplied
\param[in] in_b One of the vectors to be multiplied
\param[in] num_points The number of complex values in in_a and in_b to be multiplied together and stored into result
*/

static inline void volk_gnsssdr_16ic_x2_multiply_16ic_generic(lv_16sc_t* result, const lv_16sc_t* in_a, const lv_16sc_t* in_b, unsigned int num_points)
{
for (unsigned int n = 0; n < num_points; n++)
@ -62,13 +78,6 @@ static inline void volk_gnsssdr_16ic_x2_multiply_16ic_generic(lv_16sc_t* result,
#ifdef LV_HAVE_SSE2
#include <emmintrin.h>

/*!
\brief Multiplies the two input complex vectors, point-by-point, storing the result in the third vector
\param[out] out The vector where the result will be stored
\param[in] in_a One of the vectors to be multiplied
\param[in] in_b One of the vectors to be multiplied
\param[in] num_points The number of complex values in in_a and in_b to be multiplied together and stored into out
*/
static inline void volk_gnsssdr_16ic_x2_multiply_16ic_a_sse2(lv_16sc_t* out, const lv_16sc_t* in_a, const lv_16sc_t* in_b, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 4;
@ -118,16 +127,10 @@ static inline void volk_gnsssdr_16ic_x2_multiply_16ic_a_sse2(lv_16sc_t* out, con
}
#endif /* LV_HAVE_SSE2 */


#ifdef LV_HAVE_SSE2
#include <emmintrin.h>

/*!
\brief Multiplies the two input complex vectors, point-by-point, storing the result in the third vector
\param[out] out The vector where the result will be stored
\param[in] in_a One of the vectors to be multiplied
\param[in] in_b One of the vectors to be multiplied
\param[in] num_points The number of complex values in in_a and in_b to be multiplied together and stored into out
*/
static inline void volk_gnsssdr_16ic_x2_multiply_16ic_u_sse2(lv_16sc_t* out, const lv_16sc_t* in_a, const lv_16sc_t* in_b, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 4;
@ -177,16 +180,10 @@ static inline void volk_gnsssdr_16ic_x2_multiply_16ic_u_sse2(lv_16sc_t* out, con
}
#endif /* LV_HAVE_SSE2 */


#ifdef LV_HAVE_NEON
#include <arm_neon.h>

/*!
\brief Multiplies the two input complex vectors, point-by-point, storing the result in the third vector
\param[out] out The vector where the result will be stored
\param[in] in_a One of the vectors to be multiplied
\param[in] in_b One of the vectors to be multiplied
\param[in] num_points The number of complex values in in_a and in_b to be multiplied together and stored into out
*/
static inline void volk_gnsssdr_16ic_x2_multiply_16ic_neon(lv_16sc_t* out, const lv_16sc_t* in_a, const lv_16sc_t* in_b, unsigned int num_points)
{
lv_16sc_t *a_ptr = (lv_16sc_t*) in_a;
@ -1,12 +1,12 @@
/*!
* \file volk_gnsssdr_16ic_x2_dot_prod_16ic_xn.h
* \brief Volk protokernel: multiplies N 16-bit vectors by a common vector
* \file volk_gnsssdr_16ic_x2_rotator_dot_prod_16ic_xn.h
* \brief VOLK_GNSSSDR kernel: multiplies N 16-bit vectors by a common vector
* phase-rotated and accumulates the results in N 16-bit short complex outputs.
* \authors <ul>
* <li> Javier Arribas, 2015. jarribas(at)cttc.es
* </ul>
*
* Volk protokernel that multiplies N 16-bit vectors by a common vector, which is
* VOLK_GNSSSDR kernel that multiplies N 16-bit vectors by a common vector, which is
* phase-rotated by a phase offset and a phase increment, and accumulates the results
* in N 16-bit short complex outputs.
* It is optimized to perform the N-tap correlation process in GNSS receivers.
@ -36,8 +36,36 @@
* -------------------------------------------------------------------------
*/

#ifndef INCLUDED_volk_gnsssdr_16ic_xn_rotator_dot_prod_16ic_xn_H
#define INCLUDED_volk_gnsssdr_16ic_xn_rotator_dot_prod_16ic_xn_H
/*!
* \page volk_gnsssdr_16ic_x2_rotator_dot_prod_16ic_xn
*
* \b Overview
*
* Rotates and multiplies the reference complex vector with an arbitrary number of other complex vectors,
* accumulates the results and stores them in the output vector.
* This function can be used for Doppler wipe-off and as a multiple correlator.
*
* <b>Dispatcher Prototype</b>
* \code
* void volk_gnsssdr_16ic_x2_rotator_dot_prod_16ic_xn(lv_16sc_t* result, const lv_16sc_t* in_common, const lv_32fc_t phase_inc, lv_32fc_t* phase, const lv_16sc_t** in_a, int num_a_vectors, unsigned int num_points);
* \endcode
*
* \b Inputs
* \li in_common: Pointer to one of the vectors to be rotated, multiplied and accumulated (reference vector).
* \li phase_inc: Phase increment = lv_cmake(cos(phase_step_rad), sin(phase_step_rad))
* \li phase: Initial phase = lv_cmake(cos(initial_phase_rad), sin(initial_phase_rad))
* \li in_a: Pointer to an array of pointers to multiple vectors to be multiplied and accumulated.
* \li num_a_vectors: Number of vectors to be multiplied by the reference vector and accumulated.
* \li num_points: Number of complex values to be multiplied together, accumulated and stored into \p result.
*
* \b Outputs
* \li phase: Final phase.
* \li result: Vector of \p num_a_vectors components with \p in_common rotated, multiplied by each vector of \p in_a and accumulated.
*
*/

#ifndef INCLUDED_volk_gnsssdr_16ic_x2_rotator_dot_prod_16ic_xn_H
#define INCLUDED_volk_gnsssdr_16ic_x2_rotator_dot_prod_16ic_xn_H


#include <volk_gnsssdr/volk_gnsssdr_complex.h>
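The rotate-then-correlate operation documented above can be sketched in scalar C. Each reference sample is multiplied by the current phasor `*phase`, rounded back to 16 bits and correlated against every vector in `in_a`; the phasor is then advanced by `phase_inc`, so on return `*phase` holds the final phase. Assumptions, not taken from the kernel source: `cplx16` and `cplx32` stand in for `lv_16sc_t` and `lv_32fc_t`, rounding is half away from zero, products are clamped before saturating accumulation, and the slow drift of |phase| away from 1 is not corrected. All names are hypothetical.

```c
#include <stdint.h>

typedef struct { int16_t re; int16_t im; } cplx16; /* stand-in for lv_16sc_t */
typedef struct { float re; float im; } cplx32;     /* stand-in for lv_32fc_t */

static cplx32 cmul32(cplx32 a, cplx32 b)
{
    cplx32 r = {a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
    return r;
}

static int16_t clamp_s16(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Round half away from zero, then saturate to int16_t. */
static int16_t round_sat_s16(float x)
{
    float r = (x >= 0.0f) ? x + 0.5f : x - 0.5f;
    if (r >= 32767.0f) return INT16_MAX;
    if (r <= -32768.0f) return INT16_MIN;
    return (int16_t)r; /* conversion truncates toward zero */
}

static void rotator_dot_prod(cplx16* result, const cplx16* in_common,
    cplx32 phase_inc, cplx32* phase, const cplx16** in_a,
    int num_a_vectors, unsigned int num_points)
{
    for (int k = 0; k < num_a_vectors; k++) {
        result[k].re = 0;
        result[k].im = 0;
    }
    for (unsigned int n = 0; n < num_points; n++) {
        /* Rotate the reference sample and requantize it to 16 bits */
        cplx32 s = {(float)in_common[n].re, (float)in_common[n].im};
        cplx32 rot = cmul32(s, *phase);
        cplx16 r16 = {round_sat_s16(rot.re), round_sat_s16(rot.im)};
        /* Advance the phasor for the next sample */
        *phase = cmul32(*phase, phase_inc);
        for (int k = 0; k < num_a_vectors; k++) {
            const cplx16 x = in_a[k][n];
            int64_t pr = (int64_t)r16.re * x.re - (int64_t)r16.im * x.im;
            int64_t pi = (int64_t)r16.re * x.im + (int64_t)r16.im * x.re;
            result[k].re = clamp_s16((int64_t)result[k].re + clamp_s16(pr));
            result[k].im = clamp_s16((int64_t)result[k].im + clamp_s16(pi));
        }
    }
}
```

With a 90-degree `phase_inc` of (0, 1), a constant reference of (10, 0) becomes (10, 0), (0, 10), (-10, 0), (0, -10) over four samples; correlating against the conjugate sequence (1, 0), (0, -1), (-1, 0), (0, 1) adds up coherently to (40, 0).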
@ -46,16 +74,7 @@
//#include <stdio.h>

#ifdef LV_HAVE_GENERIC
/*!
\brief Rotates and multiplies the reference complex vector with multiple versions of another complex vector, accumulates the results and stores them in the output vector
\param[out] result Array of num_a_vectors components where the accumulated results will be stored
\param[in] in_common Pointer to one of the vectors to be rotated, multiplied and accumulated (reference vector)
\param[in] phase_inc Phase increment = lv_cmake(cos(phase_step_rad), -sin(phase_step_rad))
\param[in,out] phase Initial / final phase
\param[in] in_a Pointer to an array of pointers to multiple versions of the other vector to be multiplied and accumulated
\param[in] num_a_vectors Number of vectors to be multiplied by the reference vector and accumulated
\param[in] num_points The number of complex values to be multiplied together, accumulated and stored into result
*/

static inline void volk_gnsssdr_16ic_x2_rotator_dot_prod_16ic_xn_generic(lv_16sc_t* result, const lv_16sc_t* in_common, const lv_32fc_t phase_inc, lv_32fc_t* phase, const lv_16sc_t** in_a, int num_a_vectors, unsigned int num_points)
{
lv_16sc_t tmp16;
@ -85,16 +104,6 @@ static inline void volk_gnsssdr_16ic_x2_rotator_dot_prod_16ic_xn_generic(lv_16sc
#ifdef LV_HAVE_SSE3
#include <pmmintrin.h>

/*!
\brief Rotates and multiplies the reference complex vector with multiple versions of another complex vector, accumulates the results and stores them in the output vector
\param[out] result Array of num_a_vectors components where the accumulated results will be stored
\param[in] in_common Pointer to one of the vectors to be rotated, multiplied and accumulated (reference vector)
\param[in] phase_inc Phase increment = lv_cmake(cos(phase_step_rad), -sin(phase_step_rad))
\param[in,out] phase Initial / final phase
\param[in] in_a Pointer to an array of pointers to multiple versions of the other vector to be multiplied and accumulated
\param[in] num_a_vectors Number of vectors to be multiplied by the reference vector and accumulated
\param[in] num_points The number of complex values to be multiplied together, accumulated and stored into result
*/
static inline void volk_gnsssdr_16ic_x2_rotator_dot_prod_16ic_xn_a_sse3(lv_16sc_t* result, const lv_16sc_t* in_common, const lv_32fc_t phase_inc, lv_32fc_t* phase, const lv_16sc_t** in_a, int num_a_vectors, unsigned int num_points)
{
lv_16sc_t dotProduct = lv_cmake(0,0);
@ -247,19 +256,10 @@ static inline void volk_gnsssdr_16ic_x2_rotator_dot_prod_16ic_xn_a_sse3(lv_16sc_
}
#endif /* LV_HAVE_SSE3 */


#ifdef LV_HAVE_SSE3
#include <pmmintrin.h>

/*!
\brief Rotates and multiplies the reference complex vector with multiple versions of another complex vector, accumulates the results and stores them in the output vector
\param[out] result Array of num_a_vectors components where the accumulated results will be stored
\param[in] in_common Pointer to one of the vectors to be rotated, multiplied and accumulated (reference vector)
\param[in] phase_inc Phase increment = lv_cmake(cos(phase_step_rad), -sin(phase_step_rad))
\param[in,out] phase Initial / final phase
\param[in] in_a Pointer to an array of pointers to multiple versions of the other vector to be multiplied and accumulated
\param[in] num_a_vectors Number of vectors to be multiplied by the reference vector and accumulated
\param[in] num_points The number of complex values to be multiplied together, accumulated and stored into result
*/
static inline void volk_gnsssdr_16ic_x2_rotator_dot_prod_16ic_xn_u_sse3(lv_16sc_t* result, const lv_16sc_t* in_common, const lv_32fc_t phase_inc, lv_32fc_t* phase, const lv_16sc_t** in_a, int num_a_vectors, unsigned int num_points)
{
lv_16sc_t dotProduct = lv_cmake(0,0);
@ -414,16 +414,6 @@ static inline void volk_gnsssdr_16ic_x2_rotator_dot_prod_16ic_xn_u_sse3(lv_16sc_
#ifdef LV_HAVE_NEON
#include <arm_neon.h>

/*!
\brief Rotates and multiplies the reference complex vector with multiple versions of another complex vector, accumulates the results and stores them in the output vector
\param[out] result Array of num_a_vectors components where the accumulated results will be stored
\param[in] in_common Pointer to one of the vectors to be rotated, multiplied and accumulated (reference vector)
\param[in] phase_inc Phase increment = lv_cmake(cos(phase_step_rad), -sin(phase_step_rad))
\param[in,out] phase Initial / final phase
\param[in] in_a Pointer to an array of pointers to multiple versions of the other vector to be multiplied and accumulated
\param[in] num_a_vectors Number of vectors to be multiplied by the reference vector and accumulated
\param[in] num_points The number of complex values to be multiplied together, accumulated and stored into result
*/
static inline void volk_gnsssdr_16ic_x2_rotator_dot_prod_16ic_xn_neon(lv_16sc_t* result, const lv_16sc_t* in_common, const lv_32fc_t phase_inc, lv_32fc_t* phase, const lv_16sc_t** in_a, int num_a_vectors, unsigned int num_points)
{
const unsigned int neon_iters = num_points / 4;
@ -584,4 +574,4 @@ static inline void volk_gnsssdr_16ic_x2_rotator_dot_prod_16ic_xn_neon(lv_16sc_t*

#endif /* LV_HAVE_NEON */

#endif /*INCLUDED_volk_gnsssdr_16ic_xn_dot_prod_16ic_xn_H*/
#endif /*INCLUDED_volk_gnsssdr_16ic_x2_dot_prod_16ic_xn_H*/
@ -1,6 +1,6 @@
/*!
* \file volk_gnsssdr_16ic_x2_dotprodxnpuppet_16ic.h
* \brief Volk puppet for the multiple 16-bit complex dot product kernel
* \file volk_gnsssdr_16ic_x2_rotator_dotprodxnpuppet_16ic.h
* \brief Volk puppet for the multiple 16-bit complex dot product kernel.
* \authors <ul>
* <li> Carles Fernandez Prades 2016 cfernandez at cttc dot cat
* </ul>
@ -1,11 +1,11 @@
/*!
* \file volk_gnsssdr_16ic_xn_resampler_16ic_xn.h
* \brief Volk protokernel: Resamples N 16-bit integer short complex vectors using a zero-order hold resampling algorithm.
* \brief VOLK_GNSSSDR kernel: Resamples N 16-bit integer short complex vectors using a zero-order hold resampling algorithm.
* \authors <ul>
* <li> Javier Arribas, 2015. jarribas(at)cttc.es
* </ul>
*
* Volk protokernel that resamples N 16-bit integer short complex vectors using a zero-order hold resampling algorithm.
* VOLK_GNSSSDR kernel that resamples N 16-bit integer short complex vectors using a zero-order hold resampling algorithm.
* It is optimized to resample a single GNSS local code signal replica into N fractional-resampled and fractional-delayed vectors
* (i.e. it creates the Early, Prompt, and Late code replicas)
*
@ -34,6 +34,31 @@
* -------------------------------------------------------------------------
*/

/*!
* \page volk_gnsssdr_16ic_xn_resampler_16ic_xn
*
* \b Overview
*
* Resamples a complex vector (16-bit integer each component), providing \p num_out_vectors outputs.
*
* <b>Dispatcher Prototype</b>
* \code
* void volk_gnsssdr_16ic_xn_resampler_16ic_xn(lv_16sc_t** result, const lv_16sc_t* local_code, float* rem_code_phase_chips, float code_phase_step_chips, unsigned int code_length_chips, int num_out_vectors, unsigned int num_output_samples);
* \endcode
*
* \b Inputs
* \li local_code: Vector containing the local code replica to be resampled.
* \li rem_code_phase_chips: Pointer to a vector containing the remnant code phase of each output vector [chips].
* \li code_phase_step_chips: Phase increment per sample [chips/sample].
* \li code_length_chips: Code length in chips.
* \li num_out_vectors: Number of output vectors.
* \li num_output_samples: Number of samples in each resampled output vector.
*
* \b Outputs
* \li result: Pointer to a vector of pointers where the results will be stored.
*
*/

#ifndef INCLUDED_volk_gnsssdr_16ic_xn_resampler_16ic_xn_H
#define INCLUDED_volk_gnsssdr_16ic_xn_resampler_16ic_xn_H

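A scalar sketch of the zero-order hold resampler: output sample n of vector v repeats the chip at index round(code_phase_step_chips * n + rem_code_phase_chips[v]), wrapped once into [0, code_length_chips - 1]. Rounding to the nearest chip mirrors the SSE2 path in this file, which sets `_MM_ROUND_NEAREST` before calling `_mm_cvtps_epi32`; this sketch rounds ties away from zero, whereas that SSE2 conversion rounds ties to even, and the generic path's exact rule is not visible in this diff. `cplx16` (standing in for `lv_16sc_t`), `nearest_int` and `resample_zoh` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical stand-in for lv_16sc_t */
typedef struct {
    int16_t re;
    int16_t im;
} cplx16;

/* Round to the nearest integer, ties away from zero */
static int nearest_int(float x)
{
    return (int)(x >= 0.0f ? x + 0.5f : x - 0.5f);
}

static void resample_zoh(cplx16** result, const cplx16* local_code,
    const float* rem_code_phase_chips, float code_phase_step_chips,
    unsigned int code_length_chips, int num_out_vectors,
    unsigned int num_output_samples)
{
    const int len = (int)code_length_chips;
    for (int v = 0; v < num_out_vectors; v++) {
        for (unsigned int n = 0; n < num_output_samples; n++) {
            int idx = nearest_int(code_phase_step_chips * (float)n + rem_code_phase_chips[v]);
            if (idx < 0) idx += len;        /* wrap indices before the code start */
            if (idx > len - 1) idx -= len;  /* wrap indices past the code end */
            result[v][n] = local_code[idx]; /* zero-order hold: repeat the chip */
        }
    }
}
```

Passing one remnant phase per replica (for example, the prompt phase minus and plus a fraction of a chip) yields the Early, Prompt and Late replicas in a single call.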
@ -49,16 +74,7 @@
// return (r > 0.0) ? (r + 0.5) : (r - 0.5);
//}

/*!
\brief Resamples a complex vector (16-bit integer each component), providing num_out_vectors outputs
\param[out] result Pointer to an array of pointers to the vectors where the results will be stored
\param[in] local_code Vector containing the local code replica to be resampled
\param[in] rem_code_phase_chips Pointer to the vector containing the remnant code phase for each output [chips]
\param[in] code_phase_step_chips Phase increment per sample [chips/sample]
\param[in] code_length_chips Code length in chips
\param[in] num_out_vectors Number of output vectors
\param[in] num_output_samples Number of samples to be processed
*/

static inline void volk_gnsssdr_16ic_xn_resampler_16ic_xn_generic(lv_16sc_t** result, const lv_16sc_t* local_code, float* rem_code_phase_chips, float code_phase_step_chips, unsigned int code_length_chips, int num_out_vectors, unsigned int num_output_samples)
{
int local_code_chip_index;
@ -84,16 +100,6 @@ static inline void volk_gnsssdr_16ic_xn_resampler_16ic_xn_generic(lv_16sc_t** re
#ifdef LV_HAVE_SSE2
#include <emmintrin.h>

/*!
\brief Resamples a complex vector (16-bit integer each component), providing num_out_vectors outputs
\param[out] result Pointer to an array of pointers to the vectors where the results will be stored
\param[in] local_code Vector containing the local code replica to be resampled
\param[in] rem_code_phase_chips Pointer to the vector containing the remnant code phase for each output [chips]
\param[in] code_phase_step_chips Phase increment per sample [chips/sample]
\param[in] code_length_chips Code length in chips
\param[in] num_out_vectors Number of output vectors
\param[in] num_output_samples Number of samples to be processed
*/
static inline void volk_gnsssdr_16ic_xn_resampler_16ic_xn_a_sse2(lv_16sc_t** result, const lv_16sc_t* local_code, float* rem_code_phase_chips ,float code_phase_step_chips, unsigned int code_length_chips, int num_out_vectors, unsigned int num_output_samples)
{
_MM_SET_ROUNDING_MODE (_MM_ROUND_NEAREST);//_MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO
@ -125,7 +131,7 @@ static inline void volk_gnsssdr_16ic_xn_resampler_16ic_xn_a_sse2(lv_16sc_t** res

__m128i negative_indexes, overflow_indexes,_code_phase_out_int, _code_phase_out_int_neg,_code_phase_out_int_over;

__m128i zero=_mm_setzero_si128();
__m128i zero = _mm_setzero_si128();

__attribute__((aligned(16))) float init_idx_float[4] = { 0.0f, 1.0f, 2.0f, 3.0f };
__m128 _4output_index = _mm_load_ps(init_idx_float);
@ -148,11 +154,11 @@ static inline void volk_gnsssdr_16ic_xn_resampler_16ic_xn_a_sse2(lv_16sc_t** res
_code_phase_out_with_offset = _mm_add_ps(_code_phase_out, _rem_code_phase); //add the phase offset
_code_phase_out_int = _mm_cvtps_epi32(_code_phase_out_with_offset); //convert to integer

negative_indexes = _mm_cmplt_epi32 (_code_phase_out_int, zero); //test for negative values
negative_indexes = _mm_cmplt_epi32(_code_phase_out_int, zero); //test for negative values
_code_phase_out_int_neg = _mm_add_epi32(_code_phase_out_int, _code_length_chips); //the negative values branch
_code_phase_out_int_neg = _mm_xor_si128(_code_phase_out_int, _mm_and_si128( negative_indexes,_mm_xor_si128( _code_phase_out_int_neg, _code_phase_out_int )));
_code_phase_out_int_neg = _mm_xor_si128(_code_phase_out_int, _mm_and_si128( negative_indexes, _mm_xor_si128( _code_phase_out_int_neg, _code_phase_out_int )));

overflow_indexes = _mm_cmpgt_epi32 (_code_phase_out_int_neg, _code_length_chips_minus1); //test for overflow values
overflow_indexes = _mm_cmpgt_epi32(_code_phase_out_int_neg, _code_length_chips_minus1); //test for overflow values
_code_phase_out_int_over = _mm_sub_epi32(_code_phase_out_int_neg, _code_length_chips); //the overflow values branch
_code_phase_out_int_over = _mm_xor_si128(_code_phase_out_int_neg, _mm_and_si128( overflow_indexes, _mm_xor_si128( _code_phase_out_int_over, _code_phase_out_int_neg )));

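The SSE2 code above wraps out-of-range indices without branches: each comparison yields an all-ones or all-zeros mask per lane, and `x ^ (mask & (y ^ x))` then selects `y` where the mask is set and keeps `x` elsewhere. The scalar sketch below reproduces that trick for a single lane; `select_by_mask` and `wrap_index` are hypothetical names, and like the SIMD code it corrects an index by at most one code period.

```c
#include <stdint.h>

/* Returns y where mask is all ones (-1) and x where mask is zero. */
static int32_t select_by_mask(int32_t mask, int32_t y, int32_t x)
{
    return x ^ (mask & (y ^ x));
}

static int32_t wrap_index(int32_t idx, int32_t len)
{
    /* Analogue of _mm_cmplt_epi32: all ones if idx < 0, else 0 */
    int32_t neg_mask = -(int32_t)(idx < 0);
    idx = select_by_mask(neg_mask, idx + len, idx);
    /* Analogue of _mm_cmpgt_epi32: all ones if idx > len - 1, else 0 */
    int32_t over_mask = -(int32_t)(idx > len - 1);
    idx = select_by_mask(over_mask, idx - len, idx);
    return idx;
}
```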
@ -183,19 +189,10 @@ static inline void volk_gnsssdr_16ic_xn_resampler_16ic_xn_a_sse2(lv_16sc_t** res
}
#endif /* LV_HAVE_SSE2 */


#ifdef LV_HAVE_SSE2
#include <emmintrin.h>

/*!
\brief Resamples a complex vector (16-bit integer each component), providing num_out_vectors outputs
\param[out] result Pointer to an array of pointers to the vectors where the results will be stored
\param[in] local_code Vector containing the local code replica to be resampled
\param[in] rem_code_phase_chips Pointer to the vector containing the remnant code phase for each output [chips]
\param[in] code_phase_step_chips Phase increment per sample [chips/sample]
\param[in] code_length_chips Code length in chips
\param[in] num_out_vectors Number of output vectors
\param[in] num_output_samples Number of samples to be processed
*/
static inline void volk_gnsssdr_16ic_xn_resampler_16ic_xn_u_sse2(lv_16sc_t** result, const lv_16sc_t* local_code, float* rem_code_phase_chips ,float code_phase_step_chips, unsigned int code_length_chips, int num_out_vectors, unsigned int num_output_samples)
{
_MM_SET_ROUNDING_MODE (_MM_ROUND_NEAREST);//_MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO
@ -227,7 +224,7 @@ static inline void volk_gnsssdr_16ic_xn_resampler_16ic_xn_u_sse2(lv_16sc_t** res

__m128i negative_indexes, overflow_indexes,_code_phase_out_int, _code_phase_out_int_neg,_code_phase_out_int_over;

__m128i zero=_mm_setzero_si128();
__m128i zero = _mm_setzero_si128();

__attribute__((aligned(16))) float init_idx_float[4] = { 0.0f, 1.0f, 2.0f, 3.0f };
__m128 _4output_index = _mm_loadu_ps(init_idx_float);
@ -250,11 +247,11 @@ static inline void volk_gnsssdr_16ic_xn_resampler_16ic_xn_u_sse2(lv_16sc_t** res
_code_phase_out_with_offset = _mm_add_ps(_code_phase_out, _rem_code_phase); //add the phase offset
_code_phase_out_int = _mm_cvtps_epi32(_code_phase_out_with_offset); //convert to integer

negative_indexes = _mm_cmplt_epi32 (_code_phase_out_int, zero); //test for negative values
negative_indexes = _mm_cmplt_epi32(_code_phase_out_int, zero); //test for negative values
_code_phase_out_int_neg = _mm_add_epi32(_code_phase_out_int, _code_length_chips); //the negative values branch
_code_phase_out_int_neg = _mm_xor_si128(_code_phase_out_int, _mm_and_si128( negative_indexes,_mm_xor_si128( _code_phase_out_int_neg, _code_phase_out_int )));
_code_phase_out_int_neg = _mm_xor_si128(_code_phase_out_int, _mm_and_si128( negative_indexes, _mm_xor_si128( _code_phase_out_int_neg, _code_phase_out_int )));

overflow_indexes = _mm_cmpgt_epi32 (_code_phase_out_int_neg, _code_length_chips_minus1); //test for overflow values
overflow_indexes = _mm_cmpgt_epi32(_code_phase_out_int_neg, _code_length_chips_minus1); //test for overflow values
_code_phase_out_int_over = _mm_sub_epi32(_code_phase_out_int_neg, _code_length_chips); //the overflow values branch
_code_phase_out_int_over = _mm_xor_si128(_code_phase_out_int_neg, _mm_and_si128( overflow_indexes, _mm_xor_si128( _code_phase_out_int_over, _code_phase_out_int_neg )));

@ -286,19 +283,10 @@ static inline void volk_gnsssdr_16ic_xn_resampler_16ic_xn_u_sse2(lv_16sc_t** res
|
||||
|
||||
#endif /* LV_HAVE_SSE2 */
|
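For reference, the index arithmetic in the SSE2 hunk above can be read as a scalar loop. The sketch below assumes each output phase is `n * code_phase_step_chips + rem_code_phase_chips[v]`, since the full SSE2 body is not visible in this excerpt. It uses a hypothetical `cshort_t` in place of `lv_16sc_t` and hypothetical function names, and it reproduces the branchless `a ^ (mask & (b ^ a))` select that the kernel uses to wrap negative and overflowing indexes. It rounds ties away from zero, whereas `_mm_cvtps_epi32` rounds ties to even under the default rounding mode.

```c
#include <stdint.h>

/* Hypothetical stand-in for lv_16sc_t: one complex sample with 16-bit I and Q. */
typedef struct
{
    int16_t re;
    int16_t im;
} cshort_t;

/* Round to nearest, ties away from zero (_mm_cvtps_epi32 rounds ties to even
   under the default rounding mode, so exact .5 phases may map differently). */
static int round_nearest(float x)
{
    return (int)(x >= 0.0f ? x + 0.5f : x - 0.5f);
}

/* Branchless select used by the SSE2 kernel: returns b where mask is all ones,
   a where mask is all zeros. */
static int32_t mask_select(int32_t a, int32_t b, int32_t mask)
{
    return a ^ (mask & (b ^ a));
}

/* Scalar sketch: result[v][n] = local_code[wrap(round(n * step + rem[v]))].
   Each wrap step corrects at most one code period, as in the SSE2 kernel. */
static void resample_sketch(cshort_t** result, const cshort_t* local_code,
    const float* rem_code_phase_chips, float code_phase_step_chips,
    int code_length_chips, int num_out_vectors, unsigned int num_output_samples)
{
    for (int v = 0; v < num_out_vectors; v++)
        {
            for (unsigned int n = 0; n < num_output_samples; n++)
                {
                    int32_t idx = round_nearest((float)n * code_phase_step_chips + rem_code_phase_chips[v]);
                    /* negative branch: add one period where idx < 0 (mask -1 is all ones) */
                    idx = mask_select(idx, idx + code_length_chips, -(int32_t)(idx < 0));
                    /* overflow branch: subtract one period where idx > code_length_chips - 1 */
                    idx = mask_select(idx, idx - code_length_chips, -(int32_t)(idx > code_length_chips - 1));
                    result[v][n] = local_code[idx];
                }
        }
}
```

Computing both candidates and choosing with a mask avoids data-dependent branches, which is what allows the SSE2 version to wrap four indexes per instruction.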
||||
|
||||
|
||||
#ifdef LV_HAVE_NEON
|
||||
#include <arm_neon.h>
|
||||
|
||||
/*!
|
||||
\brief Resamples a complex vector (16-bit integer each component), providing num_out_vectors outputs
|
||||
\param[out] result Pointer to the vector where the results will be stored
|
||||
\param[in] local_code Pointer to the local code vector to be resampled
|
||||
\param[in] rem_code_phase_chips Pointer to the vector containing the remnant code phase for each output [chips]
|
||||
\param[in] code_phase_step_chips Phase increment per sample [chips/sample]
|
||||
\param[in] code_length_chips Code length in chips
|
||||
\param[in] num_out_vectors Number of output vectors
|
||||
\param[in] num_output_samples Number of samples to be processed
|
||||
*/
|
||||
static inline void volk_gnsssdr_16ic_xn_resampler_16ic_xn_neon(lv_16sc_t** result, const lv_16sc_t* local_code, float* rem_code_phase_chips, float code_phase_step_chips, unsigned int code_length_chips, int num_out_vectors, unsigned int num_output_samples)
|
||||
{
|
||||
unsigned int number;
|
||||
|
@ -1,6 +1,6 @@
|
||||
/*!
|
||||
* \file volk_gnsssdr_32fc_convert_16ic.h
|
||||
* \brief Volk protokernel: converts float32 complex values to 16 integer complex values taking care of overflow
|
||||
* \brief VOLK_GNSSSDR kernel: converts float32 complex values to 16-bit integer complex values, taking care of overflow.
|
||||
* \authors <ul>
|
||||
* <li> Andres Cecilia, 2014. a.cecilia.luque(at)gmail.com
|
||||
* </ul>
|
||||
@ -30,6 +30,29 @@
|
||||
* -------------------------------------------------------------------------
|
||||
*/
|
||||
|
||||
/*!
|
||||
* \page volk_gnsssdr_32fc_convert_16ic
|
||||
*
|
||||
* \b Overview
|
||||
*
|
||||
* Converts a complex vector with 32-bit float components into
|
||||
* a complex vector with 16-bit integer components.
|
||||
* Values outside the range of the output data type are saturated to its limits.
|
||||
*
|
||||
* <b>Dispatcher Prototype</b>
|
||||
* \code
|
||||
* void volk_gnsssdr_32fc_convert_16ic(lv_16sc_t* outputVector, const lv_32fc_t* inputVector, unsigned int num_points);
|
||||
* \endcode
|
||||
*
|
||||
* \b Inputs
|
||||
* \li inputVector: The complex 32-bit float input data buffer.
|
||||
* \li num_points: The number of data values to be converted.
|
||||
*
|
||||
* \b Outputs
|
||||
* \li outputVector: The complex 16-bit integer output data buffer.
|
||||
*
|
||||
*/
|
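A plain C sketch of the conversion described in this page, treating both buffers as interleaved I/Q arrays (`num_points` counts complex samples, so `2 * num_points` scalars are converted). The function name is hypothetical, and rounding to nearest with ties away from zero is an assumption, since the conversion instructions used by the SIMD kernels are not shown in this excerpt. NaN inputs are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch: interleaved complex float (I, Q, I, Q, ...)
   to interleaved complex int16, saturating out-of-range values. */
static void convert_32fc_to_16ic_sketch(int16_t* out, const float* in, unsigned int num_points)
{
    for (size_t i = 0; i < 2 * (size_t)num_points; i++)  /* two floats per complex sample */
        {
            float v = in[i];
            if (v > (float)INT16_MAX) v = (float)INT16_MAX;       /* saturate high */
            else if (v < (float)INT16_MIN) v = (float)INT16_MIN;  /* saturate low */
            /* round to nearest, ties away from zero; the result always fits in int16_t */
            out[i] = (int16_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
        }
}
```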
||||
|
||||
#ifndef INCLUDED_volk_gnsssdr_32fc_convert_16ic_H
|
||||
#define INCLUDED_volk_gnsssdr_32fc_convert_16ic_H
|
||||
|
||||
@ -40,12 +63,6 @@
|
||||
#ifdef LV_HAVE_SSE2
|
||||
#include <emmintrin.h>
|
||||
|
||||
/*!
|
||||
\brief Converts a complex vector of 32-bits float each component into a complex vector of 16-bits integer each component. Values are saturated to the limit values of the output data type.
|
||||
\param[out] outputVector The complex 16-bit integer output data buffer
|
||||
\param[in] inputVector The complex 32-bit float data buffer
|
||||
\param[in] num_points The number of data values to be converted
|
||||
*/
|
||||
static inline void volk_gnsssdr_32fc_convert_16ic_u_sse2(lv_16sc_t* outputVector, const lv_32fc_t* inputVector, unsigned int num_points)
|
||||
{
|
||||
const unsigned int sse_iters = num_points / 4;
|
||||
@ -91,15 +108,10 @@ static inline void volk_gnsssdr_32fc_convert_16ic_u_sse2(lv_16sc_t* outputVector
|
||||
}
|
||||
#endif /* LV_HAVE_SSE2 */
|
||||
|
||||
#ifdef LV_HAVE_SSE
|
||||
#include <xmmintrin.h> // __m64, __m128 ??
|
||||
|
||||
/*!
|
||||
\brief Converts a complex vector of 32-bits float each component into a complex vector of 16-bits integer each component. Values are saturated to the limit values of the output data type.
|
||||
\param[out] outputVector The complex 16-bit integer output data buffer
|
||||
\param[in] inputVector The complex 32-bit float data buffer
|
||||
\param[in] num_points The number of data values to be converted
|
||||
*/
|
||||
#ifdef LV_HAVE_SSE
|
||||
#include <xmmintrin.h>
|
||||
|
||||
static inline void volk_gnsssdr_32fc_convert_16ic_u_sse(lv_16sc_t* outputVector, const lv_32fc_t* inputVector, unsigned int num_points)
|
||||
{
|
||||
const unsigned int sse_iters = num_points / 4;
|
||||
@ -149,12 +161,6 @@ static inline void volk_gnsssdr_32fc_convert_16ic_u_sse(lv_16sc_t* outputVector,
|
||||
#ifdef LV_HAVE_SSE2
|
||||
#include <emmintrin.h>
|
||||
|
||||
/*!
|
||||
\brief Converts a complex vector of 32-bits float each component into a complex vector of 16-bits integer each component. Values are saturated to the limit values of the output data type.
|
||||
\param[out] outputVector The complex 16-bit integer output data buffer
|
||||
\param[in] inputVector The complex 32-bit float data buffer
|
||||
\param[in] num_points The number of data values to be converted
|
||||
*/
|
||||
static inline void volk_gnsssdr_32fc_convert_16ic_a_sse2(lv_16sc_t* outputVector, const lv_32fc_t* inputVector, unsigned int num_points)
|
||||
{
|
||||
const unsigned int sse_iters = num_points / 4;
|
||||
@ -200,15 +206,10 @@ static inline void volk_gnsssdr_32fc_convert_16ic_a_sse2(lv_16sc_t* outputVector
|
||||
}
|
||||
#endif /* LV_HAVE_SSE2 */
|
||||
|
||||
|
||||
#ifdef LV_HAVE_SSE
|
||||
#include <xmmintrin.h>
|
||||
|
||||
/*!
|
||||
\brief Converts a complex vector of 32-bits float each component into a complex vector of 16-bits integer each component. Values are saturated to the limit values of the output data type.
|
||||
\param[out] outputVector The complex 16-bit integer output data buffer
|
||||
\param[in] inputVector The complex 32-bit float data buffer
|
||||
\param[in] num_points The number of data values to be converted
|
||||
*/
|
||||
static inline void volk_gnsssdr_32fc_convert_16ic_a_sse(lv_16sc_t* outputVector, const lv_32fc_t* inputVector, unsigned int num_points)
|
||||
{
|
||||
const unsigned int sse_iters = num_points/4;
|
||||
@ -254,15 +255,10 @@ static inline void volk_gnsssdr_32fc_convert_16ic_a_sse(lv_16sc_t* outputVector,
|
||||
}
|
||||
#endif /* LV_HAVE_SSE */
|
||||
|
||||
|
||||
#ifdef LV_HAVE_NEON
|
||||
#include <arm_neon.h>
|
||||
|
||||
/*!
|
||||
\brief Converts a complex vector of 32-bits float each component into a complex vector of 16-bits integer each component. Values are saturated to the limit values of the output data type.
|
||||
\param[out] outputVector The complex 16-bit integer output data buffer
|
||||
\param[in] inputVector The complex 32-bit float data buffer
|
||||
\param[in] num_points The number of data values to be converted
|
||||
*/
|
||||
static inline void volk_gnsssdr_32fc_convert_16ic_neon(lv_16sc_t* outputVector, const lv_32fc_t* inputVector, unsigned int num_points)
|
||||
{
|
||||
const unsigned int neon_iters = num_points / 4;
|
||||
@ -318,14 +314,9 @@ static inline void volk_gnsssdr_32fc_convert_16ic_neon(lv_16sc_t* outputVector,
|
||||
|
||||
#endif /* LV_HAVE_NEON */
|
||||
|
||||
|
||||
#ifdef LV_HAVE_GENERIC
|
||||
|
||||
/*!
|
||||
\brief Converts a complex vector of 32-bits float each component into a complex vector of 16-bits integer each component. Values are saturated to the limit values of the output data type.
|
||||
\param[out] outputVector The complex 16-bit integer output data buffer
|
||||
\param[in] inputVector The complex 32-bit float data buffer
|
||||
\param[in] num_points The number of data values to be converted
|
||||
*/
|
||||
static inline void volk_gnsssdr_32fc_convert_16ic_generic(lv_16sc_t* outputVector, const lv_32fc_t* inputVector, unsigned int num_points)
|
||||
{
|
||||
float* inputVectorPtr = (float*)inputVector;
|
||||
|
@ -1,6 +1,6 @@
|
||||
/*!
|
||||
* \file volk_gnsssdr_32fc_convert_8ic.h
|
||||
* \brief Volk protokernel: converts float32 complex values to 8 integer complex values taking care of overflow
|
||||
* \brief VOLK_GNSSSDR kernel: converts float32 complex values to 8-bit integer complex values, taking care of overflow.
|
||||
* \authors <ul>
|
||||
* <li> Andres Cecilia, 2014. a.cecilia.luque(at)gmail.com
|
||||
* </ul>
|
||||
@ -30,6 +30,29 @@
|
||||
* -------------------------------------------------------------------------
|
||||
*/
|
||||
|
||||
/*!
|
||||
* \page volk_gnsssdr_32fc_convert_8ic
|
||||
*
|
||||
* \b Overview
|
||||
*
|
||||
* Converts a complex vector with 32-bit float components into
|
||||
* a complex vector with 8-bit integer components.
|
||||
* Values outside the range of the output data type are saturated to its limits.
|
||||
*
|
||||
* <b>Dispatcher Prototype</b>
|
||||
* \code
|
||||
* void volk_gnsssdr_32fc_convert_8ic(lv_8sc_t* outputVector, const lv_32fc_t* inputVector, unsigned int num_points);
|
||||
* \endcode
|
||||
*
|
||||
* \b Inputs
|
||||
* \li inputVector: The complex 32-bit float input data buffer.
|
||||
* \li num_points: The number of data values to be converted.
|
||||
*
|
||||
* \b Outputs
|
||||
* \li outputVector: The complex 8-bit integer output data buffer.
|
||||
*
|
||||
*/
|
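The aligned SSE2 and NEON kernels in this file process 8 complex samples (16 floats) per iteration (`num_points / 8`). One plausible way to saturate in SIMD, assumed here rather than confirmed by the excerpt, is to convert to integers and then narrow in two saturating steps, as `_mm_packs_epi32` (to 16 bits) and `_mm_packs_epi16` (to 8 bits) do. The scalar sketch below models those two stages; the function names are hypothetical, and NaN inputs are not handled.

```c
#include <stdint.h>

/* Scalar model of a saturating narrow from 16 to 8 bits (like _mm_packs_epi16). */
static int8_t sat8(int16_t x)
{
    return (int8_t)(x > INT8_MAX ? INT8_MAX : (x < INT8_MIN ? INT8_MIN : x));
}

/* Hypothetical sketch: interleaved complex float to interleaved complex int8. */
static void convert_32fc_to_8ic_sketch(int8_t* out, const float* in, unsigned int num_points)
{
    for (unsigned int i = 0; i < 2 * num_points; i++)  /* two floats per complex sample */
        {
            float v = in[i];
            /* stage 1: saturate to the int16 range, as a packs_epi32 step would */
            if (v > 32767.0f) v = 32767.0f;
            else if (v < -32768.0f) v = -32768.0f;
            int16_t s16 = (int16_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);  /* ties away from zero */
            /* stage 2: saturate to the int8 range, as a packs_epi16 step would */
            out[i] = sat8(s16);
        }
}
```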
||||
|
||||
#ifndef INCLUDED_volk_gnsssdr_32fc_convert_8ic_H
|
||||
#define INCLUDED_volk_gnsssdr_32fc_convert_8ic_H
|
||||
|
||||
@ -42,12 +65,6 @@
|
||||
#ifdef LV_HAVE_SSE2
|
||||
#include <emmintrin.h>
|
||||
|
||||
/*!
|
||||
\brief Converts a complex vector of 32-bits float each component into a complex vector of 8-bits integer each component. Values are saturated to the limit values of the output data type.
|
||||
\param[out] outputVector The complex 8-bit integer output data buffer
|
||||
\param[in] inputVector The complex 32-bit float data buffer
|
||||
\param[in] num_points The number of data values to be converted
|
||||
*/
|
||||
static inline void volk_gnsssdr_32fc_convert_8ic_u_sse2(lv_8sc_t* outputVector, const lv_32fc_t* inputVector, unsigned int num_points)
|
||||
{
|
||||
unsigned i = 0;
|
||||
@ -103,14 +120,9 @@ static inline void volk_gnsssdr_32fc_convert_8ic_u_sse2(lv_8sc_t* outputVector,
|
||||
}
|
||||
#endif /* LV_HAVE_SSE2 */
|
||||
|
||||
|
||||
#ifdef LV_HAVE_GENERIC
|
||||
|
||||
/*!
|
||||
\brief Converts a complex vector of 32-bits float each component into a complex vector of 8-bits integer each component. Values are saturated to the limit values of the output data type.
|
||||
\param[out] outputVector The complex 8-bit integer output data buffer
|
||||
\param[in] inputVector The complex 32-bit float data buffer
|
||||
\param[in] num_points The number of data values to be converted
|
||||
*/
|
||||
static inline void volk_gnsssdr_32fc_convert_8ic_generic(lv_8sc_t* outputVector, const lv_32fc_t* inputVector, unsigned int num_points)
|
||||
{
|
||||
float* inputVectorPtr = (float*)inputVector;
|
||||
@ -133,12 +145,6 @@ static inline void volk_gnsssdr_32fc_convert_8ic_generic(lv_8sc_t* outputVector,
|
||||
#ifdef LV_HAVE_SSE2
|
||||
#include <emmintrin.h>
|
||||
|
||||
/*!
|
||||
\brief Converts a complex vector of 32-bits float each component into a complex vector of 8-bits integer each component. Values are saturated to the limit values of the output data type.
|
||||
\param[out] outputVector The complex 8-bit integer output data buffer
|
||||
\param[in] inputVector The complex 32-bit float data buffer
|
||||
\param[in] num_points The number of data values to be converted
|
||||
*/
|
||||
static inline void volk_gnsssdr_32fc_convert_8ic_a_sse2(lv_8sc_t* outputVector, const lv_32fc_t* inputVector, unsigned int num_points)
|
||||
{
|
||||
const unsigned int sse_iters = num_points / 8;
|
||||
@ -197,12 +203,6 @@ static inline void volk_gnsssdr_32fc_convert_8ic_a_sse2(lv_8sc_t* outputVector,
|
||||
#ifdef LV_HAVE_NEON
|
||||
#include <arm_neon.h>
|
||||
|
||||
/*!
|
||||
\brief Converts a complex vector of 32-bits float each component into a complex vector of 8-bits integer each component. Values are saturated to the limit values of the output data type.
|
||||
\param[out] outputVector The complex 8-bit integer output data buffer
|
||||
\param[in] inputVector The complex 32-bit float data buffer
|
||||
\param[in] num_points The number of data values to be converted
|
||||
*/
|
||||
static inline void volk_gnsssdr_32fc_convert_8ic_neon(lv_8sc_t* outputVector, const lv_32fc_t* inputVector, unsigned int num_points)
|
||||
{
|
||||
const unsigned int neon_iters = num_points / 8;
|
||||
|
@ -1,11 +1,11 @@
|
||||
/*!
|
||||
* \file volk_gnsssdr_64f_accumulator_64f.h
|
||||
* \brief Volk protokernel: 64 bits (double) scalar accumulator
|
||||
* \brief VOLK_GNSSSDR kernel: 64-bit (double) scalar accumulator.
|
||||
* \authors <ul>
|
||||
* <li> Andres Cecilia, 2014. a.cecilia.luque(at)gmail.com
|
||||
* </ul>
|
||||
*
|
||||
* Volk protokernel that implements an accumulator of char values
|
||||
* VOLK_GNSSSDR kernel that implements an accumulator of double (64-bit float) values.
|
||||
*
|
||||
* -------------------------------------------------------------------------
|
||||
*
|
||||
@ -32,6 +32,27 @@
|
||||
* -------------------------------------------------------------------------
|
||||
*/
|
||||
|
||||
/*!
|
||||
* \page volk_gnsssdr_64f_accumulator_64f
|
||||
*
|
||||
* \b Overview
|
||||
*
|
||||
* Accumulates the values in the input buffer.
|
||||
*
|
||||
* <b>Dispatcher Prototype</b>
|
||||
* \code
|
||||
* void volk_gnsssdr_64f_accumulator_64f(double* result, const double* inputBuffer, unsigned int num_points);
|
||||
* \endcode
|
||||
*
|
||||
* \b Inputs
|
||||
* \li inputBuffer: The buffer of data to be accumulated.
|
||||
* \li num_points: The number of values in \p inputBuffer to be accumulated.
|
||||
*
|
||||
* \b Outputs
|
||||
* \li result: The accumulated result.
|
||||
*
|
||||
*/
|
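The AVX and SSE3 kernels below keep 4 and 2 partial sums respectively, reduce them through `tempBuffer`, and then add the `num_points % 4` (or `% 2`) leftover values one by one. The scalar model below mirrors the 4-lane AVX layout; the function name is hypothetical. Because the additions happen in a different order than in the generic kernel, floating-point results may differ in the last bits.

```c
/* Hypothetical scalar model of the 4-lane AVX accumulator: lane k sums elements
   k, k+4, k+8, ... of the vectorized part; lanes are reduced, then the tail is added. */
static double accumulate_4lanes(const double* in, unsigned int num_points)
{
    double lanes[4] = {0.0, 0.0, 0.0, 0.0};
    const unsigned int iters = num_points / 4;
    const double* p = in;
    for (unsigned int it = 0; it < iters; it++)
        {
            for (unsigned int k = 0; k < 4; k++) lanes[k] += p[k];  /* one _mm256_add_pd */
            p += 4;
        }
    double sum = 0.0;
    for (unsigned int k = 0; k < 4; k++) sum += lanes[k];  /* reduction via tempBuffer */
    for (unsigned int i = 0; i < num_points % 4; i++) sum += *p++;  /* leftover values */
    return sum;
}
```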
||||
|
||||
#ifndef INCLUDED_volk_gnsssdr_64f_accumulator_64f_u_H
|
||||
#define INCLUDED_volk_gnsssdr_64f_accumulator_64f_u_H
|
||||
|
||||
@ -39,96 +60,91 @@
|
||||
|
||||
#ifdef LV_HAVE_AVX
|
||||
#include <immintrin.h>
|
||||
/*!
|
||||
\brief Accumulates the values in the input buffer
|
||||
\param result The accumulated result
|
||||
\param inputBuffer The buffer of data to be accumulated
|
||||
\param num_points The number of values in inputBuffer to be accumulated
|
||||
*/
|
||||
static inline void volk_gnsssdr_64f_accumulator_64f_u_avx(double* result,const double* inputBuffer, unsigned int num_points){
|
||||
|
||||
static inline void volk_gnsssdr_64f_accumulator_64f_u_avx(double* result, const double* inputBuffer, unsigned int num_points)
|
||||
{
|
||||
double returnValue = 0;
|
||||
const unsigned int sse_iters = num_points / 4;
|
||||
|
||||
|
||||
const double* aPtr = inputBuffer;
|
||||
|
||||
|
||||
__VOLK_ATTR_ALIGNED(32) double tempBuffer[4];
|
||||
__m256d accumulator = _mm256_setzero_pd();
|
||||
__m256d aVal = _mm256_setzero_pd();
|
||||
|
||||
|
||||
for(unsigned int number = 0; number < sse_iters; number++)
|
||||
{
|
||||
aVal = _mm256_loadu_pd(aPtr);
|
||||
accumulator = _mm256_add_pd(accumulator, aVal);
|
||||
aPtr += 4;
|
||||
}
|
||||
|
||||
_mm256_storeu_pd((double*)tempBuffer,accumulator);
|
||||
|
||||
for(unsigned int i = 0; i<4; ++i){
|
||||
returnValue += tempBuffer[i];
|
||||
}
|
||||
|
||||
for(unsigned int i = 0; i<(num_points % 4); ++i){
|
||||
returnValue += (*aPtr++);
|
||||
}
|
||||
|
||||
{
|
||||
aVal = _mm256_loadu_pd(aPtr);
|
||||
accumulator = _mm256_add_pd(accumulator, aVal);
|
||||
aPtr += 4;
|
||||
}
|
||||
|
||||
_mm256_storeu_pd((double*)tempBuffer, accumulator);
|
||||
|
||||
for(unsigned int i = 0; i < 4; ++i)
|
||||
{
|
||||
returnValue += tempBuffer[i];
|
||||
}
|
||||
|
||||
for(unsigned int i = 0; i < (num_points % 4); ++i)
|
||||
{
|
||||
returnValue += (*aPtr++);
|
||||
}
|
||||
|
||||
*result = returnValue;
|
||||
}
|
||||
#endif /* LV_HAVE_AVX */
|
||||
|
||||
|
||||
#ifdef LV_HAVE_SSE3
|
||||
#include <pmmintrin.h>
|
||||
/*!
|
||||
\brief Accumulates the values in the input buffer
|
||||
\param result The accumulated result
|
||||
\param inputBuffer The buffer of data to be accumulated
|
||||
\param num_points The number of values in inputBuffer to be accumulated
|
||||
*/
|
||||
static inline void volk_gnsssdr_64f_accumulator_64f_u_sse3(double* result,const double* inputBuffer, unsigned int num_points){
|
||||
|
||||
static inline void volk_gnsssdr_64f_accumulator_64f_u_sse3(double* result,const double* inputBuffer, unsigned int num_points)
|
||||
{
|
||||
double returnValue = 0;
|
||||
const unsigned int sse_iters = num_points / 2;
|
||||
|
||||
|
||||
const double* aPtr = inputBuffer;
|
||||
|
||||
|
||||
__VOLK_ATTR_ALIGNED(16) double tempBuffer[2];
|
||||
__m128d accumulator = _mm_setzero_pd();
|
||||
__m128d aVal = _mm_setzero_pd();
|
||||
|
||||
|
||||
for(unsigned int number = 0; number < sse_iters; number++)
|
||||
{
|
||||
aVal = _mm_loadu_pd(aPtr);
|
||||
accumulator = _mm_add_pd(accumulator, aVal);
|
||||
aPtr += 2;
|
||||
}
|
||||
|
||||
_mm_storeu_pd((double*)tempBuffer,accumulator);
|
||||
|
||||
for(unsigned int i = 0; i<2; ++i){
|
||||
returnValue += tempBuffer[i];
|
||||
}
|
||||
|
||||
for(unsigned int i = 0; i<(num_points % 2); ++i){
|
||||
returnValue += (*aPtr++);
|
||||
}
|
||||
|
||||
{
|
||||
aVal = _mm_loadu_pd(aPtr);
|
||||
accumulator = _mm_add_pd(accumulator, aVal);
|
||||
aPtr += 2;
|
||||
}
|
||||
|
||||
_mm_storeu_pd((double*)tempBuffer, accumulator);
|
||||
|
||||
for(unsigned int i = 0; i < 2; ++i)
|
||||
{
|
||||
returnValue += tempBuffer[i];
|
||||
}
|
||||
|
||||
for(unsigned int i = 0; i < (num_points % 2); ++i)
|
||||
{
|
||||
returnValue += (*aPtr++);
|
||||
}
|
||||
|
||||
*result = returnValue;
|
||||
}
|
||||
#endif /* LV_HAVE_SSE3 */
|
||||
|
||||
|
||||
#ifdef LV_HAVE_GENERIC
|
||||
/*!
|
||||
\brief Accumulates the values in the input buffer
|
||||
\param result The accumulated result
|
||||
\param inputBuffer The buffer of data to be accumulated
|
||||
\param num_points The number of values in inputBuffer to be accumulated
|
||||
*/
|
||||
static inline void volk_gnsssdr_64f_accumulator_64f_generic(double* result,const double* inputBuffer, unsigned int num_points){
|
||||
|
||||
static inline void volk_gnsssdr_64f_accumulator_64f_generic(double* result,const double* inputBuffer, unsigned int num_points)
|
||||
{
|
||||
const double* aPtr = inputBuffer;
|
||||
double returnValue = 0;
|
||||
|
||||
for(unsigned int number = 0;number < num_points; number++){
|
||||
returnValue += (*aPtr++);
|
||||
}
|
||||
|
||||
for(unsigned int number = 0;number < num_points; number++)
|
||||
{
|
||||
returnValue += (*aPtr++);
|
||||
}
|
||||
*result = returnValue;
|
||||
}
|
||||
#endif /* LV_HAVE_GENERIC */
|
||||
@ -136,78 +152,75 @@ static inline void volk_gnsssdr_64f_accumulator_64f_generic(double* result,const
|
||||
|
||||
#ifdef LV_HAVE_AVX
|
||||
#include <immintrin.h>
|
||||
/*!
|
||||
\brief Accumulates the values in the input buffer
|
||||
\param result The accumulated result
|
||||
\param inputBuffer The buffer of data to be accumulated
|
||||
\param num_points The number of values in inputBuffer to be accumulated
|
||||
*/
|
||||
static inline void volk_gnsssdr_64f_accumulator_64f_a_avx(double* result,const double* inputBuffer, unsigned int num_points){
|
||||
|
||||
static inline void volk_gnsssdr_64f_accumulator_64f_a_avx(double* result,const double* inputBuffer, unsigned int num_points)
|
||||
{
|
||||
double returnValue = 0;
|
||||
const unsigned int sse_iters = num_points / 4;
|
||||
|
||||
|
||||
const double* aPtr = inputBuffer;
|
||||
|
||||
|
||||
__VOLK_ATTR_ALIGNED(32) double tempBuffer[4];
|
||||
__m256d accumulator = _mm256_setzero_pd();
|
||||
__m256d aVal = _mm256_setzero_pd();
|
||||
|
||||
|
||||
for(unsigned int number = 0; number < sse_iters; number++)
|
||||
{
|
||||
aVal = _mm256_load_pd(aPtr);
|
||||
accumulator = _mm256_add_pd(accumulator, aVal);
|
||||
aPtr += 4;
|
||||
}
|
||||
|
||||
_mm256_store_pd((double*)tempBuffer,accumulator);
|
||||
|
||||
for(unsigned int i = 0; i<4; ++i){
|
||||
returnValue += tempBuffer[i];
|
||||
}
|
||||
|
||||
for(unsigned int i = 0; i<(num_points % 4); ++i){
|
||||
returnValue += (*aPtr++);
|
||||
}
|
||||
|
||||
{
|
||||
aVal = _mm256_load_pd(aPtr);
|
||||
accumulator = _mm256_add_pd(accumulator, aVal);
|
||||
aPtr += 4;
|
||||
}
|
||||
|
||||
_mm256_store_pd((double*)tempBuffer, accumulator);
|
||||
|
||||
for(unsigned int i = 0; i < 4; ++i)
|
||||
{
|
||||
returnValue += tempBuffer[i];
|
||||
}
|
||||
|
||||
for(unsigned int i = 0; i < (num_points % 4); ++i)
|
||||
{
|
||||
returnValue += (*aPtr++);
|
||||
}
|
||||
|
||||
*result = returnValue;
|
||||
}
|
||||
#endif /* LV_HAVE_AVX */
|
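The `_a_` variants use aligned loads and stores (`_mm256_load_pd`/`_mm256_store_pd` in the AVX kernel, `_mm_load_pd`/`_mm_store_pd` in the SSE3 one), which require 32- and 16-byte aligned addresses respectively; passing an unaligned pointer typically faults. The `_u_` variants use the `loadu` forms and accept any address. A minimal sketch of preparing a suitable buffer with C11 `aligned_alloc` follows; the helper names are hypothetical, and any allocation helpers VOLK_GNSSSDR itself provides are not shown here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n doubles on a 32-byte boundary (enough for AVX
   aligned loads). C11 aligned_alloc expects a size that is a multiple of the alignment. */
static double* alloc_doubles_aligned32(size_t n)
{
    size_t bytes = n * sizeof(double);
    bytes = (bytes + 31u) & ~(size_t)31u;  /* round up to a multiple of 32 */
    if (bytes == 0) bytes = 32;
    return (double*)aligned_alloc(32, bytes);  /* release with free() */
}

/* Hypothetical check: nonzero if p is safe for 32-byte aligned loads. */
static int is_aligned32(const void* p)
{
    return ((uintptr_t)p % 32u) == 0;
}
```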
||||
|
||||
|
||||
#ifdef LV_HAVE_SSE3
|
||||
#include <pmmintrin.h>
|
||||
/*!
|
||||
\brief Accumulates the values in the input buffer
|
||||
\param result The accumulated result
|
||||
\param inputBuffer The buffer of data to be accumulated
|
||||
\param num_points The number of values in inputBuffer to be accumulated
|
||||
*/
|
||||
static inline void volk_gnsssdr_64f_accumulator_64f_a_sse3(double* result,const double* inputBuffer, unsigned int num_points){
|
||||
|
||||
static inline void volk_gnsssdr_64f_accumulator_64f_a_sse3(double* result,const double* inputBuffer, unsigned int num_points)
|
||||
{
|
||||
double returnValue = 0;
|
||||
const unsigned int sse_iters = num_points / 2;
|
||||
|
||||
|
||||
const double* aPtr = inputBuffer;
|
||||
|
||||
|
||||
__VOLK_ATTR_ALIGNED(16) double tempBuffer[2];
|
||||
__m128d accumulator = _mm_setzero_pd();
|
||||
__m128d aVal = _mm_setzero_pd();
|
||||
|
||||
|
||||
for(unsigned int number = 0; number < sse_iters; number++)
|
||||
{
|
||||
aVal = _mm_load_pd(aPtr);
|
||||
accumulator = _mm_add_pd(accumulator, aVal);
|
||||
aPtr += 2;
|
||||
}
|
||||
|
||||
_mm_store_pd((double*)tempBuffer,accumulator);
|
||||
|
||||
for(unsigned int i = 0; i<2; ++i){
|
||||
returnValue += tempBuffer[i];
|
||||
}
|
||||
|
||||
for(unsigned int i = 0; i<(num_points % 2); ++i){
|
||||
returnValue += (*aPtr++);
|
||||
}
|
||||
|
||||
{
|
||||
aVal = _mm_load_pd(aPtr);
|
||||
accumulator = _mm_add_pd(accumulator, aVal);
|
||||
aPtr += 2;
|
||||
}
|
||||
|
||||
_mm_store_pd((double*)tempBuffer, accumulator);
|
||||
|
||||
for(unsigned int i = 0; i < 2; ++i)
|
||||
{
|
||||
returnValue += tempBuffer[i];
|
||||
}
|
||||
|
||||
for(unsigned int i = 0; i < (num_points % 2); ++i)
|
||||
{
|
||||
returnValue += (*aPtr++);
|
||||
}
|
||||
|
||||
*result = returnValue;
|
||||
}
|
||||
#endif /* LV_HAVE_SSE3 */
|
||||
|
@ -1,11 +1,11 @@
|
||||
/*!
|
||||
* \file volk_gnsssdr_8i_accumulator_s8i.h
|
||||
* \brief Volk protokernel: 8 bits (char) scalar accumulator
|
||||
* \brief VOLK_GNSSSDR kernel: 8-bit (char) scalar accumulator.
|
||||
* \authors <ul>
|
||||
* <li> Andres Cecilia, 2014. a.cecilia.luque(at)gmail.com
|
||||
* </ul>
|
||||
*
|
||||
* Volk protokernel that implements an accumulator of char values
|
||||
* VOLK_GNSSSDR kernel that implements an accumulator of char values
|
||||
*
|
||||
* -------------------------------------------------------------------------
|
||||
*
|
||||
@ -32,6 +32,27 @@
|
||||
* -------------------------------------------------------------------------
|
||||
*/
|
||||
|
||||
/*!
|
||||
* \page volk_gnsssdr_8i_accumulator_s8i
|
||||
*
|
||||
* \b Overview
|
||||
*
|
||||
* Accumulates the values in the input buffer.
|
||||
*
|
||||
* <b>Dispatcher Prototype</b>
|
||||
* \code
|
||||
* void volk_gnsssdr_8i_accumulator_s8i(char* result, const char* inputBuffer, unsigned int num_points);
|
||||
* \endcode
|
||||
*
|
||||
* \b Inputs
|
||||
* \li inputBuffer: The buffer of data to be accumulated.
|
||||
* \li num_points: The number of values in \p inputBuffer to be accumulated.
|
||||
*
|
||||
* \b Outputs
|
||||
* \li result: The accumulated result.
|
||||
*
|
||||
*/
|
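Note that the SSE3 kernels add bytes with `_mm_add_epi8`, which wraps on overflow rather than saturating, and return a single `char`, so the result is effectively the sum modulo 256. The sketch below makes that behavior explicit with unsigned arithmetic. It uses `int8_t` instead of plain `char` because the signedness of `char` is implementation-defined (it is unsigned on many ARM targets); the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical sketch: byte sum with 8-bit wrap-around, as per-lane _mm_add_epi8
   plus a char-sized result gives. Unsigned arithmetic keeps the wrap well defined. */
static int8_t accumulate_s8_wrapping(const int8_t* in, unsigned int num_points)
{
    uint8_t acc = 0;
    for (unsigned int i = 0; i < num_points; i++)
        {
            acc = (uint8_t)(acc + (uint8_t)in[i]);  /* modulo-256 addition */
        }
    return (int8_t)acc;  /* reinterpret the bit pattern as two's complement */
}
```

Callers that need the true sum of many values should accumulate in a wider type instead.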
||||
|
||||
#ifndef INCLUDED_volk_gnsssdr_8i_accumulator_s8i_H
|
||||
#define INCLUDED_volk_gnsssdr_8i_accumulator_s8i_H
|
||||
|
||||
@ -40,12 +61,7 @@
|
||||
|
||||
#ifdef LV_HAVE_SSE3
|
||||
#include <pmmintrin.h>
|
||||
/*!
|
||||
\brief Accumulates the values in the input buffer
|
||||
\param result The accumulated result
|
||||
\param inputBuffer The buffer of data to be accumulated
|
||||
\param num_points The number of values in inputBuffer to be accumulated
|
||||
*/
|
||||
|
||||
static inline void volk_gnsssdr_8i_accumulator_s8i_u_sse3(char* result, const char* inputBuffer, unsigned int num_points)
|
||||
{
|
||||
char returnValue = 0;
|
||||
@ -63,14 +79,14 @@ static inline void volk_gnsssdr_8i_accumulator_s8i_u_sse3(char* result, const ch
|
||||
accumulator = _mm_add_epi8(accumulator, aVal);
|
||||
aPtr += 16;
|
||||
}
|
||||
_mm_storeu_si128((__m128i*)tempBuffer,accumulator);
|
||||
_mm_storeu_si128((__m128i*)tempBuffer, accumulator);
|
||||
|
||||
for(unsigned int i = 0; i<16; ++i)
|
||||
for(unsigned int i = 0; i < 16; ++i)
|
||||
{
|
||||
returnValue += tempBuffer[i];
|
||||
}
|
||||
|
||||
for(unsigned int i = 0; i<(num_points % 16); ++i)
|
||||
for(unsigned int i = 0; i < (num_points % 16); ++i)
|
||||
{
|
||||
returnValue += (*aPtr++);
|
||||
}
|
||||
@ -79,13 +95,9 @@ static inline void volk_gnsssdr_8i_accumulator_s8i_u_sse3(char* result, const ch
|
||||
}
|
||||
#endif /* LV_HAVE_SSE3 */
|
||||
|
||||
|
||||
#ifdef LV_HAVE_GENERIC
|
||||
/*!
|
||||
\brief Accumulates the values in the input buffer
|
||||
\param result The accumulated result
|
||||
\param inputBuffer The buffer of data to be accumulated
|
||||
\param num_points The number of values in inputBuffer to be accumulated
|
||||
*/
|
||||
|
||||
static inline void volk_gnsssdr_8i_accumulator_s8i_generic(char* result, const char* inputBuffer, unsigned int num_points)
|
||||
{
|
||||
const char* aPtr = inputBuffer;
|
||||
@ -99,14 +111,10 @@ static inline void volk_gnsssdr_8i_accumulator_s8i_generic(char* result, const c
|
||||
}
|
||||
#endif /* LV_HAVE_GENERIC */
|
||||
|
||||
|
||||
#ifdef LV_HAVE_SSE3
|
||||
#include <pmmintrin.h>
|
||||
/*!
|
||||
\brief Accumulates the values in the input buffer
|
||||
\param result The accumulated result
|
||||
\param inputBuffer The buffer of data to be accumulated
|
||||
\param num_points The number of values in inputBuffer to be accumulated
|
||||
*/
|
||||
|
||||
static inline void volk_gnsssdr_8i_accumulator_s8i_a_sse3(char* result, const char* inputBuffer, unsigned int num_points)
|
||||
{
|
||||
char returnValue = 0;
|
||||
@ -126,11 +134,12 @@ static inline void volk_gnsssdr_8i_accumulator_s8i_a_sse3(char* result, const ch
|
||||
}
|
||||
_mm_store_si128((__m128i*)tempBuffer,accumulator);
|
||||
|
||||
for(unsigned int i = 0; i<16; ++i){
|
||||
for(unsigned int i = 0; i < 16; ++i)
|
||||
{
|
||||
returnValue += tempBuffer[i];
|
||||
}
|
||||
}
|
||||
|
||||
for(unsigned int i = 0; i<(num_points % 16); ++i)
|
||||
for(unsigned int i = 0; i < (num_points % 16); ++i)
|
||||
{
|
||||
returnValue += (*aPtr++);
|
||||
}
|
||||
@ -139,13 +148,9 @@ static inline void volk_gnsssdr_8i_accumulator_s8i_a_sse3(char* result, const ch
|
||||
}
|
||||
#endif /* LV_HAVE_SSE3 */
|
||||
|
||||
|
||||
#ifdef LV_HAVE_ORC
|
||||
/*!
|
||||
\brief Accumulates the values in the input buffer
|
||||
\param result The accumulated result
|
||||
\param inputBuffer The buffer of data to be accumulated
|
||||
\param num_points The number of values in inputBuffer to be accumulated
|
||||
*/
|
||||
|
||||
extern void volk_gnsssdr_8i_accumulator_s8i_a_orc_impl(short* result, const char* inputBuffer, unsigned int num_points);
|
||||
|
||||
static inline void volk_gnsssdr_8i_accumulator_s8i_u_orc(char* result, const char* inputBuffer, unsigned int num_points)
|
||||
|
@ -1,11 +1,11 @@
|
||||
/*!
|
||||
* \file volk_gnsssdr_8i_index_max_16u.h
|
||||
* \brief Volk protokernel: calculates the index of the maximum value in a group of 8 bits (char) scalars
|
||||
* \brief VOLK_GNSSSDR kernel: calculates the index of the maximum value in a group of 8-bit (char) scalars.
|
||||
* \authors <ul>
|
||||
* <li> Andres Cecilia, 2014. a.cecilia.luque(at)gmail.com
|
||||
* </ul>
|
||||
*
|
||||
* Volk protokernel that returns the index of the maximum value of a group of 8 bits (char) scalars
|
||||
* VOLK_GNSSSDR kernel that returns the index of the maximum value of a group of 8 bits (char) scalars
|
||||
*
|
||||
* -------------------------------------------------------------------------
|
||||
*
|
||||
@ -32,6 +32,27 @@
|
||||
* -------------------------------------------------------------------------
|
||||
*/
|
||||
|
||||
/*!
|
||||
* \page volk_gnsssdr_8i_index_max_16u
|
||||
*
|
||||
* \b Overview
|
||||
*
|
||||
* Returns the index of the maximum value in \p src0.
|
||||
*
|
||||
* <b>Dispatcher Prototype</b>
|
||||
* \code
|
||||
* void volk_gnsssdr_8i_index_max_16u(unsigned int* target, const char* src0, unsigned int num_points);
|
||||
* \endcode
|
||||
*
|
||||
* \b Inputs
|
||||
* \li src0: The buffer of data to be analyzed.
|
||||
* \li num_points: The number of values in \p src0 to be analyzed.
|
||||
*
|
||||
* \b Outputs
|
||||
* \li target: The index of the maximum value in \p src0.
|
||||
*
|
||||
*/
|
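A scalar sketch of the index search. Like the kernels in this file, it leaves `target` untouched when `num_points` is 0. Keeping the first occurrence on ties (strict `>`) is an assumption about the tie rule, which this excerpt does not show. It uses `int8_t` to avoid the implementation-defined signedness of plain `char`; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical sketch: store in *target the index of the largest value in src0. */
static void index_max_s8_sketch(unsigned int* target, const int8_t* src0, unsigned int num_points)
{
    if (num_points > 0)  /* like the kernels shown, do nothing for an empty input */
        {
            int8_t max = src0[0];
            unsigned int index = 0;
            for (unsigned int i = 1; i < num_points; i++)
                {
                    if (src0[i] > max)  /* strict '>' keeps the first occurrence of the maximum */
                        {
                            max = src0[i];
                            index = i;
                        }
                }
            *target = index;
        }
}
```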
||||
|
||||
#ifndef INCLUDED_volk_gnsssdr_8i_index_max_16u_H
|
||||
#define INCLUDED_volk_gnsssdr_8i_index_max_16u_H
|
||||
|
||||
@ -39,12 +60,7 @@
|
||||
|
||||
#ifdef LV_HAVE_AVX
|
||||
#include <immintrin.h>
|
||||
/*!
|
||||
\brief Returns the index of the max value in src0
|
||||
\param target The index of the max value in src0
|
||||
\param src0 The buffer of data to be analysed
|
||||
\param num_points The number of values in src0 to be analysed
|
||||
*/
|
||||
|
||||
static inline void volk_gnsssdr_8i_index_max_16u_u_avx(unsigned int* target, const char* src0, unsigned int num_points)
|
||||
{
|
||||
if(num_points > 0)
|
||||
@ -107,14 +123,10 @@ static inline void volk_gnsssdr_8i_index_max_16u_u_avx(unsigned int* target, con
|
||||
|
||||
#endif /*LV_HAVE_AVX*/
|
||||
|
||||
|
||||
#ifdef LV_HAVE_SSE4_1
|
||||
#include <smmintrin.h>
|
||||
/*!
|
||||
\brief Returns the index of the max value in src0
|
||||
\param target The index of the max value in src0
|
||||
\param src0 The buffer of data to be analysed
|
||||
\param num_points The number of values in src0 to be analysed
|
||||
*/
|
||||
|
||||
static inline void volk_gnsssdr_8i_index_max_16u_u_sse4_1(unsigned int* target, const char* src0, unsigned int num_points)
|
||||
{
|
||||
if(num_points > 0)
|
||||
@ -168,14 +180,10 @@ static inline void volk_gnsssdr_8i_index_max_16u_u_sse4_1(unsigned int* target,
|
||||
|
||||
#endif /*LV_HAVE_SSE4_1*/
|
||||
|
||||
|
||||
#ifdef LV_HAVE_SSE2
|
||||
#include<emmintrin.h>
|
||||
/*!
|
||||
\brief Returns the index of the max value in src0
|
||||
\param target The index of the max value in src0
|
||||
\param src0 The buffer of data to be analysed
|
||||
\param num_points The number of values in src0 to be analysed
|
||||
*/
|
||||
|
||||
static inline void volk_gnsssdr_8i_index_max_16u_u_sse2(unsigned int* target, const char* src0, unsigned int num_points)
|
||||
{
|
||||
if(num_points > 0)
|
||||
@ -235,13 +243,9 @@ static inline void volk_gnsssdr_8i_index_max_16u_u_sse2(unsigned int* target, co
|
||||
|
||||
#endif /*LV_HAVE_SSE2*/
|
||||
|
||||
|
||||
#ifdef LV_HAVE_GENERIC
|
||||
/*!
|
||||
\brief Returns the index of the max value in src0
|
||||
\param target The index of the max value in src0
|
||||
\param src0 The buffer of data to be analysed
|
||||
\param num_points The number of values in src0 to be analysed
|
||||
*/
|
||||
|
||||
static inline void volk_gnsssdr_8i_index_max_16u_generic(unsigned int* target, const char* src0, unsigned int num_points)
|
||||
{
|
||||
if(num_points > 0)
|
@ -266,12 +270,7 @@ static inline void volk_gnsssdr_8i_index_max_16u_generic(unsigned int* target, c

#ifdef LV_HAVE_AVX
#include <immintrin.h>
/*!
\brief Returns the index of the max value in src0
\param target The index of the max value in src0
\param src0 The buffer of data to be analysed
\param num_points The number of values in src0 to be analysed
*/

static inline void volk_gnsssdr_8i_index_max_16u_a_avx(unsigned int* target, const char* src0, unsigned int num_points)
{
if(num_points > 0)
@ -334,14 +333,10 @@ static inline void volk_gnsssdr_8i_index_max_16u_a_avx(unsigned int* target, con

#endif /*LV_HAVE_AVX*/


#ifdef LV_HAVE_SSE4_1
#include <smmintrin.h>
/*!
\brief Returns the index of the max value in src0
\param target The index of the max value in src0
\param src0 The buffer of data to be analysed
\param num_points The number of values in src0 to be analysed
*/

static inline void volk_gnsssdr_8i_index_max_16u_a_sse4_1(unsigned int* target, const char* src0, unsigned int num_points)
{
if(num_points > 0)
@ -395,14 +390,10 @@ static inline void volk_gnsssdr_8i_index_max_16u_a_sse4_1(unsigned int* target,

#endif /*LV_HAVE_SSE4_1*/


#ifdef LV_HAVE_SSE2
#include <emmintrin.h>
/*!
\brief Returns the index of the max value in src0
\param target The index of the max value in src0
\param src0 The buffer of data to be analysed
\param num_points The number of values in src0 to be analysed
*/

static inline void volk_gnsssdr_8i_index_max_16u_a_sse2(unsigned int* target, const char* src0, unsigned int num_points)
{
if(num_points > 0)
@ -463,5 +454,4 @@ static inline void volk_gnsssdr_8i_index_max_16u_a_sse2(unsigned int* target, co
#endif /*LV_HAVE_SSE2*/



#endif /*INCLUDED_volk_gnsssdr_8i_index_max_16u_H*/
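The variants above share the same documentation and differ in instruction set and in the `_a_`/`_u_` (aligned/unaligned) suffix. That makes a plain scalar version a convenient oracle when testing them. The following is a minimal sketch of the documented behaviour, not the library's generic kernel. `ref_8i_index_max_16u` is a hypothetical name. The sketch assumes that samples are read as signed 8-bit integers and that ties resolve to the first occurrence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference for volk_gnsssdr_8i_index_max_16u (not the
 * library's code). Assumptions: samples are interpreted as signed 8-bit
 * integers, and ties resolve to the first occurrence (strict ">"). */
static void ref_8i_index_max_16u(unsigned int* target, const char* src0,
                                 unsigned int num_points)
{
    assert(num_points == 0 || (target != NULL && src0 != NULL));
    if (num_points == 0)
        return; /* nothing to analyse: *target is left untouched */

    signed char max_val = (signed char)src0[0];
    unsigned int max_idx = 0;
    for (unsigned int i = 1; i < num_points; i++) {
        if ((signed char)src0[i] > max_val) {
            max_val = (signed char)src0[i];
            max_idx = i;
        }
    }
    *target = max_idx;
}
```

One way to use it is to compare each SIMD variant against this function over random buffers. Buffer lengths that are not multiples of the vector width also exercise the tail-handling code.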

@ -1,11 +1,11 @@
/*!
* \file volk_gnsssdr_8i_max_s8i.h
* \brief Volk protokernel: calculates the maximum value in a group of 8-bit (char) scalars
* \brief VOLK_GNSSSDR kernel: calculates the maximum value in a group of 8-bit (char) scalars.
* \authors <ul>
* <li> Andres Cecilia, 2014. a.cecilia.luque(at)gmail.com
* </ul>
*
* Volk protokernel that returns the maximum value of a group of 8-bit (char) scalars
* VOLK_GNSSSDR kernel that returns the maximum value of a group of 8-bit (char) scalars
*
* -------------------------------------------------------------------------
*
@ -32,6 +32,27 @@
* -------------------------------------------------------------------------
*/

/*!
* \page volk_gnsssdr_8i_max_s8i
*
* \b Overview
*
* Returns the max value in \p src0.
*
* <b>Dispatcher Prototype</b>
* \code
* void volk_gnsssdr_8i_max_s8i(char* target, const char* src0, unsigned int num_points);
* \endcode
*
* \b Inputs
* \li src0: The buffer of data to be analyzed.
* \li num_points: The number of values in \p src0 to be analyzed.
*
* \b Outputs
* \li target: The max value in \p src0.
*
*/
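The Overview above states the contract: `target` receives the largest of the first `num_points` values in `src0`. Below is a minimal scalar sketch of that contract. `ref_8i_max_s8i` is a hypothetical name. The signed 8-bit interpretation of `char` is an assumption, since plain `char` is unsigned on some ABIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference for volk_gnsssdr_8i_max_s8i (not the
 * library's code). Assumption: samples are signed 8-bit integers. */
static void ref_8i_max_s8i(char* target, const char* src0, unsigned int num_points)
{
    assert(num_points == 0 || (target != NULL && src0 != NULL));
    if (num_points == 0)
        return; /* mirrors the "if(num_points > 0)" guard seen in the kernels */

    signed char max_val = (signed char)src0[0];
    for (unsigned int i = 1; i < num_points; i++) {
        if ((signed char)src0[i] > max_val)
            max_val = (signed char)src0[i];
    }
    *target = (char)max_val;
}
```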

#ifndef INCLUDED_volk_gnsssdr_8i_max_s8i_H
#define INCLUDED_volk_gnsssdr_8i_max_s8i_H

@ -39,12 +60,7 @@

#ifdef LV_HAVE_SSE4_1
#include <smmintrin.h>
/*!
\brief Returns the max value in src0
\param target The max value in src0
\param src0 The buffer of data to be analysed
\param num_points The number of values in src0 to be analysed
*/

static inline void volk_gnsssdr_8i_max_s8i_u_sse4_1(char* target, const char* src0, unsigned int num_points)
{
if(num_points > 0)
@ -89,14 +105,10 @@ static inline void volk_gnsssdr_8i_max_s8i_u_sse4_1(char* target, const char* sr

#endif /*LV_HAVE_SSE4_1*/


#ifdef LV_HAVE_SSE2
#include<emmintrin.h>
/*!
\brief Returns the max value in src0
\param target The max value in src0
\param src0 The buffer of data to be analysed
\param num_points The number of values in src0 to be analysed
*/

static inline void volk_gnsssdr_8i_max_s8i_u_sse2(char* target, const char* src0, unsigned int num_points)
{
if(num_points > 0)
@ -152,13 +164,9 @@ static inline void volk_gnsssdr_8i_max_s8i_u_sse2(char* target, const char* src0

#endif /*LV_HAVE_SSE2*/


#ifdef LV_HAVE_GENERIC
/*!
\brief Returns the max value in src0
\param target The max value in src0
\param src0 The buffer of data to be analysed
\param num_points The number of values in src0 to be analysed
*/

static inline void volk_gnsssdr_8i_max_s8i_generic(char* target, const char* src0, unsigned int num_points)
{
if(num_points > 0)
@ -179,16 +187,9 @@ static inline void volk_gnsssdr_8i_max_s8i_generic(char* target, const char* src
#endif /*LV_HAVE_GENERIC*/




#ifdef LV_HAVE_SSE4_1
#include <smmintrin.h>
/*!
\brief Returns the max value in src0
\param target The max value in src0
\param src0 The buffer of data to be analysed
\param num_points The number of values in src0 to be analysed
*/

static inline void volk_gnsssdr_8i_max_s8i_a_sse4_1(char* target, const char* src0, unsigned int num_points)
{
if(num_points > 0)
@ -233,14 +234,10 @@ static inline void volk_gnsssdr_8i_max_s8i_a_sse4_1(char* target, const char* sr

#endif /*LV_HAVE_SSE4_1*/


#ifdef LV_HAVE_SSE2
#include <emmintrin.h>
/*!
\brief Returns the max value in src0
\param target The max value in src0
\param src0 The buffer of data to be analysed
\param num_points The number of values in src0 to be analysed
*/

static inline void volk_gnsssdr_8i_max_s8i_a_sse2(char* target, const char* src0, unsigned int num_points)
{
if(num_points > 0)

@ -1,11 +1,11 @@
/*!
* \file volk_gnsssdr_8i_x2_add_8i.h
* \brief Volk protokernel: adds pairs of 8-bit (char) scalars
* \brief VOLK_GNSSSDR kernel: adds pairs of 8-bit (char) scalars.
* \authors <ul>
* <li> Andres Cecilia, 2014. a.cecilia.luque(at)gmail.com
* </ul>
*
* Volk protokernel that adds pairs of 8-bit (char) scalars
* VOLK_GNSSSDR kernel that adds pairs of 8-bit (char) scalars
*
* -------------------------------------------------------------------------
*
@ -32,26 +32,42 @@
* -------------------------------------------------------------------------
*/

/*!
* \page volk_gnsssdr_8i_x2_add_8i
*
* \b Overview
*
* Adds the two input vectors and stores the results in the third vector.
*
* <b>Dispatcher Prototype</b>
* \code
* void volk_gnsssdr_8i_x2_add_8i(char* cVector, const char* aVector, const char* bVector, unsigned int num_points);
* \endcode
*
* \b Inputs
* \li aVector: One of the vectors to be added.
* \li bVector: The other vector to be added.
* \li num_points: Number of values in \p aVector and \p bVector to be added together and stored into \p cVector.
*
* \b Outputs
* \li cVector: The vector where the result will be stored.
*
*/
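The SSE2 variants in this file add with `_mm_add_epi8`, which wraps modulo 256 instead of saturating, so a scalar reference should wrap as well. The sketch below assumes every variant behaves that way. `ref_8i_x2_add_8i` and `wrap8` are hypothetical helpers. `wrap8` emulates two's-complement wraparound without relying on implementation-defined narrowing conversions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reduce an int modulo 256 into the signed 8-bit range
 * -128..127 (two's-complement wraparound) using only portable arithmetic. */
static signed char wrap8(int v)
{
    int m = ((v % 256) + 256) % 256; /* 0..255 */
    if (m > 127)
        m -= 256;                    /* 128..255 -> -128..-1 */
    return (signed char)m;
}

/* Hypothetical scalar reference for volk_gnsssdr_8i_x2_add_8i:
 * cVector[i] = aVector[i] + bVector[i], wrapping like _mm_add_epi8. */
static void ref_8i_x2_add_8i(char* cVector, const char* aVector,
                             const char* bVector, unsigned int num_points)
{
    assert(num_points == 0 || (cVector != NULL && aVector != NULL && bVector != NULL));
    for (unsigned int i = 0; i < num_points; i++)
        cVector[i] = (char)wrap8((signed char)aVector[i] + (signed char)bVector[i]);
}
```

With this sketch, 100 + 28 gives -128, the same bit pattern an SSE2 lane would produce.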

#ifndef INCLUDED_volk_gnsssdr_8i_x2_add_8i_H
#define INCLUDED_volk_gnsssdr_8i_x2_add_8i_H


#ifdef LV_HAVE_SSE2
#include <emmintrin.h>
/*!
\brief Adds the two input vectors and stores their results in the third vector
\param cVector The vector where the results will be stored
\param aVector One of the vectors to be added
\param bVector One of the vectors to be added
\param num_points The number of values in aVector and bVector to be added together and stored into cVector
*/

static inline void volk_gnsssdr_8i_x2_add_8i_u_sse2(char* cVector, const char* aVector, const char* bVector, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 16;

char* cPtr = cVector;
const char* aPtr = aVector;
const char* bPtr= bVector;
const char* bPtr = bVector;

__m128i aVal, bVal, cVal;

@ -62,7 +78,7 @@ static inline void volk_gnsssdr_8i_x2_add_8i_u_sse2(char* cVector, const char* a

cVal = _mm_add_epi8(aVal, bVal);

_mm_storeu_si128((__m128i*)cPtr,cVal); // Store the results back into the C container
_mm_storeu_si128((__m128i*)cPtr, cVal); // Store the results back into the C container

aPtr += 16;
bPtr += 16;
@ -76,19 +92,14 @@ static inline void volk_gnsssdr_8i_x2_add_8i_u_sse2(char* cVector, const char* a
}
#endif /* LV_HAVE_SSE2 */


#ifdef LV_HAVE_GENERIC
/*!
\brief Adds the two input vectors and stores their results in the third vector
\param cVector The vector where the results will be stored
\param aVector One of the vectors to be added
\param bVector One of the vectors to be added
\param num_points The number of values in aVector and bVector to be added together and stored into cVector
*/

static inline void volk_gnsssdr_8i_x2_add_8i_generic(char* cVector, const char* aVector, const char* bVector, unsigned int num_points)
{
char* cPtr = cVector;
const char* aPtr = aVector;
const char* bPtr= bVector;
const char* bPtr = bVector;
unsigned int number = 0;

for(number = 0; number < num_points; number++)
@ -101,20 +112,14 @@ static inline void volk_gnsssdr_8i_x2_add_8i_generic(char* cVector, const char*

#ifdef LV_HAVE_SSE2
#include <emmintrin.h>
/*!
\brief Adds the two input vectors and stores their results in the third vector
\param cVector The vector where the results will be stored
\param aVector One of the vectors to be added
\param bVector One of the vectors to be added
\param num_points The number of values in aVector and bVector to be added together and stored into cVector
*/

static inline void volk_gnsssdr_8i_x2_add_8i_a_sse2(char* cVector, const char* aVector, const char* bVector, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 16;

char* cPtr = cVector;
const char* aPtr = aVector;
const char* bPtr= bVector;
const char* bPtr = bVector;

__m128i aVal, bVal, cVal;

@ -141,13 +146,7 @@ static inline void volk_gnsssdr_8i_x2_add_8i_a_sse2(char* cVector, const char* a


#ifdef LV_HAVE_ORC
/*!
\brief Adds the two input vectors and stores their results in the third vector
\param cVector The vector where the results will be stored
\param aVector One of the vectors to be added
\param bVector One of the vectors to be added
\param num_points The number of values in aVector and bVector to be added together and stored into cVector
*/

extern void volk_gnsssdr_8i_x2_add_8i_a_orc_impl(char* cVector, const char* aVector, const char* bVector, unsigned int num_points);
static inline void volk_gnsssdr_8i_x2_add_8i_u_orc(char* cVector, const char* aVector, const char* bVector, unsigned int num_points)
{

@ -1,11 +1,11 @@
/*!
* \file volk_gnsssdr_8ic_conjugate_8ic.h
* \brief Volk protokernel: calculates the conjugate of a vector of 16-bit complex samples
* \brief VOLK_GNSSSDR kernel: calculates the conjugate of a vector of 16-bit complex samples.
* \authors <ul>
* <li> Andres Cecilia, 2014. a.cecilia.luque(at)gmail.com
* </ul>
*
* Volk protokernel that calculates the conjugate of a
* VOLK_GNSSSDR kernel that calculates the conjugate of a
* vector of 16-bit complex samples (8 bits for the real part and 8 bits for the imaginary part)
*
* -------------------------------------------------------------------------
@ -33,6 +33,27 @@
* -------------------------------------------------------------------------
*/

/*!
* \page volk_gnsssdr_8ic_conjugate_8ic
*
* \b Overview
*
* Takes the conjugate of a complex 8-bit integer vector.
*
* <b>Dispatcher Prototype</b>
* \code
* void volk_gnsssdr_8ic_conjugate_8ic(lv_8sc_t* cVector, const lv_8sc_t* aVector, unsigned int num_points);
* \endcode
*
* \b Inputs
* \li aVector: Vector of complex items to be conjugated.
* \li num_points: The number of complex data points.
*
* \b Outputs
* \li cVector: The vector where the result will be stored.
*
*/
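Conjugation only negates the imaginary part. The sketch below models `lv_8sc_t` with `cplx8_t`, a hypothetical two-field struct. This assumes the library type is stored as an interleaved (real, imaginary) pair of signed bytes. Negating -128 does not fit in 8 bits. The sketch wraps that case back to -128, but how the SIMD variants handle it is not visible in this diff. `wrap8` and `ref_8ic_conjugate_8ic` are also hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for lv_8sc_t: one complex sample stored as an
 * interleaved pair of signed 8-bit real and imaginary parts. */
typedef struct {
    signed char real;
    signed char imag;
} cplx8_t;

/* Hypothetical helper: two's-complement wraparound into -128..127. */
static signed char wrap8(int v)
{
    int m = ((v % 256) + 256) % 256;
    if (m > 127)
        m -= 256;
    return (signed char)m;
}

/* Hypothetical scalar reference for volk_gnsssdr_8ic_conjugate_8ic:
 * cVector[i] = conj(aVector[i]). */
static void ref_8ic_conjugate_8ic(cplx8_t* cVector, const cplx8_t* aVector,
                                  unsigned int num_points)
{
    assert(num_points == 0 || (cVector != NULL && aVector != NULL));
    for (unsigned int i = 0; i < num_points; i++) {
        cVector[i].real = aVector[i].real;
        cVector[i].imag = wrap8(-aVector[i].imag); /* -(-128) wraps to -128 */
    }
}
```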

#ifndef INCLUDED_volk_gnsssdr_8ic_conjugate_8ic_H
#define INCLUDED_volk_gnsssdr_8ic_conjugate_8ic_H

@ -40,12 +61,7 @@

#ifdef LV_HAVE_AVX
#include <immintrin.h>
/*!
\brief Takes the conjugate of a complex 8-bit integer vector.
\param cVector The vector where the results will be stored
\param aVector Vector to be conjugated
\param num_points The number of complex values in aVector to be conjugated and stored into cVector
*/

static inline void volk_gnsssdr_8ic_conjugate_8ic_u_avx(lv_8sc_t* cVector, const lv_8sc_t* aVector, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 16;
@ -81,14 +97,10 @@ static inline void volk_gnsssdr_8ic_conjugate_8ic_u_avx(lv_8sc_t* cVector, const
}
#endif /* LV_HAVE_AVX */


#ifdef LV_HAVE_SSSE3
#include <tmmintrin.h>
/*!
\brief Takes the conjugate of a complex 8-bit integer vector.
\param cVector The vector where the results will be stored
\param aVector Vector to be conjugated
\param num_points The number of complex values in aVector to be conjugated and stored into cVector
*/

static inline void volk_gnsssdr_8ic_conjugate_8ic_u_ssse3(lv_8sc_t* cVector, const lv_8sc_t* aVector, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 8;
@ -116,14 +128,10 @@ static inline void volk_gnsssdr_8ic_conjugate_8ic_u_ssse3(lv_8sc_t* cVector, con
}
#endif /* LV_HAVE_SSSE3 */


#ifdef LV_HAVE_SSE3
#include <pmmintrin.h>
/*!
\brief Takes the conjugate of a complex 8-bit integer vector.
\param cVector The vector where the results will be stored
\param aVector Vector to be conjugated
\param num_points The number of complex values in aVector to be conjugated and stored into cVector
*/

static inline void volk_gnsssdr_8ic_conjugate_8ic_u_sse3(lv_8sc_t* cVector, const lv_8sc_t* aVector, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 8;
@ -153,13 +161,9 @@ static inline void volk_gnsssdr_8ic_conjugate_8ic_u_sse3(lv_8sc_t* cVector, cons
}
#endif /* LV_HAVE_SSE3 */


#ifdef LV_HAVE_GENERIC
/*!
\brief Takes the conjugate of a complex 8-bit integer vector.
\param cVector The vector where the results will be stored
\param aVector Vector to be conjugated
\param num_points The number of complex values in aVector to be conjugated and stored into cVector
*/

static inline void volk_gnsssdr_8ic_conjugate_8ic_generic(lv_8sc_t* cVector, const lv_8sc_t* aVector, unsigned int num_points)
{
lv_8sc_t* cPtr = cVector;
@ -176,12 +180,7 @@ static inline void volk_gnsssdr_8ic_conjugate_8ic_generic(lv_8sc_t* cVector, con

#ifdef LV_HAVE_AVX
#include <immintrin.h>
/*!
\brief Takes the conjugate of a complex 8-bit integer vector.
\param cVector The vector where the results will be stored
\param aVector Vector to be conjugated
\param num_points The number of complex values in aVector to be conjugated and stored into cVector
*/

static inline void volk_gnsssdr_8ic_conjugate_8ic_a_avx(lv_8sc_t* cVector, const lv_8sc_t* aVector, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 16;
@ -217,14 +216,10 @@ static inline void volk_gnsssdr_8ic_conjugate_8ic_a_avx(lv_8sc_t* cVector, const
}
#endif /* LV_HAVE_AVX */


#ifdef LV_HAVE_SSSE3
#include <tmmintrin.h>
/*!
\brief Takes the conjugate of a complex 8-bit integer vector.
\param cVector The vector where the results will be stored
\param aVector Vector to be conjugated
\param num_points The number of complex values in aVector to be conjugated and stored into cVector
*/

static inline void volk_gnsssdr_8ic_conjugate_8ic_a_ssse3(lv_8sc_t* cVector, const lv_8sc_t* aVector, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 8;
@ -252,14 +247,10 @@ static inline void volk_gnsssdr_8ic_conjugate_8ic_a_ssse3(lv_8sc_t* cVector, con
}
#endif /* LV_HAVE_SSSE3 */


#ifdef LV_HAVE_SSE3
#include <pmmintrin.h>
/*!
\brief Takes the conjugate of a complex 8-bit integer vector.
\param cVector The vector where the results will be stored
\param aVector Vector to be conjugated
\param num_points The number of complex values in aVector to be conjugated and stored into cVector
*/

static inline void volk_gnsssdr_8ic_conjugate_8ic_a_sse3(lv_8sc_t* cVector, const lv_8sc_t* aVector, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 8;
@ -291,12 +282,7 @@ static inline void volk_gnsssdr_8ic_conjugate_8ic_a_sse3(lv_8sc_t* cVector, cons


#ifdef LV_HAVE_ORC
/*!
\brief Takes the conjugate of a complex 8-bit integer vector.
\param cVector The vector where the results will be stored
\param aVector Vector to be conjugated
\param num_points The number of complex values in aVector to be conjugated and stored into cVector
*/

extern void volk_gnsssdr_8ic_conjugate_8ic_a_orc_impl(lv_8sc_t* cVector, const lv_8sc_t* aVector, unsigned int num_points);
static inline void volk_gnsssdr_8ic_conjugate_8ic_u_orc(lv_8sc_t* cVector, const lv_8sc_t* aVector, unsigned int num_points)
{
@ -307,12 +293,7 @@ static inline void volk_gnsssdr_8ic_conjugate_8ic_u_orc(lv_8sc_t* cVector, const

#ifdef LV_HAVE_NEON
#include <arm_neon.h>
/*!
\brief Takes the conjugate of a complex 8-bit integer vector.
\param cVector The vector where the results will be stored
\param aVector Vector to be conjugated
\param num_points The number of complex values in aVector to be conjugated and stored into cVector
*/

static inline void volk_gnsssdr_8ic_conjugate_8ic_neon(lv_8sc_t* cVector, const lv_8sc_t* aVector, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 8;
@ -335,7 +316,6 @@ static inline void volk_gnsssdr_8ic_conjugate_8ic_neon(lv_8sc_t* cVector, const
{
*c++ = lv_conj(*a++);
}

}
#endif /* LV_HAVE_NEON */


@ -1,11 +1,11 @@
/*!
* \file volk_gnsssdr_8ic_magnitude_squared_8i.h
* \brief Volk protokernel: calculates the magnitude squared of a vector of 16-bit complex samples
* \brief VOLK_GNSSSDR kernel: calculates the magnitude squared of a vector of 16-bit complex samples.
* \authors <ul>
* <li> Andres Cecilia, 2014. a.cecilia.luque(at)gmail.com
* </ul>
*
* Volk protokernel that calculates the magnitude squared of a
* VOLK_GNSSSDR kernel that calculates the magnitude squared of a
* vector of 16-bit complex samples (8 bits for the real part and 8 bits for the imaginary part):
* result = (real*real) + (imag*imag)
*
@ -34,18 +34,34 @@
* -------------------------------------------------------------------------
*/

/*!
* \page volk_gnsssdr_8ic_magnitude_squared_8i
*
* \b Overview
*
* Calculates the magnitude squared of the complex data items in \p complexVector and stores the results in \p magnitudeVector.
*
* <b>Dispatcher Prototype</b>
* \code
* void volk_gnsssdr_8ic_magnitude_squared_8i(char* magnitudeVector, const lv_8sc_t* complexVector, unsigned int num_points);
* \endcode
*
* \b Inputs
* \li complexVector: The vector containing the complex input values.
* \li num_points: The number of complex data points.
*
* \b Outputs
* \li magnitudeVector: The vector containing the real output values.
*
*/
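The output vector is `char`, so even moderate inputs overflow it. For example, (real, imag) = (10, 6) gives 136, which does not fit in a signed byte. The sketch below computes in `int` and wraps the result modulo 256. This is an assumption about the intended semantics; a saturating implementation would give different results. `cplx8_t` (assumed to match the interleaved layout of `lv_8sc_t`), `wrap8` and `ref_8ic_magnitude_squared_8i` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for lv_8sc_t (interleaved signed real/imag bytes). */
typedef struct {
    signed char real;
    signed char imag;
} cplx8_t;

/* Hypothetical helper: two's-complement wraparound into -128..127. */
static signed char wrap8(int v)
{
    int m = ((v % 256) + 256) % 256;
    if (m > 127)
        m -= 256;
    return (signed char)m;
}

/* Hypothetical scalar reference for volk_gnsssdr_8ic_magnitude_squared_8i:
 * magnitudeVector[i] = real^2 + imag^2, reduced to 8 bits. */
static void ref_8ic_magnitude_squared_8i(char* magnitudeVector,
                                         const cplx8_t* complexVector,
                                         unsigned int num_points)
{
    assert(num_points == 0 || (magnitudeVector != NULL && complexVector != NULL));
    for (unsigned int i = 0; i < num_points; i++) {
        const int re = complexVector[i].real;
        const int im = complexVector[i].imag;
        magnitudeVector[i] = (char)wrap8(re * re + im * im); /* exact only while <= 127 */
    }
}
```

Under this assumption, the result is exact only when real*real + imag*imag is at most 127. That holds, for example, when both parts lie in [-7, 7].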

#ifndef INCLUDED_volk_gnsssdr_8ic_magnitude_squared_8i_H
#define INCLUDED_volk_gnsssdr_8ic_magnitude_squared_8i_H


#ifdef LV_HAVE_SSSE3
#include <tmmintrin.h>
/*!
\brief Calculates the magnitude squared of complexVector and stores the results in magnitudeVector
\param complexVector The vector containing the complex input values
\param magnitudeVector The vector containing the real output values
\param num_points The number of complex values in complexVector to be calculated and stored into magnitudeVector
*/

static inline void volk_gnsssdr_8ic_magnitude_squared_8i_u_sse3(char* magnitudeVector, const lv_8sc_t* complexVector, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 16;
@ -99,12 +115,7 @@ static inline void volk_gnsssdr_8ic_magnitude_squared_8i_u_sse3(char* magnitudeV

//#ifdef LV_HAVE_SSE
//#include <xmmintrin.h>
///*!
// \brief Calculates the magnitude squared of complexVector and stores the results in magnitudeVector
// \param complexVector The vector containing the complex input values
// \param magnitudeVector The vector containing the real output values
// \param num_points The number of complex values in complexVector to be calculated and stored into magnitudeVector
// */
//
//static inline void volk_gnsssdr_8ic_magnitude_squared_8i_u_sse(float* magnitudeVector, const lv_32fc_t* complexVector, unsigned int num_points){
// unsigned int number = 0;
// const unsigned int quarterPoints = num_points / 4;
@ -144,12 +155,7 @@ static inline void volk_gnsssdr_8ic_magnitude_squared_8i_u_sse3(char* magnitudeV
//#endif /* LV_HAVE_SSE */

#ifdef LV_HAVE_GENERIC
/*!
\brief Calculates the magnitude squared of complexVector and stores the results in magnitudeVector
\param complexVector The vector containing the complex input values
\param magnitudeVector The vector containing the real output values
\param num_points The number of complex values in complexVector to be calculated and stored into magnitudeVector
*/

static inline void volk_gnsssdr_8ic_magnitude_squared_8i_generic(char* magnitudeVector, const lv_8sc_t* complexVector, unsigned int num_points)
{
const char* complexVectorPtr = (char*)complexVector;
@ -167,12 +173,7 @@ static inline void volk_gnsssdr_8ic_magnitude_squared_8i_generic(char* magnitude

#ifdef LV_HAVE_SSSE3
#include <tmmintrin.h>
/*!
\brief Calculates the magnitude squared of complexVector and stores the results in magnitudeVector
\param complexVector The vector containing the complex input values
\param magnitudeVector The vector containing the real output values
\param num_points The number of complex values in complexVector to be calculated and stored into magnitudeVector
*/

static inline void volk_gnsssdr_8ic_magnitude_squared_8i_a_sse3(char* magnitudeVector, const lv_8sc_t* complexVector, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 16;
@ -226,12 +227,7 @@ static inline void volk_gnsssdr_8ic_magnitude_squared_8i_a_sse3(char* magnitudeV

//#ifdef LV_HAVE_SSE
//#include <xmmintrin.h>
///*!
// \brief Calculates the magnitude squared of complexVector and stores the results in magnitudeVector
// \param complexVector The vector containing the complex input values
// \param magnitudeVector The vector containing the real output values
// \param num_points The number of complex values in complexVector to be calculated and stored into magnitudeVector
// */
//
//static inline void volk_gnsssdr_8ic_magnitude_squared_8i_a_sse(float* magnitudeVector, const lv_32fc_t* complexVector, unsigned int num_points){
// unsigned int number = 0;
// const unsigned int quarterPoints = num_points / 4;
@ -272,12 +268,7 @@ static inline void volk_gnsssdr_8ic_magnitude_squared_8i_a_sse3(char* magnitudeV


#ifdef LV_HAVE_ORC
/*!
\brief Calculates the magnitude squared of complexVector and stores the results in magnitudeVector
\param complexVector The vector containing the complex input values
\param magnitudeVector The vector containing the real output values
\param num_points The number of complex values in complexVector to be calculated and stored into magnitudeVector
*/

extern void volk_gnsssdr_8ic_magnitude_squared_8i_a_orc_impl(char* magnitudeVector, const lv_8sc_t* complexVector, unsigned int num_points);
static inline void volk_gnsssdr_8ic_magnitude_squared_8i_u_orc(char* magnitudeVector, const lv_8sc_t* complexVector, unsigned int num_points)
{

@ -1,11 +1,11 @@
/*!
* \file volk_gnsssdr_8ic_s8ic_multiply_8ic.h
* \brief Volk protokernel: multiplies a vector of 16-bit complex samples by one complex constant
* \brief VOLK_GNSSSDR kernel: multiplies a vector of 16-bit complex samples by one complex constant.
* \authors <ul>
* <li> Andres Cecilia, 2014. a.cecilia.luque(at)gmail.com
* </ul>
*
* Volk protokernel that multiplies a vector of 16-bit complex samples
* VOLK_GNSSSDR kernel that multiplies a vector of 16-bit complex samples
* (8 bits for the real part and 8 bits for the imaginary part) by one complex constant
*
* -------------------------------------------------------------------------
@ -33,6 +33,28 @@
* -------------------------------------------------------------------------
*/

/*!
* \page volk_gnsssdr_8ic_s8ic_multiply_8ic
*
* \b Overview
*
* Multiplies the input vector by a complex scalar and stores the results in the output vector.
*
* <b>Dispatcher Prototype</b>
* \code
* void volk_gnsssdr_8ic_s8ic_multiply_8ic(lv_8sc_t* cVector, const lv_8sc_t* aVector, const lv_8sc_t scalar, unsigned int num_points);
* \endcode
*
* \b Inputs
* \li aVector: The vector to be multiplied.
* \li scalar: The complex scalar by which \p aVector is multiplied.
* \li num_points: The number of complex values in \p aVector to be multiplied by \p scalar and stored into \p cVector.
*
* \b Outputs
* \li cVector: The vector where the results will be stored.
*
*/
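The SSE3 variants in this file evaluate the product (a_r + j a_i)(y_r + j y_i) = (a_r y_r - a_i y_i) + j(a_r y_i + a_i y_r) in three steps:

- they split each sample into zero-extended real and imaginary bytes with the `mult1` mask;
- they multiply in 16-bit lanes with `_mm_mullo_epi16`;
- they keep only the low byte of each result, which amounts to arithmetic modulo 256.

The sketch below reproduces that behaviour in scalar code. `cplx8_t`, `wrap8` and `ref_8ic_s8ic_multiply_8ic` are hypothetical names. `cplx8_t` assumes that `lv_8sc_t` is an interleaved pair of signed bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for lv_8sc_t (interleaved signed real/imag bytes). */
typedef struct {
    signed char real;
    signed char imag;
} cplx8_t;

/* Hypothetical helper: two's-complement wraparound into -128..127. */
static signed char wrap8(int v)
{
    int m = ((v % 256) + 256) % 256;
    if (m > 127)
        m -= 256;
    return (signed char)m;
}

/* Hypothetical scalar reference for volk_gnsssdr_8ic_s8ic_multiply_8ic:
 * cVector[i] = aVector[i] * scalar, each part reduced modulo 256 as the
 * SSE3 code does by keeping only the low byte of each 16-bit lane. */
static void ref_8ic_s8ic_multiply_8ic(cplx8_t* cVector, const cplx8_t* aVector,
                                      const cplx8_t scalar, unsigned int num_points)
{
    assert(num_points == 0 || (cVector != NULL && aVector != NULL));
    const int yr = scalar.real;
    const int yi = scalar.imag;
    for (unsigned int i = 0; i < num_points; i++) {
        const int ar = aVector[i].real;
        const int ai = aVector[i].imag;
        cVector[i].real = wrap8(ar * yr - ai * yi);
        cVector[i].imag = wrap8(ar * yi + ai * yr);
    }
}
```

For example, (100 + 0j)(3 - 1j) yields (44, -100) here, because 300 wraps to 44.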

#ifndef INCLUDED_volk_gnsssdr_8ic_s8ic_multiply_8ic_H
#define INCLUDED_volk_gnsssdr_8ic_s8ic_multiply_8ic_H

@ -40,13 +62,7 @@

#ifdef LV_HAVE_SSE3
#include <pmmintrin.h>
/*!
\brief Multiplies the input vector by a complex scalar and stores the results in the output vector
\param cVector The vector where the results will be stored
\param aVector The vector to be multiplied
\param scalar The complex scalar to multiply aVector
\param num_points The number of complex values in aVector to be multiplied by scalar and stored into cVector
*/

static inline void volk_gnsssdr_8ic_s8ic_multiply_8ic_u_sse3(lv_8sc_t* cVector, const lv_8sc_t* aVector, const lv_8sc_t scalar, unsigned int num_points)
{
unsigned int number = 0;
@ -59,31 +75,31 @@ static inline void volk_gnsssdr_8ic_s8ic_multiply_8ic_u_sse3(lv_8sc_t* cVector,

mult1 = _mm_set_epi8(0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255);

y = _mm_set1_epi16 (*(short*)&scalar);
imagy = _mm_srli_si128 (y, 1);
imagy = _mm_and_si128 (imagy, mult1);
realy = _mm_and_si128 (y, mult1);
y = _mm_set1_epi16(*(short*)&scalar);
imagy = _mm_srli_si128(y, 1);
imagy = _mm_and_si128(imagy, mult1);
realy = _mm_and_si128(y, mult1);

for(; number < sse_iters; number++)
{
x = _mm_lddqu_si128((__m128i*)a);

imagx = _mm_srli_si128 (x, 1);
imagx = _mm_and_si128 (imagx, mult1);
realx = _mm_and_si128 (x, mult1);
imagx = _mm_srli_si128(x, 1);
imagx = _mm_and_si128(imagx, mult1);
realx = _mm_and_si128(x, mult1);

realx_mult_realy = _mm_mullo_epi16 (realx, realy);
imagx_mult_imagy = _mm_mullo_epi16 (imagx, imagy);
realx_mult_imagy = _mm_mullo_epi16 (realx, imagy);
imagx_mult_realy = _mm_mullo_epi16 (imagx, realy);
realx_mult_realy = _mm_mullo_epi16(realx, realy);
imagx_mult_imagy = _mm_mullo_epi16(imagx, imagy);
realx_mult_imagy = _mm_mullo_epi16(realx, imagy);
imagx_mult_realy = _mm_mullo_epi16(imagx, realy);

realc = _mm_sub_epi16 (realx_mult_realy, imagx_mult_imagy);
realc = _mm_and_si128 (realc, mult1);
imagc = _mm_add_epi16 (realx_mult_imagy, imagx_mult_realy);
imagc = _mm_and_si128 (imagc, mult1);
imagc = _mm_slli_si128 (imagc, 1);
realc = _mm_sub_epi16(realx_mult_realy, imagx_mult_imagy);
realc = _mm_and_si128(realc, mult1);
imagc = _mm_add_epi16(realx_mult_imagy, imagx_mult_realy);
imagc = _mm_and_si128(imagc, mult1);
imagc = _mm_slli_si128(imagc, 1);

totalc = _mm_or_si128 (realc, imagc);
totalc = _mm_or_si128(realc, imagc);

_mm_storeu_si128((__m128i*)c, totalc);

@ -99,14 +115,9 @@ static inline void volk_gnsssdr_8ic_s8ic_multiply_8ic_u_sse3(lv_8sc_t* cVector,
}
#endif /* LV_HAVE_SSE3 */


#ifdef LV_HAVE_GENERIC
/*!
\brief Multiplies the input vector by a complex scalar and stores the results in the output vector
\param cVector The vector where the results will be stored
\param aVector The vector to be multiplied
\param scalar The complex scalar to multiply aVector
\param num_points The number of complex values in aVector to be multiplied by scalar and stored into cVector
*/

static inline void volk_gnsssdr_8ic_s8ic_multiply_8ic_generic(lv_8sc_t* cVector, const lv_8sc_t* aVector, const lv_8sc_t scalar, unsigned int num_points)
{
/*lv_8sc_t* cPtr = cVector;
@ -144,13 +155,7 @@ static inline void volk_gnsssdr_8ic_s8ic_multiply_8ic_generic(lv_8sc_t* cVector,

#ifdef LV_HAVE_SSE3
#include <pmmintrin.h>
/*!
\brief Multiplies the input vector by a complex scalar and stores the results in the output vector
\param cVector The vector where the results will be stored
\param aVector The vector to be multiplied
\param scalar The complex scalar to multiply aVector
\param num_points The number of complex values in aVector to be multiplied by scalar and stored into cVector
*/

static inline void volk_gnsssdr_8ic_s8ic_multiply_8ic_a_sse3(lv_8sc_t* cVector, const lv_8sc_t* aVector, const lv_8sc_t scalar, unsigned int num_points)
{
unsigned int number = 0;
||||
@ -163,31 +168,31 @@ static inline void volk_gnsssdr_8ic_s8ic_multiply_8ic_a_sse3(lv_8sc_t* cVector,

mult1 = _mm_set_epi8(0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255);

y = _mm_set1_epi16 (*(short*)&scalar);
imagy = _mm_srli_si128 (y, 1);
imagy = _mm_and_si128 (imagy, mult1);
realy = _mm_and_si128 (y, mult1);
y = _mm_set1_epi16(*(short*)&scalar);
imagy = _mm_srli_si128(y, 1);
imagy = _mm_and_si128(imagy, mult1);
realy = _mm_and_si128(y, mult1);

for(; number < sse_iters; number++)
{
x = _mm_load_si128((__m128i*)a);

imagx = _mm_srli_si128 (x, 1);
imagx = _mm_and_si128 (imagx, mult1);
realx = _mm_and_si128 (x, mult1);
imagx = _mm_srli_si128(x, 1);
imagx = _mm_and_si128(imagx, mult1);
realx = _mm_and_si128(x, mult1);

realx_mult_realy = _mm_mullo_epi16 (realx, realy);
imagx_mult_imagy = _mm_mullo_epi16 (imagx, imagy);
realx_mult_imagy = _mm_mullo_epi16 (realx, imagy);
imagx_mult_realy = _mm_mullo_epi16 (imagx, realy);
realx_mult_realy = _mm_mullo_epi16(realx, realy);
imagx_mult_imagy = _mm_mullo_epi16(imagx, imagy);
realx_mult_imagy = _mm_mullo_epi16(realx, imagy);
imagx_mult_realy = _mm_mullo_epi16(imagx, realy);

realc = _mm_sub_epi16 (realx_mult_realy, imagx_mult_imagy);
realc = _mm_and_si128 (realc, mult1);
imagc = _mm_add_epi16 (realx_mult_imagy, imagx_mult_realy);
imagc = _mm_and_si128 (imagc, mult1);
imagc = _mm_slli_si128 (imagc, 1);
realc = _mm_sub_epi16(realx_mult_realy, imagx_mult_imagy);
realc = _mm_and_si128(realc, mult1);
imagc = _mm_add_epi16(realx_mult_imagy, imagx_mult_realy);
imagc = _mm_and_si128(imagc, mult1);
imagc = _mm_slli_si128(imagc, 1);

totalc = _mm_or_si128 (realc, imagc);
totalc = _mm_or_si128(realc, imagc);

_mm_store_si128((__m128i*)c, totalc);

@ -205,13 +210,7 @@ static inline void volk_gnsssdr_8ic_s8ic_multiply_8ic_a_sse3(lv_8sc_t* cVector,


#ifdef LV_HAVE_ORC
/*!
\brief Multiplies the input vector by a scalar and stores the results in the third vector
\param cVector The vector where the results will be stored
\param aVector The vector to be multiplied
\param scalar The complex scalar to multiply aVector
\param num_points The number of complex values in aVector to be multiplied by scalar and stored into cVector
*/

extern void volk_gnsssdr_8ic_s8ic_multiply_8ic_a_orc_impl(lv_8sc_t* cVector, const lv_8sc_t* aVector, const char scalarreal, const char scalarimag, unsigned int num_points);
static inline void volk_gnsssdr_8ic_s8ic_multiply_8ic_u_orc(lv_8sc_t* cVector, const lv_8sc_t* aVector, const lv_8sc_t scalar, unsigned int num_points)
{
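The SSE3 loop above packs one complex sample per 16-bit lane (real part in the low byte, imaginary part in the high byte), separates the parts with _mm_srli_si128 and the mult1 mask, multiplies in 16-bit lanes with _mm_mullo_epi16, and keeps only the low byte of each result. The following scalar C model of a single lane is a sketch for illustration only: lane_complex_multiply is a hypothetical helper, not part of VOLK_GNSSSDR, and little-endian byte order is assumed. Because the low 8 bits of a product do not depend on whether the bytes were read as signed or unsigned, the zero-extending mask still yields signed 8-bit complex products, wrapped modulo 256.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane of the SSE3 loop.
   Assumed packing: low byte = real part, high byte = imaginary part. */
static uint16_t lane_complex_multiply(uint16_t x, uint16_t y)
{
    const uint16_t mult1 = 0x00FF;                /* per-lane view of the mult1 mask */

    uint16_t realx = x & mult1;                   /* _mm_and_si128(x, mult1) */
    uint16_t imagx = (uint16_t)(x >> 8) & mult1;  /* _mm_srli_si128(x, 1), then mask */
    uint16_t realy = y & mult1;
    uint16_t imagy = (uint16_t)(y >> 8) & mult1;

    /* _mm_mullo_epi16 keeps the low 16 bits of each product */
    uint16_t rr = (uint16_t)(realx * realy);
    uint16_t ii = (uint16_t)(imagx * imagy);
    uint16_t ri = (uint16_t)(realx * imagy);
    uint16_t ir = (uint16_t)(imagx * realy);

    uint16_t realc = (uint16_t)(rr - ii) & mult1; /* _mm_sub_epi16, then mask */
    uint16_t imagc = (uint16_t)(ri + ir) & mult1; /* _mm_add_epi16, then mask */
    imagc = (uint16_t)(imagc << 8);               /* _mm_slli_si128(imagc, 1) */

    return (uint16_t)(realc | imagc);             /* _mm_or_si128 */
}
```

For example, (1 + 2j)(3 + 4j) = -5 + 10j, so lane_complex_multiply(0x0201, 0x0403) returns 0x0AFB, where 0xFB is -5 as a signed byte.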
@ -1,12 +1,12 @@
/*!
* \file volk_gnsssdr_8ic_x2_dot_prod_8ic.h
* \brief Volk protokernel: multiplies two 16-bit vectors and accumulates them
* \brief VOLK_GNSSSDR kernel: multiplies two 16-bit vectors and accumulates them.
* \authors <ul>
* <li> Andres Cecilia, 2014. a.cecilia.luque(at)gmail.com
* </ul>
*
* Volk protokernel that multiplies two 16-bit vectors (8 bits the real part
* and 8 bits the imaginary part) and accumulates them
* VOLK_GNSSSDR kernel that multiplies two 16-bit vectors (8 bits the real part
* and 8 bits the imaginary part) and accumulates them.
*
* -------------------------------------------------------------------------
*
@ -33,6 +33,29 @@
* -------------------------------------------------------------------------
*/

/*!
* \page volk_gnsssdr_8ic_x2_dot_prod_8ic
*
* \b Overview
*
* Multiplies two input complex vectors (8-bit integers for each component) and accumulates them,
* storing the result.
*
* <b>Dispatcher Prototype</b>
* \code
* void volk_gnsssdr_8ic_x2_dot_prod_8ic(lv_8sc_t* result, const lv_8sc_t* in_a, const lv_8sc_t* in_b, unsigned int num_points);
* \endcode
*
* \b Inputs
* \li in_a: One of the vectors to be multiplied and accumulated
* \li in_b: The other vector to be multiplied and accumulated
* \li num_points: The number of complex values to be multiplied together, accumulated and stored into \p result
*
* \b Outputs
* \li result: Value of the accumulated result
*
*/

#ifndef INCLUDED_volk_gnsssdr_8ic_x2_dot_prod_8ic_H
#define INCLUDED_volk_gnsssdr_8ic_x2_dot_prod_8ic_H

@ -42,26 +65,19 @@

#ifdef LV_HAVE_GENERIC

/*!
\brief Multiplies the two input complex vectors of 8-bit integer each component and accumulates them, storing the result.
\param[out] result Value of the accumulated result
\param[in] input One of the vectors to be multiplied
\param[in] taps One of the vectors to be multiplied
\param[in] num_points The number of complex values in input and taps to be multiplied together, accumulated and stored into result
*/
static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_generic(lv_8sc_t* result, const lv_8sc_t* input, const lv_8sc_t* taps, unsigned int num_points)
static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_generic(lv_8sc_t* result, const lv_8sc_t* in_a, const lv_8sc_t* in_b, unsigned int num_points)
{
/*lv_8sc_t* cPtr = result;
const lv_8sc_t* aPtr = input;
const lv_8sc_t* bPtr = taps;
const lv_8sc_t* aPtr = in_a;
const lv_8sc_t* bPtr = in_b;

for(int number = 0; number < num_points; number++){
*cPtr += (*aPtr++) * (*bPtr++);
}*/

char * res = (char*) result;
char * in = (char*) input;
char * tp = (char*) taps;
char * in = (char*) in_a;
char * tp = (char*) in_b;
unsigned int n_2_ccomplex_blocks = num_points/2;
unsigned int isodd = num_points & 1;

@ -86,29 +102,23 @@ static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_generic(lv_8sc_t* result, co
// Cleanup if we had an odd number of points
for(i = 0; i < isodd; ++i)
{
*result += input[num_points - 1] * taps[num_points - 1];
*result += in_a[num_points - 1] * in_b[num_points - 1];
}
}

#endif /*LV_HAVE_GENERIC*/


#ifdef LV_HAVE_SSE2
#include <emmintrin.h>

/*!
\brief Multiplies the two input complex vectors of 8-bit integer each component and accumulates them, storing the result.
\param[out] result Value of the accumulated result
\param[in] input One of the vectors to be multiplied
\param[in] taps One of the vectors to be multiplied
\param[in] num_points The number of complex values in input and taps to be multiplied together, accumulated and stored into result
*/
static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_u_sse2(lv_8sc_t* result, const lv_8sc_t* input, const lv_8sc_t* taps, unsigned int num_points)
static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_u_sse2(lv_8sc_t* result, const lv_8sc_t* in_a, const lv_8sc_t* in_b, unsigned int num_points)
{
lv_8sc_t dotProduct;
memset(&dotProduct, 0x0, 2*sizeof(char));

const lv_8sc_t* a = input;
const lv_8sc_t* b = taps;
const lv_8sc_t* a = in_a;
const lv_8sc_t* b = in_b;

const unsigned int sse_iters = num_points/8;

@ -125,40 +135,40 @@ static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_u_sse2(lv_8sc_t* result, con
x = _mm_loadu_si128((__m128i*)a);
y = _mm_loadu_si128((__m128i*)b);

imagx = _mm_srli_si128 (x, 1);
imagx = _mm_and_si128 (imagx, mult1);
realx = _mm_and_si128 (x, mult1);
imagx = _mm_srli_si128(x, 1);
imagx = _mm_and_si128(imagx, mult1);
realx = _mm_and_si128(x, mult1);

imagy = _mm_srli_si128 (y, 1);
imagy = _mm_and_si128 (imagy, mult1);
realy = _mm_and_si128 (y, mult1);
imagy = _mm_srli_si128(y, 1);
imagy = _mm_and_si128(imagy, mult1);
realy = _mm_and_si128(y, mult1);

realx_mult_realy = _mm_mullo_epi16 (realx, realy);
imagx_mult_imagy = _mm_mullo_epi16 (imagx, imagy);
realx_mult_imagy = _mm_mullo_epi16 (realx, imagy);
imagx_mult_realy = _mm_mullo_epi16 (imagx, realy);
realx_mult_realy = _mm_mullo_epi16(realx, realy);
imagx_mult_imagy = _mm_mullo_epi16(imagx, imagy);
realx_mult_imagy = _mm_mullo_epi16(realx, imagy);
imagx_mult_realy = _mm_mullo_epi16(imagx, realy);

realc = _mm_sub_epi16 (realx_mult_realy, imagx_mult_imagy);
imagc = _mm_add_epi16 (realx_mult_imagy, imagx_mult_realy);
realc = _mm_sub_epi16(realx_mult_realy, imagx_mult_imagy);
imagc = _mm_add_epi16(realx_mult_imagy, imagx_mult_realy);

realcacc = _mm_add_epi16 (realcacc, realc);
imagcacc = _mm_add_epi16 (imagcacc, imagc);
realcacc = _mm_add_epi16(realcacc, realc);
imagcacc = _mm_add_epi16(imagcacc, imagc);

a += 8;
b += 8;
}

realcacc = _mm_and_si128 (realcacc, mult1);
imagcacc = _mm_and_si128 (imagcacc, mult1);
imagcacc = _mm_slli_si128 (imagcacc, 1);
realcacc = _mm_and_si128(realcacc, mult1);
imagcacc = _mm_and_si128(imagcacc, mult1);
imagcacc = _mm_slli_si128(imagcacc, 1);

totalc = _mm_or_si128 (realcacc, imagcacc);
totalc = _mm_or_si128(realcacc, imagcacc);

__VOLK_ATTR_ALIGNED(16) lv_8sc_t dotProductVector[8];

_mm_storeu_si128((__m128i*)dotProductVector,totalc); // Store the results back into the dot product vector
_mm_storeu_si128((__m128i*)dotProductVector, totalc); // Store the results back into the dot product vector

for (int i = 0; i<8; ++i)
for (int i = 0; i < 8; ++i)
{
dotProduct += dotProductVector[i];
}
@ -174,23 +184,17 @@ static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_u_sse2(lv_8sc_t* result, con

#endif /*LV_HAVE_SSE2*/


#ifdef LV_HAVE_SSE4_1
#include <smmintrin.h>

/*!
\brief Multiplies the two input complex vectors of 8-bit integer each component and accumulates them, storing the result.
\param[out] result Value of the accumulated result
\param[in] input One of the vectors to be multiplied
\param[in] taps One of the vectors to be multiplied
\param[in] num_points The number of complex values in input and taps to be multiplied together, accumulated and stored into result
*/
static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_u_sse4_1(lv_8sc_t* result, const lv_8sc_t* input, const lv_8sc_t* taps, unsigned int num_points)
static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_u_sse4_1(lv_8sc_t* result, const lv_8sc_t* in_a, const lv_8sc_t* in_b, unsigned int num_points)
{
lv_8sc_t dotProduct;
memset(&dotProduct, 0x0, 2*sizeof(char));

const lv_8sc_t* a = input;
const lv_8sc_t* b = taps;
const lv_8sc_t* a = in_a;
const lv_8sc_t* b = in_b;

const unsigned int sse_iters = num_points/8;

@ -207,24 +211,24 @@ static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_u_sse4_1(lv_8sc_t* result, c
x = _mm_lddqu_si128((__m128i*)a);
y = _mm_lddqu_si128((__m128i*)b);

imagx = _mm_srli_si128 (x, 1);
imagx = _mm_and_si128 (imagx, mult1);
realx = _mm_and_si128 (x, mult1);
imagx = _mm_srli_si128(x, 1);
imagx = _mm_and_si128(imagx, mult1);
realx = _mm_and_si128(x, mult1);

imagy = _mm_srli_si128 (y, 1);
imagy = _mm_and_si128 (imagy, mult1);
realy = _mm_and_si128 (y, mult1);
imagy = _mm_srli_si128(y, 1);
imagy = _mm_and_si128(imagy, mult1);
realy = _mm_and_si128(y, mult1);

realx_mult_realy = _mm_mullo_epi16 (realx, realy);
imagx_mult_imagy = _mm_mullo_epi16 (imagx, imagy);
realx_mult_imagy = _mm_mullo_epi16 (realx, imagy);
imagx_mult_realy = _mm_mullo_epi16 (imagx, realy);
realx_mult_realy = _mm_mullo_epi16(realx, realy);
imagx_mult_imagy = _mm_mullo_epi16(imagx, imagy);
realx_mult_imagy = _mm_mullo_epi16(realx, imagy);
imagx_mult_realy = _mm_mullo_epi16(imagx, realy);

realc = _mm_sub_epi16 (realx_mult_realy, imagx_mult_imagy);
imagc = _mm_add_epi16 (realx_mult_imagy, imagx_mult_realy);
realc = _mm_sub_epi16(realx_mult_realy, imagx_mult_imagy);
imagc = _mm_add_epi16(realx_mult_imagy, imagx_mult_realy);

realcacc = _mm_add_epi16 (realcacc, realc);
imagcacc = _mm_add_epi16 (imagcacc, imagc);
realcacc = _mm_add_epi16(realcacc, realc);
imagcacc = _mm_add_epi16(imagcacc, imagc);

a += 8;
b += 8;
@ -236,9 +240,9 @@ static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_u_sse4_1(lv_8sc_t* result, c

__VOLK_ATTR_ALIGNED(16) lv_8sc_t dotProductVector[8];

_mm_storeu_si128((__m128i*)dotProductVector,totalc); // Store the results back into the dot product vector
_mm_storeu_si128((__m128i*)dotProductVector, totalc); // Store the results back into the dot product vector

for (unsigned int i = 0; i<8; ++i)
for (unsigned int i = 0; i < 8; ++i)
{
dotProduct += dotProductVector[i];
}
@ -258,20 +262,13 @@ static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_u_sse4_1(lv_8sc_t* result, c
#ifdef LV_HAVE_SSE2
#include <emmintrin.h>

/*!
\brief Multiplies the two input complex vectors of 8-bit integer each component and accumulates them, storing the result.
\param[out] result Value of the accumulated result
\param[in] input One of the vectors to be multiplied
\param[in] taps One of the vectors to be multiplied
\param[in] num_points The number of complex values in input and taps to be multiplied together, accumulated and stored into result
*/
static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_a_sse2(lv_8sc_t* result, const lv_8sc_t* input, const lv_8sc_t* taps, unsigned int num_points)
static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_a_sse2(lv_8sc_t* result, const lv_8sc_t* in_a, const lv_8sc_t* in_b, unsigned int num_points)
{
lv_8sc_t dotProduct;
memset(&dotProduct, 0x0, 2*sizeof(char));

const lv_8sc_t* a = input;
const lv_8sc_t* b = taps;
const lv_8sc_t* a = in_a;
const lv_8sc_t* b = in_b;

const unsigned int sse_iters = num_points/8;

@ -288,40 +285,40 @@ static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_a_sse2(lv_8sc_t* result, con
x = _mm_load_si128((__m128i*)a);
y = _mm_load_si128((__m128i*)b);

imagx = _mm_srli_si128 (x, 1);
imagx = _mm_and_si128 (imagx, mult1);
realx = _mm_and_si128 (x, mult1);
imagx = _mm_srli_si128(x, 1);
imagx = _mm_and_si128(imagx, mult1);
realx = _mm_and_si128(x, mult1);

imagy = _mm_srli_si128 (y, 1);
imagy = _mm_and_si128 (imagy, mult1);
realy = _mm_and_si128 (y, mult1);
imagy = _mm_srli_si128(y, 1);
imagy = _mm_and_si128(imagy, mult1);
realy = _mm_and_si128(y, mult1);

realx_mult_realy = _mm_mullo_epi16 (realx, realy);
imagx_mult_imagy = _mm_mullo_epi16 (imagx, imagy);
realx_mult_imagy = _mm_mullo_epi16 (realx, imagy);
imagx_mult_realy = _mm_mullo_epi16 (imagx, realy);
realx_mult_realy = _mm_mullo_epi16(realx, realy);
imagx_mult_imagy = _mm_mullo_epi16(imagx, imagy);
realx_mult_imagy = _mm_mullo_epi16(realx, imagy);
imagx_mult_realy = _mm_mullo_epi16(imagx, realy);

realc = _mm_sub_epi16 (realx_mult_realy, imagx_mult_imagy);
imagc = _mm_add_epi16 (realx_mult_imagy, imagx_mult_realy);
realc = _mm_sub_epi16(realx_mult_realy, imagx_mult_imagy);
imagc = _mm_add_epi16(realx_mult_imagy, imagx_mult_realy);

realcacc = _mm_add_epi16 (realcacc, realc);
imagcacc = _mm_add_epi16 (imagcacc, imagc);
realcacc = _mm_add_epi16(realcacc, realc);
imagcacc = _mm_add_epi16(imagcacc, imagc);

a += 8;
b += 8;
}

realcacc = _mm_and_si128 (realcacc, mult1);
imagcacc = _mm_and_si128 (imagcacc, mult1);
imagcacc = _mm_slli_si128 (imagcacc, 1);
realcacc = _mm_and_si128(realcacc, mult1);
imagcacc = _mm_and_si128(imagcacc, mult1);
imagcacc = _mm_slli_si128(imagcacc, 1);

totalc = _mm_or_si128 (realcacc, imagcacc);
totalc = _mm_or_si128(realcacc, imagcacc);

__VOLK_ATTR_ALIGNED(16) lv_8sc_t dotProductVector[8];

_mm_store_si128((__m128i*)dotProductVector,totalc); // Store the results back into the dot product vector
_mm_store_si128((__m128i*)dotProductVector, totalc); // Store the results back into the dot product vector

for (unsigned int i = 0; i<8; ++i)
for (unsigned int i = 0; i < 8; ++i)
{
dotProduct += dotProductVector[i];
}
@ -340,24 +337,17 @@ static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_a_sse2(lv_8sc_t* result, con
#ifdef LV_HAVE_SSE4_1
#include <smmintrin.h>

/*!
\brief Multiplies the two input complex vectors of 8-bit integer each component and accumulates them, storing the result.
\param[out] result Value of the accumulated result
\param[in] input One of the vectors to be multiplied
\param[in] taps One of the vectors to be multiplied
\param[in] num_points The number of complex values in input and taps to be multiplied together, accumulated and stored into result
*/
static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_a_sse4_1(lv_8sc_t* result, const lv_8sc_t* input, const lv_8sc_t* taps, unsigned int num_points)
static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_a_sse4_1(lv_8sc_t* result, const lv_8sc_t* in_a, const lv_8sc_t* in_b, unsigned int num_points)
{
lv_8sc_t dotProduct;
memset(&dotProduct, 0x0, 2*sizeof(char));

const lv_8sc_t* a = input;
const lv_8sc_t* b = taps;
const lv_8sc_t* a = in_a;
const lv_8sc_t* b = in_b;

const unsigned int sse_iters = num_points/8;
const unsigned int sse_iters = num_points / 8;

if (sse_iters>0)
if (sse_iters > 0)
{
__m128i x, y, mult1, realx, imagx, realy, imagy, realx_mult_realy, imagx_mult_imagy, realx_mult_imagy, imagx_mult_realy, realc, imagc, totalc, realcacc, imagcacc;

@ -370,24 +360,24 @@ static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_a_sse4_1(lv_8sc_t* result, c
x = _mm_load_si128((__m128i*)a);
y = _mm_load_si128((__m128i*)b);

imagx = _mm_srli_si128 (x, 1);
imagx = _mm_and_si128 (imagx, mult1);
realx = _mm_and_si128 (x, mult1);
imagx = _mm_srli_si128(x, 1);
imagx = _mm_and_si128(imagx, mult1);
realx = _mm_and_si128(x, mult1);

imagy = _mm_srli_si128 (y, 1);
imagy = _mm_and_si128 (imagy, mult1);
realy = _mm_and_si128 (y, mult1);
imagy = _mm_srli_si128(y, 1);
imagy = _mm_and_si128(imagy, mult1);
realy = _mm_and_si128(y, mult1);

realx_mult_realy = _mm_mullo_epi16 (realx, realy);
imagx_mult_imagy = _mm_mullo_epi16 (imagx, imagy);
realx_mult_imagy = _mm_mullo_epi16 (realx, imagy);
imagx_mult_realy = _mm_mullo_epi16 (imagx, realy);
realx_mult_realy = _mm_mullo_epi16(realx, realy);
imagx_mult_imagy = _mm_mullo_epi16(imagx, imagy);
realx_mult_imagy = _mm_mullo_epi16(realx, imagy);
imagx_mult_realy = _mm_mullo_epi16(imagx, realy);

realc = _mm_sub_epi16 (realx_mult_realy, imagx_mult_imagy);
imagc = _mm_add_epi16 (realx_mult_imagy, imagx_mult_realy);
realc = _mm_sub_epi16(realx_mult_realy, imagx_mult_imagy);
imagc = _mm_add_epi16(realx_mult_imagy, imagx_mult_realy);

realcacc = _mm_add_epi16 (realcacc, realc);
imagcacc = _mm_add_epi16 (imagcacc, imagc);
realcacc = _mm_add_epi16(realcacc, realc);
imagcacc = _mm_add_epi16(imagcacc, imagc);

a += 8;
b += 8;
@ -399,9 +389,9 @@ static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_a_sse4_1(lv_8sc_t* result, c

__VOLK_ATTR_ALIGNED(16) lv_8sc_t dotProductVector[8];

_mm_store_si128((__m128i*)dotProductVector,totalc); // Store the results back into the dot product vector
_mm_store_si128((__m128i*)dotProductVector, totalc); // Store the results back into the dot product vector

for (unsigned int i = 0; i<8; ++i)
for (unsigned int i = 0; i < 8; ++i)
{
dotProduct += dotProductVector[i];
}
@ -417,17 +407,11 @@ static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_a_sse4_1(lv_8sc_t* result, c

#endif /*LV_HAVE_SSE4_1*/


#ifdef LV_HAVE_ORC

/*!
\brief Multiplies the two input complex vectors of 8-bit integer each component and accumulates them, storing the result.
\param[out] result Value of the accumulated result
\param[in] input One of the vectors to be multiplied
\param[in] taps One of the vectors to be multiplied
\param[in] num_points The number of complex values in input and taps to be multiplied together, accumulated and stored into result
*/
extern void volk_gnsssdr_8ic_x2_dot_prod_8ic_a_orc_impl(short* resRealShort, short* resImagShort, const lv_8sc_t* input, const lv_8sc_t* taps, unsigned int num_points);
static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_u_orc(lv_8sc_t* result, const lv_8sc_t* input, const lv_8sc_t* taps, unsigned int num_points)
extern void volk_gnsssdr_8ic_x2_dot_prod_8ic_a_orc_impl(short* resRealShort, short* resImagShort, const lv_8sc_t* in_a, const lv_8sc_t* in_b, unsigned int num_points);
static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_u_orc(lv_8sc_t* result, const lv_8sc_t* in_a, const lv_8sc_t* in_b, unsigned int num_points)
{
short resReal = 0;
char* resRealChar = (char*)&resReal;
@ -437,7 +421,7 @@ static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_u_orc(lv_8sc_t* result, cons
char* resImagChar = (char*)&resImag;
resImagChar++;

volk_gnsssdr_8ic_x2_dot_prod_8ic_a_orc_impl(&resReal, &resImag, input, taps, num_points);
volk_gnsssdr_8ic_x2_dot_prod_8ic_a_orc_impl(&resReal, &resImag, in_a, in_b, num_points);

*result = lv_cmake(*resRealChar, *resImagChar);
}
@ -447,21 +431,14 @@ static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_u_orc(lv_8sc_t* result, cons
#ifdef LV_HAVE_NEON
#include <arm_neon.h>

/*!
\brief Multiplies the two input complex vectors of 8-bit integer each component and accumulates them, storing the result.
\param[out] result Value of the accumulated result
\param[in] input One of the vectors to be multiplied
\param[in] taps One of the vectors to be multiplied
\param[in] num_points The number of complex values in input and taps to be multiplied together, accumulated and stored into result
*/
static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_neon(lv_8sc_t* result, const lv_8sc_t* input, const lv_8sc_t* taps, unsigned int num_points)
static inline void volk_gnsssdr_8ic_x2_dot_prod_8ic_neon(lv_8sc_t* result, const lv_8sc_t* in_a, const lv_8sc_t* in_b, unsigned int num_points)
{
lv_8sc_t dotProduct;
dotProduct = lv_cmake(0,0);
*result = lv_cmake(0,0);

const lv_8sc_t* a = input;
const lv_8sc_t* b = taps;
const lv_8sc_t* a = in_a;
const lv_8sc_t* b = in_b;
// for 2-lane vectors, 1st lane holds the real part,
// 2nd lane holds the imaginary part
int8x8x2_t a_val, b_val, c_val, accumulator, tmp_real, tmp_imag;
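As a reference for the value volk_gnsssdr_8ic_x2_dot_prod_8ic returns, here is a plain C sketch under stated assumptions: cplx8_t, low8 and dot_prod_8ic_reference are hypothetical names (cplx8_t stands in for lv_8sc_t), and each component of the result is assumed to wrap modulo 256. That assumption follows from the SSE paths in this diff, which accumulate in 16-bit lanes, mask the sums to their low 8 bits with mult1, and then add the eight partial results as lv_8sc_t values. Accumulating in uint32_t keeps overflow well defined in C and does not change the result modulo 256.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for lv_8sc_t: signed 8-bit real and imaginary parts. */
typedef struct
{
    int8_t r;
    int8_t i;
} cplx8_t;

/* Low 8 bits as a signed value. uint8_t -> int8_t is implementation-defined
   in C for values above 127; GCC and Clang wrap, which this sketch assumes. */
static int8_t low8(uint32_t v) { return (int8_t)(uint8_t)v; }

/* Reference dot product: *result = sum of in_a[n] * in_b[n], wrapped to 8 bits. */
static void dot_prod_8ic_reference(cplx8_t* result, const cplx8_t* in_a,
                                   const cplx8_t* in_b, size_t num_points)
{
    assert(result != NULL);
    uint32_t acc_r = 0; /* unsigned: overflow wraps modulo 2^32, */
    uint32_t acc_i = 0; /* which preserves the value modulo 256  */
    for (size_t n = 0; n < num_points; n++)
        {
            acc_r += (uint32_t)(in_a[n].r * in_b[n].r - in_a[n].i * in_b[n].i);
            acc_i += (uint32_t)(in_a[n].r * in_b[n].i + in_a[n].i * in_b[n].r);
        }
    result->r = low8(acc_r);
    result->i = low8(acc_i);
}
```

With in_a = {1+2j, 3+4j, 5+6j} and in_b = {1+1j, 2, -1j}, the exact sum 11 + 6j fits in 8 bits; two terms of 100 * 2 add up to 400, which wraps to -112.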
@ -1,11 +1,11 @@
/*!
* \file volk_gnsssdr_8ic_x2_multiply_8ic.h
* \brief Volk protokernel: multiplies two 16-bit vectors
* \brief VOLK_GNSSSDR kernel: multiplies two 16-bit vectors.
* \authors <ul>
* <li> Andres Cecilia, 2014. a.cecilia.luque(at)gmail.com
* </ul>
*
* Volk protokernel that multiplies two 16-bit vectors (8 bits the real part
* VOLK_GNSSSDR kernel that multiplies two 16-bit vectors (8 bits the real part
* and 8 bits the imaginary part)
*
* -------------------------------------------------------------------------
@ -33,6 +33,28 @@
* -------------------------------------------------------------------------
*/

/*!
* \page volk_gnsssdr_8ic_x2_multiply_8ic
*
* \b Overview
*
* Multiplies two input complex vectors, point-by-point, storing the result in the third vector.
*
* <b>Dispatcher Prototype</b>
* \code
* void volk_gnsssdr_8ic_x2_multiply_8ic(lv_8sc_t* cVector, const lv_8sc_t* aVector, const lv_8sc_t* bVector, unsigned int num_points);
* \endcode
*
* \b Inputs
* \li aVector: One of the vectors to be multiplied
* \li bVector: The other vector to be multiplied
* \li num_points: The number of complex data points.
*
* \b Outputs
* \li cVector: The vector where the result will be stored
*
*/

#ifndef INCLUDED_volk_gnsssdr_8ic_x2_multiply_8ic_H
#define INCLUDED_volk_gnsssdr_8ic_x2_multiply_8ic_H

@ -41,13 +63,6 @@
#ifdef LV_HAVE_SSE2
#include <emmintrin.h>

/*!
\brief Multiplies the two input complex vectors of 8-bit integer each component and stores the results in the third vector
\param[out] cVector The vector where the results will be stored
\param[in] aVector One of the vectors to be multiplied
\param[in] bVector One of the vectors to be multiplied
\param[in] num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector
*/
static inline void volk_gnsssdr_8ic_x2_multiply_8ic_u_sse2(lv_8sc_t* cVector, const lv_8sc_t* aVector, const lv_8sc_t* bVector, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 8;
@ -99,16 +114,10 @@ static inline void volk_gnsssdr_8ic_x2_multiply_8ic_u_sse2(lv_8sc_t* cVector, co
}
#endif /* LV_HAVE_SSE2 */


#ifdef LV_HAVE_SSE4_1
#include <smmintrin.h>

/*!
\brief Multiplies the two input complex vectors of 8-bit integer each component and stores the results in the third vector
\param[out] cVector The vector where the results will be stored
\param[in] aVector One of the vectors to be multiplied
\param[in] bVector One of the vectors to be multiplied
\param[in] num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector
*/
static inline void volk_gnsssdr_8ic_x2_multiply_8ic_u_sse4_1(lv_8sc_t* cVector, const lv_8sc_t* aVector, const lv_8sc_t* bVector, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 8;
@ -160,15 +169,9 @@ static inline void volk_gnsssdr_8ic_x2_multiply_8ic_u_sse4_1(lv_8sc_t* cVector,
}
#endif /* LV_HAVE_SSE4_1 */


#ifdef LV_HAVE_GENERIC

/*!
\brief Multiplies the two input complex vectors of 8-bit integer each component and stores the results in the third vector
\param[out] cVector The vector where the results will be stored
\param[in] aVector One of the vectors to be multiplied
\param[in] bVector One of the vectors to be multiplied
\param[in] num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector
*/
static inline void volk_gnsssdr_8ic_x2_multiply_8ic_generic(lv_8sc_t* cVector, const lv_8sc_t* aVector, const lv_8sc_t* bVector, unsigned int num_points)
{
lv_8sc_t* cPtr = cVector;
@ -186,13 +189,6 @@ static inline void volk_gnsssdr_8ic_x2_multiply_8ic_generic(lv_8sc_t* cVector, c
#ifdef LV_HAVE_SSE2
#include <emmintrin.h>

/*!
\brief Multiplies the two input complex vectors of 8-bit integer each component and stores the results in the third vector
\param[out] cVector The vector where the results will be stored
\param[in] aVector One of the vectors to be multiplied
\param[in] bVector One of the vectors to be multiplied
\param[in] num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector
*/
static inline void volk_gnsssdr_8ic_x2_multiply_8ic_a_sse2(lv_8sc_t* cVector, const lv_8sc_t* aVector, const lv_8sc_t* bVector, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 8;
@ -244,16 +240,10 @@ static inline void volk_gnsssdr_8ic_x2_multiply_8ic_a_sse2(lv_8sc_t* cVector, co
}
#endif /* LV_HAVE_SSE2 */


#ifdef LV_HAVE_SSE4_1
#include <smmintrin.h>

/*!
\brief Multiplies the two input complex vectors of 8-bit integer each component and stores the results in the third vector
\param[out] cVector The vector where the results will be stored
\param[in] aVector One of the vectors to be multiplied
\param[in] bVector One of the vectors to be multiplied
\param[in] num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector
*/
static inline void volk_gnsssdr_8ic_x2_multiply_8ic_a_sse4_1(lv_8sc_t* cVector, const lv_8sc_t* aVector, const lv_8sc_t* bVector, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 8;
@ -310,13 +300,6 @@ static inline void volk_gnsssdr_8ic_x2_multiply_8ic_a_sse4_1(lv_8sc_t* cVector,

extern void volk_gnsssdr_8ic_x2_multiply_8ic_a_orc_impl(lv_8sc_t* cVector, const lv_8sc_t* aVector, const lv_8sc_t* bVector, unsigned int num_points);

/*!
\brief Multiplies the two input complex vectors of 8-bit integer each component and stores the results in the third vector
\param[out] cVector The vector where the results will be stored
\param[in] aVector One of the vectors to be multiplied
\param[in] bVector One of the vectors to be multiplied
\param[in] num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector
*/
static inline void volk_gnsssdr_8ic_x2_multiply_8ic_u_orc(lv_8sc_t* cVector, const lv_8sc_t* aVector, const lv_8sc_t* bVector, unsigned int num_points)
{
volk_gnsssdr_8ic_x2_multiply_8ic_a_orc_impl(cVector, aVector, bVector, num_points);
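The SSE bodies of volk_gnsssdr_8ic_x2_multiply_8ic are not part of this excerpt, so the following plain C reference for the point-by-point product is only a sketch. It assumes the same 8-bit wrap-around as the SSE paths of the other kernels in this diff, and cplx8_t, wrap8 and x2_multiply_8ic_reference are hypothetical names (cplx8_t stands in for lv_8sc_t). A loop like this can serve as a simple oracle when comparing the dispatched kernel against expected values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for lv_8sc_t: signed 8-bit real and imaginary parts. */
typedef struct
{
    int8_t r;
    int8_t i;
} cplx8_t;

/* int -> uint8_t is well defined (modulo 256); uint8_t -> int8_t is
   implementation-defined above 127, and GCC and Clang wrap (assumed here). */
static int8_t wrap8(int v) { return (int8_t)(uint8_t)v; }

/* cVector[n] = aVector[n] * bVector[n], each component wrapped to 8 bits. */
static void x2_multiply_8ic_reference(cplx8_t* cVector, const cplx8_t* aVector,
                                      const cplx8_t* bVector, size_t num_points)
{
    assert(num_points == 0 || (cVector != NULL && aVector != NULL && bVector != NULL));
    for (size_t n = 0; n < num_points; n++)
        {
            const cplx8_t a = aVector[n];
            const cplx8_t b = bVector[n];
            cVector[n].r = wrap8(a.r * b.r - a.i * b.i);
            cVector[n].i = wrap8(a.r * b.i + a.i * b.r);
        }
}
```

For instance, (100 + 100j)(3 + 4j) = -100 + 700j, and 700 wraps to -68 as a signed 8-bit value.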
@ -1,11 +1,11 @@
/*!
* \file volk_gnsssdr_8u_x2_multiply_8u.h
* \brief Volk protokernel: multiplies unsigned char values
* \brief VOLK_GNSSSDR kernel: multiplies unsigned char values.
* \authors <ul>
* <li> Andres Cecilia, 2014. a.cecilia.luque(at)gmail.com
* </ul>
*
* Volk protokernel that multiplies unsigned char values (8-bit data)
* VOLK_GNSSSDR kernel that multiplies unsigned char values (8-bit data)
*
* -------------------------------------------------------------------------
*
@ -32,19 +32,36 @@
|
||||
* -------------------------------------------------------------------------
|
||||
*/
|
||||
|
||||
/*!
* \page volk_gnsssdr_8u_x2_multiply_8u
*
* \b Overview
*
* Multiplies two input vectors of unsigned char, point by point, storing the result in a third vector
*
* <b>Dispatcher Prototype</b>
* \code
* void volk_gnsssdr_8u_x2_multiply_8u(unsigned char* cChar, const unsigned char* aChar, const unsigned char* bChar, unsigned int num_points);
* \endcode
*
* \b Inputs
* \li aChar: One of the vectors to be multiplied
* \li bChar: The other vector to be multiplied
* \li num_points: The number of unsigned char values in each input vector.
*
* \b Outputs
* \li cChar: The vector where the result will be stored
*
*/


#ifndef INCLUDED_volk_gnsssdr_8u_x2_multiply_8u_H
#define INCLUDED_volk_gnsssdr_8u_x2_multiply_8u_H


#ifdef LV_HAVE_SSE3
#include <pmmintrin.h>
/*!
\brief Multiplies two input vectors of unsigned char values, point by point, and stores the results in a third vector
\param cChar The vector where the results will be stored
\param aChar One of the vectors to be multiplied
\param bChar The other vector to be multiplied
\param num_points The number of unsigned char values in aChar and bChar to be multiplied together and stored into cChar
*/

static inline void volk_gnsssdr_8u_x2_multiply_8u_u_sse3(unsigned char* cChar, const unsigned char* aChar, const unsigned char* bChar, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 16;
@ -60,21 +77,21 @@ static inline void volk_gnsssdr_8u_x2_multiply_8u_u_sse3(unsigned char* cChar, c
y = _mm_lddqu_si128((__m128i*)b);

mult1 = _mm_set_epi8(0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255);
x1 = _mm_srli_si128 (x, 1);
x1 = _mm_and_si128 (x1, mult1);
x2 = _mm_and_si128 (x, mult1);
x1 = _mm_srli_si128(x, 1);
x1 = _mm_and_si128(x1, mult1);
x2 = _mm_and_si128(x, mult1);

y1 = _mm_srli_si128 (y, 1);
y1 = _mm_and_si128 (y1, mult1);
y2 = _mm_and_si128 (y, mult1);
y1 = _mm_srli_si128(y, 1);
y1 = _mm_and_si128(y1, mult1);
y2 = _mm_and_si128(y, mult1);

x1_mult_y1 = _mm_mullo_epi16 (x1, y1);
x2_mult_y2 = _mm_mullo_epi16 (x2, y2);
x1_mult_y1 = _mm_mullo_epi16(x1, y1);
x2_mult_y2 = _mm_mullo_epi16(x2, y2);

tmp = _mm_and_si128 (x1_mult_y1, mult1);
tmp1 = _mm_slli_si128 (tmp, 1);
tmp2 = _mm_and_si128 (x2_mult_y2, mult1);
totalc = _mm_or_si128 (tmp1, tmp2);
tmp = _mm_and_si128(x1_mult_y1, mult1);
tmp1 = _mm_slli_si128(tmp, 1);
tmp2 = _mm_and_si128(x2_mult_y2, mult1);
totalc = _mm_or_si128(tmp1, tmp2);

_mm_storeu_si128((__m128i*)c, totalc);

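SSE has no 8-bit multiply, so the SSE3 kernel above works on 16-bit lanes. `mult1` holds 0x00FF in every 16-bit lane, so `_mm_and_si128(x, mult1)` keeps the even-indexed (low) byte of each lane, while `_mm_srli_si128(x, 1)` followed by the same mask brings the odd-indexed (high) byte down. `_mm_mullo_epi16` multiplies each lane, the mask keeps the low byte of each product, `_mm_slli_si128(tmp, 1)` moves the odd results back up one byte, and `_mm_or_si128` merges both halves. The portable sketch below replays that sequence with plain integers on one 16-byte block (assuming the little-endian lane layout of x86) so it can be checked against a direct byte-wise product. Both function names are hypothetical, and the byte-at-a-time tail for `num_points % 16` is an assumption, since that part of the kernel lies outside the hunk.

```c
#include <stdint.h>

/* Scalar replay of one iteration of the SSE3 kernel on a 16-byte block.
 * Byte k of the 128-bit register is modelled as x[k]; 16-bit lane i holds
 * bytes 2i (low) and 2i+1 (high), as on little-endian x86. */
static void multiply_block_lane_split_sketch(uint8_t c[16], const uint8_t x[16], const uint8_t y[16])
{
    for (int lane = 0; lane < 8; lane++)
        {
            /* x2/y2: the 0x00FF mask keeps the even-indexed (low) byte. */
            const uint16_t x2 = x[2 * lane];
            const uint16_t y2 = y[2 * lane];
            /* x1/y1: 1-byte right shift, then the same mask, brings the
             * odd-indexed (high) byte into the low half of the lane. */
            const uint16_t x1 = x[2 * lane + 1];
            const uint16_t y1 = y[2 * lane + 1];
            /* _mm_mullo_epi16 keeps the low 16 bits of each lane product. */
            const uint16_t x1_mult_y1 = (uint16_t)(x1 * y1);
            const uint16_t x2_mult_y2 = (uint16_t)(x2 * y2);
            /* Mask to 8 bits; the odd result returns to byte 2i+1 (the
             * 1-byte left shift), the even result stays at byte 2i. */
            c[2 * lane + 1] = (uint8_t)(x1_mult_y1 & 0x00FF);
            c[2 * lane] = (uint8_t)(x2_mult_y2 & 0x00FF);
        }
}

/* Whole-vector driver mirroring sse_iters = num_points / 16, with the
 * leftover num_points % 16 bytes multiplied one at a time (assumed tail). */
static void multiply_8u_lane_split_sketch(uint8_t* c, const uint8_t* a, const uint8_t* b, unsigned int num_points)
{
    const unsigned int sse_iters = num_points / 16;
    for (unsigned int n = 0; n < sse_iters; n++)
        {
            multiply_block_lane_split_sketch(c + 16 * n, a + 16 * n, b + 16 * n);
        }
    for (unsigned int i = sse_iters * 16; i < num_points; i++)
        {
            c[i] = (uint8_t)(a[i] * b[i]);
        }
}
```

Because only the low byte of each 16-bit product survives, every output byte equals the product of the corresponding input bytes modulo 256.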
@ -91,13 +108,7 @@ static inline void volk_gnsssdr_8u_x2_multiply_8u_u_sse3(unsigned char* cChar, c
#endif /* LV_HAVE_SSE3 */

#ifdef LV_HAVE_GENERIC
/*!
\brief Multiplies two input vectors of unsigned char values, point by point, and stores the results in a third vector
\param cChar The vector where the results will be stored
\param aChar One of the vectors to be multiplied
\param bChar The other vector to be multiplied
\param num_points The number of unsigned char values in aChar and bChar to be multiplied together and stored into cChar
*/

static inline void volk_gnsssdr_8u_x2_multiply_8u_generic(unsigned char* cChar, const unsigned char* aChar, const unsigned char* bChar, unsigned int num_points)
{
unsigned char* cPtr = cChar;
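The rest of the generic protokernel lies outside this hunk. As a hedged sketch only (not the file's actual body), a plain-C reference for the operation described on the `\page` above is a pointer walk in which each product, computed in `int`, is stored back into an `unsigned char` and so reduced modulo 256, which matches keeping the low byte of each 16-bit product in the SSE3 kernels. The function name is hypothetical.

```c
/* Hypothetical reference kernel: cChar[i] = aChar[i] * bChar[i] modulo 256. */
static void multiply_8u_generic_sketch(unsigned char* cChar, const unsigned char* aChar,
                                       const unsigned char* bChar, unsigned int num_points)
{
    unsigned char* cPtr = cChar;
    const unsigned char* aPtr = aChar;
    const unsigned char* bPtr = bChar;
    for (unsigned int number = 0; number < num_points; number++)
        {
            /* Operands promote to int; converting back to unsigned char
             * keeps the low 8 bits, which is well defined for unsigned types. */
            *cPtr++ = (unsigned char)((*aPtr++) * (*bPtr++));
        }
}
```

For instance, 255 * 2 = 510 is stored as 254, and 16 * 16 = 256 is stored as 0.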
@ -115,13 +126,6 @@ static inline void volk_gnsssdr_8u_x2_multiply_8u_generic(unsigned char* cChar,
#ifdef LV_HAVE_SSE3
#include <pmmintrin.h>

/*!
\brief Multiplies two input vectors of unsigned char values, point by point, and stores the results in a third vector
\param cChar The vector where the results will be stored
\param aChar One of the vectors to be multiplied
\param bChar The other vector to be multiplied
\param num_points The number of unsigned char values in aChar and bChar to be multiplied together and stored into cChar
*/
static inline void volk_gnsssdr_8u_x2_multiply_8u_a_sse3(unsigned char* cChar, const unsigned char* aChar, const unsigned char* bChar, unsigned int num_points)
{
const unsigned int sse_iters = num_points / 16;
@ -137,21 +141,21 @@ static inline void volk_gnsssdr_8u_x2_multiply_8u_a_sse3(unsigned char* cChar, c
y = _mm_load_si128((__m128i*)b);

mult1 = _mm_set_epi8(0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255);
x1 = _mm_srli_si128 (x, 1);
x1 = _mm_and_si128 (x1, mult1);
x2 = _mm_and_si128 (x, mult1);
x1 = _mm_srli_si128(x, 1);
x1 = _mm_and_si128(x1, mult1);
x2 = _mm_and_si128(x, mult1);

y1 = _mm_srli_si128 (y, 1);
y1 = _mm_and_si128 (y1, mult1);
y2 = _mm_and_si128 (y, mult1);
y1 = _mm_srli_si128(y, 1);
y1 = _mm_and_si128(y1, mult1);
y2 = _mm_and_si128(y, mult1);

x1_mult_y1 = _mm_mullo_epi16 (x1, y1);
x2_mult_y2 = _mm_mullo_epi16 (x2, y2);
x1_mult_y1 = _mm_mullo_epi16(x1, y1);
x2_mult_y2 = _mm_mullo_epi16(x2, y2);

tmp = _mm_and_si128 (x1_mult_y1, mult1);
tmp1 = _mm_slli_si128 (tmp, 1);
tmp2 = _mm_and_si128 (x2_mult_y2, mult1);
totalc = _mm_or_si128 (tmp1, tmp2);
tmp = _mm_and_si128(x1_mult_y1, mult1);
tmp1 = _mm_slli_si128(tmp, 1);
tmp2 = _mm_and_si128(x2_mult_y2, mult1);
totalc = _mm_or_si128(tmp1, tmp2);

_mm_store_si128((__m128i*)c, totalc);

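The `_a_sse3` variant differs from `_u_sse3` only in its memory accesses: `_mm_load_si128` and `_mm_store_si128` require 16-byte-aligned addresses (a misaligned address faults), whereas `_mm_lddqu_si128` and `_mm_storeu_si128` accept any address. As a sketch of how a caller could meet that requirement with standard C11 only, the helpers below check and allocate 16-byte-aligned buffers. Their names are hypothetical, and they do not reproduce VOLK_GNSSSDR's own allocation or dispatch logic.

```c
#include <stdint.h>
#include <stdlib.h>

/* True when p lies on a 16-byte boundary, the alignment that
 * _mm_load_si128 and _mm_store_si128 require. */
static int is_aligned_16(const void* p)
{
    return ((uintptr_t)p % 16u) == 0;
}

/* Hypothetical helper: allocate num_points bytes on a 16-byte boundary.
 * C11 aligned_alloc expects a size that is a multiple of the alignment,
 * so the request is rounded up. Release the buffer with free(). */
static unsigned char* alloc_aligned_16(size_t num_points)
{
    size_t size = (num_points + 15u) / 16u * 16u;
    if (size == 0)
        {
            size = 16u;
        }
    return (unsigned char*)aligned_alloc(16u, size);
}
```

Checking all three pointers with something like `is_aligned_16` is the kind of test that decides whether the `_a_` variant is safe to call or the `_u_` variant is needed.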
@ -169,13 +173,7 @@ static inline void volk_gnsssdr_8u_x2_multiply_8u_a_sse3(unsigned char* cChar, c


#ifdef LV_HAVE_ORC
/*!
\brief Multiplies two input vectors of unsigned char values, point by point, and stores the results in a third vector
\param cVector The vector where the results will be stored
\param aVector One of the vectors to be multiplied
\param bVector The other vector to be multiplied
\param num_points The number of unsigned char values in aVector and bVector to be multiplied together and stored into cVector
*/

extern void volk_gnsssdr_8u_x2_multiply_8u_a_orc_impl(unsigned char* cVector, const unsigned char* aVector, const unsigned char* bVector, unsigned int num_points);
static inline void volk_gnsssdr_8u_x2_multiply_8u_u_orc(unsigned char* cVector, const unsigned char* aVector, const unsigned char* bVector, unsigned int num_points)
{