Mirror of https://github.com/gnss-sdr/gnss-sdr
Adding documentation
Copied from VOLK, with some minor changes
Parent: 81f4eadb5b
Commit: f6e713929a
@ -0,0 +1,92 @@
/*! \page extending_volk Extending VOLK

There are two primary routes for extending VOLK for your own use. The
preferred route is to write kernels and protokernels as part of this
repository and send patches upstream. The alternative is to create your
own VOLK module, as is the case with VOLK_GNSSSDR. There is a good reason
for that choice: it provides GNSS-SDR users with suitable protokernels as soon
as possible, without requiring them to upgrade to the latest VOLK version.
Nevertheless, some VOLK_GNSSSDR kernels may be integrated into VOLK in the
future if other users find them useful.

## Modifying this repository

### Adding kernels

Adding a kernel means introducing a new function to the VOLK API, presumably
a useful math function or operation. The first step is to create the kernel's
file in volk/kernels/volk, following the naming scheme described on the
VOLK terms and techniques page. Then create the generic protokernel.

The generic protokernel should be written in plain C using explicitly sized
types from stdint.h or volk_complex.h when appropriate. volk_complex.h
provides explicitly sized complex types for floats and ints. The name of
the generic kernel should be volk_signature_from_file_generic. If several
versions of the generic kernel exist, a description can be appended after
generic_; alignment flags are not required in the generic protokernel name.
The entire generic function must be surrounded by preprocessor ifdef fences
on the symbol LV_HAVE_GENERIC.
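
As an illustration of the fencing and naming rules above, here is a minimal
sketch of a generic protokernel. The kernel name volk_example_16i_x2_add_16i
is hypothetical (it is not part of VOLK or VOLK_GNSSSDR), and LV_HAVE_GENERIC
is defined by hand only so that the snippet compiles on its own; in a real
tree the build system is expected to provide it.

```c
#include <stdint.h>

// Assumption: normally provided by the build system, defined here only
// so the sketch is self-contained.
#define LV_HAVE_GENERIC

#ifdef LV_HAVE_GENERIC
// Hypothetical kernel: element-wise sum of two int16_t arrays.
static inline void volk_example_16i_x2_add_16i_generic(int16_t* out, const int16_t* in_a, const int16_t* in_b, unsigned int num_points)
{
    unsigned int n;
    for (n = 0; n < num_points; ++n)
        {
            // The sum is computed as int and narrowed back to 16 bits.
            out[n] = (int16_t)(in_a[n] + in_b[n]);
        }
}
#endif  // LV_HAVE_GENERIC
```

The function is plain C with explicitly sized types, and everything between
the fences disappears when LV_HAVE_GENERIC is not defined.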

Finally, add the kernel to the list of test cases in volk/lib/kernel_tests.h.
Many kernels should be able to use the default test parameters, but if yours
requires a lower tolerance, a specific vector length, or other test
parameters, create a new instance of volk_test_params_t for your kernel.

### Adding protokernels

The primary purpose of VOLK is to provide multiple implementations of an
operation, each tuned for a specific CPU architecture. Ideally there is at
least one protokernel of each kernel for every architecture that VOLK supports.
The pattern for protokernel naming is volk_kernel_signature_architecture_nick.
The architecture should be one of the supported VOLK architectures. The nick is
an optional name that distinguishes between multiple implementations for a
particular architecture.

Architecture-specific protokernels can be written in one of three ways.
The first approach should always be to use compiler intrinsic functions.
The second and third approaches use either in-line assembly or assembly in
.S files. Both methods of writing assembly exist in VOLK and should yield
equivalent performance; which one you choose is a matter of preference.
Regardless of the method, the public function should be declared in the
kernel header, surrounded by ifdef fences on the symbol that fits the
architecture implementation.

#### Compiler Intrinsics

Compiler intrinsics should be treated as functions that map to a specific
assembly instruction. Most VOLK kernels take the form of a loop that iterates
through a vector. Write a loop that processes, per iteration, a number of items
that is natural for the architecture, and then use compiler intrinsics to do
the math for your operation or algorithm. Include the appropriate header inside
the ifdef fences, but before your protokernel declaration.
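
The loop structure described above can be sketched as follows for SSE, where
four floats per iteration is the natural width. The protokernel name
volk_example_32f_x2_add_32f_u_sse is hypothetical, and LV_HAVE_SSE is derived
from compiler macros here only to keep the snippet self-contained; treat it
as a sketch under those assumptions rather than as code taken from the
library.

```c
// Assumption: normally set by the build system after probing the compiler.
#if defined(__SSE__) || defined(_M_X64)
#define LV_HAVE_SSE
#endif

#ifdef LV_HAVE_SSE
#include <xmmintrin.h>  // header goes inside the fence, before the protokernel

// Hypothetical protokernel: out[i] = in_a[i] + in_b[i], unaligned buffers.
static inline void volk_example_32f_x2_add_32f_u_sse(float* out, const float* in_a, const float* in_b, unsigned int num_points)
{
    const unsigned int quarter_points = num_points / 4;
    unsigned int number;

    // Main loop: four floats per iteration.
    for (number = 0; number < quarter_points; ++number)
        {
            const __m128 a_val = _mm_loadu_ps(in_a);
            const __m128 b_val = _mm_loadu_ps(in_b);
            _mm_storeu_ps(out, _mm_add_ps(a_val, b_val));
            in_a += 4;
            in_b += 4;
            out += 4;
        }

    // Tail loop: the remaining zero to three items in plain C.
    for (number = quarter_points * 4; number < num_points; ++number)
        {
            *out++ = (*in_a++) + (*in_b++);
        }
}
#endif  // LV_HAVE_SSE
```

The tail loop matters: without it, inputs whose length is not a multiple of
four would leave the last items unprocessed.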

#### In-line Assembly

In-line assembly uses a compiler extension to embed verbatim assembly in C
code. Writing in-line assembly protokernels is very similar to writing
protokernels based on intrinsics.
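
To show how close the two styles are, the sketch below replaces a single
intrinsic with a GCC/Clang extended asm statement. It assumes an x86 target
with SSE and a compiler that accepts GNU-style extended asm in AT&T syntax;
the protokernel name is hypothetical, and a real in-line assembly protokernel
would typically move more of the loop body into the asm block.

```c
// Assumption: GNU-style extended asm and SSE are both available.
#if defined(__GNUC__) && defined(__SSE__)
#define LV_HAVE_SSE
#endif

#ifdef LV_HAVE_SSE
#include <xmmintrin.h>

// Hypothetical protokernel: out[i] = in_a[i] + in_b[i], unaligned buffers.
static inline void volk_example_32f_x2_add_32f_u_sse_asm(float* out, const float* in_a, const float* in_b, unsigned int num_points)
{
    const unsigned int quarter_points = num_points / 4;
    unsigned int number;

    for (number = 0; number < quarter_points; ++number)
        {
            __m128 acc = _mm_loadu_ps(in_a);
            const __m128 b_val = _mm_loadu_ps(in_b);
            // AT&T operand order: addps source, destination (acc += b_val).
            // "x" asks the compiler to place the operand in an SSE register.
            __asm__("addps %1, %0" : "+x"(acc) : "x"(b_val));
            _mm_storeu_ps(out, acc);
            in_a += 4;
            in_b += 4;
            out += 4;
        }

    // Tail loop in plain C for the remaining items.
    for (number = quarter_points * 4; number < num_points; ++number)
        {
            *out++ = (*in_a++) + (*in_b++);
        }
}
#endif  // LV_HAVE_SSE
```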

#### Assembly with .S files

To write pure assembly protokernels, first declare the function name in the
kernel header file the same way as any other protokernel, but include the extern
keyword. Second, create a file (one for each protokernel) in
volk/kernels/volk/asm/$arch. Disassemble another protokernel and copy the
disassembled code into this file to bootstrap a working implementation. Often
the disassembled code can be hand-tuned to improve performance.

## VOLK Modules

VOLK has a concept of modules. Each module is an independent VOLK tree. Modules
can be managed with the volk_modtool application. At a high level the module is
a clone of all of the VOLK machinery without kernels. volk_modtool also makes it
easy to copy kernels to a module.

Kernels and protokernels are added to your own VOLK module the same way they are
added to this repository, which was described in the previous section.

VOLK_GNSSSDR is a VOLK module.

*/
@ -0,0 +1,24 @@
/*! \page kernels Kernels

\li \subpage volk_gnsssdr_32fc_convert_16ic
\li \subpage volk_gnsssdr_32fc_convert_8ic
\li \subpage volk_gnsssdr_16ic_convert_32fc
\li \subpage volk_gnsssdr_16ic_resampler_16ic
\li \subpage volk_gnsssdr_16ic_xn_resampler_16ic_xn
\li \subpage volk_gnsssdr_16ic_s32fc_x2_rotator_16ic
\li \subpage volk_gnsssdr_16ic_x2_multiply_16ic
\li \subpage volk_gnsssdr_16ic_x2_dot_prod_16ic
\li \subpage volk_gnsssdr_16ic_x2_dot_prod_16ic_xn
\li \subpage volk_gnsssdr_16ic_x2_rotator_dot_prod_16ic_xn
\li \subpage volk_gnsssdr_8i_accumulator_s8i
\li \subpage volk_gnsssdr_8i_index_max_16u
\li \subpage volk_gnsssdr_8i_max_s8i
\li \subpage volk_gnsssdr_8i_x2_add_8i
\li \subpage volk_gnsssdr_8ic_conjugate_8ic
\li \subpage volk_gnsssdr_8ic_magnitude_squared_8i
\li \subpage volk_gnsssdr_8ic_x2_dot_prod_8ic
\li \subpage volk_gnsssdr_8ic_x2_multiply_8ic
\li \subpage volk_gnsssdr_8ic_s8ic_multiply_8ic
\li \subpage volk_gnsssdr_64f_accumulator_64f

*/
@ -0,0 +1,19 @@
/*! \mainpage VOLK_GNSSSDR

Welcome to VOLK_GNSSSDR!

VOLK_GNSSSDR is the Vector-Optimized Library of Kernels for GNSS-SDR.
It is a library that contains kernels of hand-written SIMD code for different
mathematical operations. Since SIMD architectures can differ greatly from one
another and no compiler has yet come along that handles vectorization properly
or highly efficiently, VOLK_GNSSSDR approaches the problem differently.

For each architecture or platform that a developer wishes to vectorize for, a
new proto-kernel is added to VOLK_GNSSSDR. At runtime, VOLK_GNSSSDR selects
the correct proto-kernel. In this way, users of VOLK_GNSSSDR call a kernel that
performs the operation in a platform- and architecture-agnostic way. This
allows us to write portable SIMD code.
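
The runtime selection described above can be pictured with a function
pointer that is resolved on first use. This is only a conceptual sketch:
every name in it is hypothetical, the capability check is a stub, and the
actual VOLK_GNSSSDR dispatch machinery is generated code that is more
elaborate than this.

```c
#include <stddef.h>

typedef void (*example_add_fn)(float* out, const float* a, const float* b, unsigned int num_points);

// Plays the role of the generic proto-kernel: portable C.
static void example_add_generic(float* out, const float* a, const float* b, unsigned int num_points)
{
    unsigned int n;
    for (n = 0; n < num_points; ++n) out[n] = a[n] + b[n];
}

// Stand-in for an architecture-specific proto-kernel. A real one would use
// SIMD instructions; this one only has to produce the same result.
static void example_add_simd(float* out, const float* a, const float* b, unsigned int num_points)
{
    unsigned int n;
    for (n = 0; n < num_points; ++n) out[n] = a[n] + b[n];
}

// Hypothetical capability check; a real library would query the CPU.
static int example_cpu_has_simd(void)
{
    return 0;
}

// The kernel the user calls. It picks an implementation once and reuses it.
// Note: this lazy initialization is not thread-safe; it is kept simple on purpose.
static void example_add(float* out, const float* a, const float* b, unsigned int num_points)
{
    static example_add_fn impl = NULL;
    if (impl == NULL)
        {
            impl = example_cpu_has_simd() ? example_add_simd : example_add_generic;
        }
    impl(out, a, b, num_points);
}
```

The caller only ever sees example_add, so the same source code runs on
machines with and without the specialized implementation.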

VOLK_GNSSSDR is a module generated from the original VOLK library, http://libvolk.org

*/
@ -0,0 +1,121 @@
/*! \page concepts_terms_and_techniques Concepts, Terms, and Techniques

This page is primarily a list of definitions and a brief overview of
techniques that have been used successfully to develop VOLK_GNSSSDR
protokernels.

## Definitions and Concepts

### SIMD

SIMD stands for Single Instruction Multiple Data. Leveraging SIMD instructions
is the primary optimization in VOLK_GNSSSDR.

### Architecture

What VOLK_GNSSSDR calls an architecture is normally called an Instruction Set
Architecture (ISA). The architectures we target in VOLK_GNSSSDR usually have
SIMD instructions.

### Vector

A vector in VOLK_GNSSSDR is the same as a C array. It sometimes, but not always,
coincides with the mathematical definition of a vector.

### Kernel

The 'kernel' part of the VOLK_GNSSSDR name comes from the high performance
computing use of the word. In this context it is the inner loop of a vector
operation. Since we don't use the word vector in the math sense, a vector
operation is an operation that is performed on a C array.

### Protokernel

A protokernel is an implementation of a kernel. Every kernel has a 'generic'
protokernel that is implemented in C. Other protokernels are optimized for a
particular architecture.


## Techniques

### New Kernels

Add new kernels to the list in lib/kernel_tests.h. This adds the kernel to
VOLK_GNSSSDR's QA tool as well as the volk profiler. Many kernels are able to
share test parameters, but new kernels might need new ones.

If the VOLK_GNSSSDR kernel does not fit the standard set of function parameters
expected by the volk_profile structure, you need to create a VOLK_GNSSSDR puppet
function to help the profiler call the kernel. This is because the function
run_volk_gnsssdr_tests can only test a limited number of function prototypes.

### Protokernels

Adding new proto-kernels (implementations of VOLK_GNSSSDR kernels for specific
architectures) is relatively easy. In the relevant kernel header,
volk_gnsssdr/include/volk_gnsssdr/volk_gnsssdr<input-fingerprint_function-name_output-fingerprint>.h,
add a new #ifdef/#endif block for the LV_HAVE_<arch> symbol corresponding
to the <arch> you are working on (e.g. SSE, AVX, NEON, etc.).

For example, for volk_gnsssdr_16ic_x2_multiply_16ic_neon:

\code
#ifdef LV_HAVE_NEON
#include <arm_neon.h>

static inline void volk_gnsssdr_16ic_x2_multiply_16ic_neon(lv_16sc_t* out, const lv_16sc_t* in_a, const lv_16sc_t* in_b, unsigned int num_points)
{
    lv_16sc_t *a_ptr = (lv_16sc_t*) in_a;
    lv_16sc_t *b_ptr = (lv_16sc_t*) in_b;
    unsigned int quarter_points = num_points / 4;
    int16x4x2_t a_val, b_val, c_val;
    int16x4x2_t tmp_real, tmp_imag;
    unsigned int number = 0;

    // Main loop: four complex samples per iteration
    for(; number < quarter_points; ++number)
    {
        // De-interleaving loads: val[0] holds real parts, val[1] imaginary parts
        a_val = vld2_s16((int16_t*)a_ptr);
        b_val = vld2_s16((int16_t*)b_ptr);

        // Products for the real part: ar*br and ai*bi
        tmp_real.val[0] = vmul_s16(a_val.val[0], b_val.val[0]);
        tmp_real.val[1] = vmul_s16(a_val.val[1], b_val.val[1]);

        // Products for the imaginary part: ar*bi and ai*br
        tmp_imag.val[0] = vmul_s16(a_val.val[0], b_val.val[1]);
        tmp_imag.val[1] = vmul_s16(a_val.val[1], b_val.val[0]);

        // real = ar*br - ai*bi, imag = ar*bi + ai*br; store re-interleaved
        c_val.val[0] = vsub_s16(tmp_real.val[0], tmp_real.val[1]);
        c_val.val[1] = vadd_s16(tmp_imag.val[0], tmp_imag.val[1]);
        vst2_s16((int16_t*)out, c_val);

        a_ptr += 4;
        b_ptr += 4;
        out += 4;
    }

    // Tail loop: remaining points in plain C
    for(number = quarter_points * 4; number < num_points; number++)
    {
        *out++ = (*a_ptr++) * (*b_ptr++);
    }
}
#endif // LV_HAVE_NEON
\endcode

### Allocating Memory

SIMD code can be very sensitive to the alignment of the vectors, which is
generally something like a 16-byte or 32-byte alignment requirement. The
VOLK_GNSSSDR dispatcher functions, which are what we normally call as users of
VOLK_GNSSSDR, make sure that the correct aligned or unaligned version is called
depending on the state of the vectors passed to them. However, things typically
work faster and more efficiently when the vectors are aligned. As such,
VOLK_GNSSSDR has memory allocate and free methods to provide us with properly
aligned vectors. We can also ask VOLK_GNSSSDR to give us the current machine's
alignment requirement, which makes our job even easier when porting code.
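
The aligned-or-unaligned decision mentioned above boils down to a test on the
pointer values. The following sketch shows one way such a test can be
written; the function names are hypothetical and the code is not taken from
the VOLK_GNSSSDR dispatcher, which may make the decision differently.

```c
#include <stddef.h>
#include <stdint.h>

// Returns 1 when p is a multiple of alignment, 0 otherwise.
// Assumption: alignment is a power of two (16 or 32 in the cases above).
static int example_is_aligned(const void* p, size_t alignment)
{
    return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}

// Returns 'a' when both buffers are aligned, so that an aligned protokernel
// could be used, and 'u' when the unaligned one is needed.
static char example_pick_variant(const void* out, const void* in, size_t alignment)
{
    return (example_is_aligned(out, alignment) && example_is_aligned(in, alignment)) ? 'a' : 'u';
}
```

A single misaligned buffer is enough to force the unaligned variant, which is
why allocating every buffer with the library's aligned allocator pays off.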

To get the machine's alignment, simply call size_t volk_gnsssdr_get_alignment().

Allocate memory using void* volk_gnsssdr_malloc(size_t size, size_t alignment).

Make sure that any memory allocated by VOLK_GNSSSDR is also freed by VOLK_GNSSSDR, with volk_gnsssdr_free(void *p).

*/
@ -0,0 +1,19 @@
/*! \page using_volk_gnsssdr Using VOLK_GNSSSDR

Using VOLK_GNSSSDR in your code requires proper linking and including the correct headers.
VOLK_GNSSSDR currently supports both C and C++ bindings.

VOLK_GNSSSDR provides both a pkgconfig file and a CMake module to help with
configuration and linking. The pkgconfig file is installed to
$install_prefix/lib/pkgconfig/volk_gnsssdr.pc. The CMake configuration module is in
$install_prefix/lib/cmake/volk_gnsssdr/VolkConfig.cmake.

The VOLK_GNSSSDR include directory (includedir in pkgconfig,
VOLK_GNSSSDR_INCLUDE_DIRS in the CMake module) contains the header
volk_gnsssdr/volk_gnsssdr.h, which defines all of the symbols exposed by
VOLK_GNSSSDR. Alternatively, individual kernel headers are in the same location.

In most cases it is sufficient to call the dispatcher for the kernel you are using.

*/
@ -1,5 +1,5 @@
/*!
- * \file volk_gnsssdr_32fc_convert_16ic.h
+ * \file volk_gnsssdr_16ic_convert_32fc.h
 * \brief Volk protokernel: converts 16 bit integer complex values to 32 bits float complex values
 * \authors <ul>
 *          <li> Javier Arribas, 2015. jarribas(at)cttc.es