Segfault with volk on 32 bit AMD

Hello,

I'm not sure if this is the right place to post this, but I have had this
error with 3.5.2 and later (git and 3.5.2.1) on Slackware 13.37 on a
dual Athlon 32-bit machine. The problem is not present on an Intel Atom
(32-bit dual-core) machine with exactly the same configuration and
libraries installed. Here is the Python error report from
gnuradio-companion:

Executing: "/home/fred/gnuradio/top_block.py"

Using Volk machine: generic
Traceback (most recent call last):
  File "/home/fred/gnuradio/top_block.py", line 169, in <module>
    tb = top_block()
  File "/home/fred/gnuradio/top_block.py", line 101, in __init__
    self.gr_multiply_xx_3 = gr.multiply_vff(1)
  File "/usr/lib/python2.6/site-packages/gnuradio/gr/gnuradio_core_general.py", line 8783, in multiply_ff
    return _gnuradio_core_general.multiply_ff(vlen)
RuntimeError: gr_block::set_alignment_multiple

When running volk_profile from the git sources, the application produces
a segfault when it reaches volk_32fc_x2_multiply_32fc_a.

I did a trace using ddd and it traced the error back to libvolk.so.0.0.0.

Cheers,

Fred

Also, using the gdb “disassemble” command you could trace which
instruction raised the fault.

Cheers,
Rafael D.

On Fri, Mar 16, 2012 at 11:38 AM, Frederick Stevens wrote:

> Using Volk machine: generic
>
> When running volk_profile from the git sources, the application produces a
> segfault when it reaches volk_32fc_x2_multiply_32fc_a
>
> I did a trace using ddd and it traced the error back to libvolk.so.0.0.0
>
> Cheers,
>
> Fred

Well, that doesn’t make me happy at all. Especially since we just patched
the release yesterday…

Sounds like the AMD chip is handling something wrong (or differently) with
the alignment. All vectors passed using volk_profile are supposed to be
properly byte aligned.
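
For reference, whether a buffer meets a given alignment can be checked from its address alone. The sketch below is illustrative and is not VOLK code: `is_aligned_addr` and `is_aligned_ptr` are hypothetical helpers, and the 16-byte figure used in the note is an assumption (the usual requirement for aligned SSE loads), not something stated in this thread.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of VOLK: true when `addr` is a multiple of
// `alignment` bytes. `alignment` must be a power of two.
bool is_aligned_addr(std::uintptr_t addr, std::size_t alignment) {
    return (addr & (alignment - 1)) == 0;
}

// Same check for a real pointer.
bool is_aligned_ptr(const void* ptr, std::size_t alignment) {
    return is_aligned_addr(reinterpret_cast<std::uintptr_t>(ptr), alignment);
}
```

An address ending in 0x...8 passes this check for 8 bytes but fails it for 16, so the alignment value being tested matters as much as the address.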

Fred, can you just check for me what happens when you run it under gdb?
Just run “gdb volk_profile” and when it crashes, do a “bt” and post the
results of the backtrace here.

Thanks,
Tom

Rafael,

Here is the output from gdb:

RUN_VOLK_TESTS: volk_32fc_32f_multiply_32fc_a

Program received signal SIGSEGV, Segmentation fault.
0xb7f2c6e0 in volk_32fc_32f_multiply_32fc_a_generic ()
from /home/fred/extras/gnuradio/gnuradio/build/volk/lib/libvolk.so.0.0.0

Dump of assembler code for function volk_32fc_32f_multiply_32fc_a_generic:
0xb7f2c6c0 <+0>: push %ebp
0xb7f2c6c1 <+1>: mov %esp,%ebp
0xb7f2c6c3 <+3>: push %edi
0xb7f2c6c4 <+4>: push %esi
0xb7f2c6c5 <+5>: mov 0x8(%ebp),%edx
0xb7f2c6c8 <+8>: mov 0xc(%ebp),%ecx
0xb7f2c6cb <+11>: mov 0x10(%ebp),%edi
0xb7f2c6ce <+14>: mov 0x14(%ebp),%esi
0xb7f2c6d1 <+17>: test %esi,%esi
0xb7f2c6d3 <+19>: je 0xb7f2c6fc <volk_32fc_32f_multiply_32fc_a_generic+60>
0xb7f2c6d5 <+21>: xor %eax,%eax
0xb7f2c6d7 <+23>: mov %esi,%esi
0xb7f2c6d9 <+25>: lea 0x0(%edi,%eiz,1),%edi
=> 0xb7f2c6e0 <+32>: flds (%edi,%eax,8)
0xb7f2c6e3 <+35>: flds (%ecx,%eax,8)
0xb7f2c6e6 <+38>: fmul %st(1),%st
0xb7f2c6e8 <+40>: fxch %st(1)
0xb7f2c6ea <+42>: fmuls 0x4(%ecx,%eax,8)
0xb7f2c6ee <+46>: fxch %st(1)
0xb7f2c6f0 <+48>: fstps (%edx,%eax,8)
0xb7f2c6f3 <+51>: fstps 0x4(%edx,%eax,8)
0xb7f2c6f7 <+55>: inc %eax
0xb7f2c6f8 <+56>: cmp %eax,%esi
0xb7f2c6fa <+58>: ja 0xb7f2c6e0 <volk_32fc_32f_multiply_32fc_a_generic+32>
0xb7f2c6fc <+60>: pop %esi
0xb7f2c6fd <+61>: pop %edi
0xb7f2c6fe <+62>: pop %ebp
0xb7f2c6ff <+63>: ret
End of assembler dump.

I omitted the first part of the program execution since everything
seemed to be working fine. Hope this helps. Let me know if you would
like me to try something else.

Cheers,

Fred

On Fri, Mar 16, 2012 at 2:50 PM, Nick F. wrote:

OK, that’s weird as hell. That’s the generic implementation, which is just
a std::complex multiply in a for loop. Can you give me your gcc version?
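
For readers following along, here is a minimal sketch of what such a generic kernel amounts to, based on the source line gdb reports in this thread (`*cPtr++ = (*aPtr++) * (*bPtr++);`). The function name `multiply_32fc_32f_sketch` is made up, `std::complex<float>` stands in for VOLK's own complex type, and the loop indexes the arrays instead of post-incrementing pointers. It is an illustration, not the VOLK source.

```cpp
#include <complex>
#include <cstddef>

// Illustrative stand-in for a generic (non-SIMD) kernel: multiply each complex
// sample in `a` by the matching real sample in `b` and store the result in `c`.
// The caller must provide num_points elements in all three arrays.
void multiply_32fc_32f_sketch(std::complex<float>* c,
                              const std::complex<float>* a,
                              const float* b,
                              std::size_t num_points) {
    for (std::size_t i = 0; i < num_points; ++i) {
        c[i] = a[i] * b[i];  // complex * real scales both parts
    }
}
```

Note that the real input advances by one float (4 bytes) per point while the complex arrays advance by 8 bytes per point.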

–n

Yep, that shouldn’t be a problem. Fred, can you also give us the backtrace?

You might have to recompile volk with debugging turned on (passing
-DCMAKE_BUILD_TYPE="Debug" to cmake).

Thanks,
Tom

Is your system a stock 32-bit Slackware 13.37?

regards,
Rafael D.

Here is the gcc info. It will take me a bit to recompile gnuradio, but
I will turn on debugging in cmake.

gcc (GCC) 4.5.2
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I am running kernel 3.2.7 from slackware-current and swig 2.0.4;
everything else is from SlackBuilds. I will have to get versions for
you if you need them.

Cheers,

Fred

On Fri, Mar 16, 2012 at 3:03 PM, Frederick Stevens wrote:

> I will have to get versions for you if you need them.
>
> Cheers,
>
> Fred

While you’re doing the rebuild, can you set the optimization flag to -O2?
It’s -O3 right now by default, and every now and then it can be a problem
(haven’t heard of one in a while, but it’s been a thing).

It should be setting '-DCMAKE_CXX_FLAGS="-O2"'.

Tom

On Fri, Mar 16, 2012 at 3:11 PM, Frederick Stevens wrote:

> I’ve added your suggested changes. It will be an hour or so until
> compilation is finished. Slackware CFLAGS or CXXFLAGS are usually -O2 by
> default.
>
> Cheers,
>
> Fred

Thanks. GNU Radio resets these flags to -O3 right now, though.

Also, can’t you run ‘make -jN’ to speed up the compilation?

Tom

On Fri, Mar 16, 2012 at 6:11 PM, Frederick Stevens wrote:

> 74 *cPtr++ = (*aPtr++) * (*bPtr++);
> (gdb) bt
> #0 0xb7edbb74 in volk_32fc_32f_multiply_32fc_a_generic (cVector=0xb7448008,
>     aVector=0xb7768008, bVector=0xb78f8008, num_points=204600)
>     at /home/fred/extras/gnuradio/gnuradio/volk/include/volk/volk_32fc_32f_multiply_32fc_a.h:74

Alright, Fred, definitely something strange going on here. My only guess is
that for some reason on your architecture/OS/whatever, something is being
handled incorrectly and the buffers a, b, and c are not getting generated
correctly, maybe something like it’s not doubling the number of items for
the complex data type (before this function test, there are 16ic, or
complex shorts, being tested, but this is the first complex float test).
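
The sizing concern can be made concrete. This is a sketch, assuming 4-byte floats and complex floats stored as two packed floats (8 bytes). `buffer_bytes` is a hypothetical helper, and 204600 is the `num_points` value from Fred's backtrace.

```cpp
#include <complex>
#include <cstddef>

// Hypothetical helper: bytes a buffer of `num_points` items of type T must hold.
template <typename T>
constexpr std::size_t buffer_bytes(std::size_t num_points) {
    return num_points * sizeof(T);
}

// Layout assumptions stated in the lead-in.
static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");
static_assert(sizeof(std::complex<float>) == 8, "sketch assumes packed complex floats");

// Sizes for the num_points value reported in the backtrace.
constexpr std::size_t kNumPoints = 204600;
constexpr std::size_t kComplexBytes = buffer_bytes<std::complex<float>>(kNumPoints);
constexpr std::size_t kFloatBytes = buffer_bytes<float>(kNumPoints);
```

Under these assumptions the complex buffers need twice the bytes of the float buffer, so a complex buffer allocated at the float size would be exhausted halfway through the loop.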

It’s hard to tell if it’s something about it being an AMD chip, 32-bit,
Slackware version, gcc version, etc. And I don’t have an AMD chip to test
on, but I could load up a 32-bit Slackware VM at least.

How much work are you willing to put into this to help us nail this down?

If you can follow through the volk_profile test code, we can start
outputting more debug info. To start with, I’d suggest going into
volk/apps/volk_profile.cc, commenting out line 38, rebuilding the
application, and running this new volk_profile to see if it fails on any
other kernels.

Thanks,
Tom

Well, after a few restarts, here is my output. I did a fresh pull from
git because I was getting some errors with missing *.h files in
gruel/src/swig or something like that. Hope this helps!

RUN_VOLK_TESTS: volk_32fc_32f_multiply_32fc_a

Program received signal SIGSEGV, Segmentation fault.
0xb7edbb74 in volk_32fc_32f_multiply_32fc_a_generic (cVector=0xb7448008,
    aVector=0xb7768008, bVector=0xb78f8008, num_points=204600)
    at /home/fred/extras/gnuradio/gnuradio/volk/include/volk/volk_32fc_32f_multiply_32fc_a.h:74
74 *cPtr++ = (*aPtr++) * (*bPtr++);
(gdb) bt
#0 0xb7edbb74 in volk_32fc_32f_multiply_32fc_a_generic (cVector=0xb7448008,
    aVector=0xb7768008, bVector=0xb78f8008, num_points=204600)
    at /home/fred/extras/gnuradio/gnuradio/volk/include/volk/volk_32fc_32f_multiply_32fc_a.h:74
#1 0xb7ed4d68 in volk_32fc_32f_multiply_32fc_a_manual (cVector=0xb7448008,
    aVector=0xb7768008, bVector=0xb78f8008, num_points=204600,
    arch=0x8079ac4 "generic")
    at /home/fred/extras/gnuradio/gnuradio/build/volk/lib/volk.c:749
#2 0x08064533 in run_cast_test3 (
    func=0x80595c0 <volk_32fc_32f_multiply_32fc_a_manual@plt>, buffs=…,
    vlen=204600, iter=999, arch=…)
    at /home/fred/extras/gnuradio/gnuradio/volk/lib/qa_utils.cc:182
#3 0x08062770 in run_volk_tests (desc=…,
    manual_func=0x80595c0 <volk_32fc_32f_multiply_32fc_a_manual@plt>,
    name=…, tol=9.99999975e-05, scalar=…, vlen=204600, iter=1000,
    best_arch_vector=0xbfffe714)
    at /home/fred/extras/gnuradio/gnuradio/volk/lib/qa_utils.cc:351
#4 0x0805b3d3 in main (argc=1, argv=0xbffff204)
    at /home/fred/extras/gnuradio/gnuradio/volk/apps/volk_profile.cc:38
(gdb)

Dump of assembler code for function volk_32fc_32f_multiply_32fc_a_generic:
0xb7edbb39 <+0>: push %ebp
0xb7edbb3a <+1>: mov %esp,%ebp
0xb7edbb3c <+3>: sub $0x14,%esp
0xb7edbb3f <+6>: mov 0x8(%ebp),%eax
0xb7edbb42 <+9>: mov %eax,-0x4(%ebp)
0xb7edbb45 <+12>: mov 0xc(%ebp),%eax
0xb7edbb48 <+15>: mov %eax,-0x8(%ebp)
0xb7edbb4b <+18>: mov 0x10(%ebp),%eax
0xb7edbb4e <+21>: mov %eax,-0xc(%ebp)
0xb7edbb51 <+24>: movl $0x0,-0x10(%ebp)
0xb7edbb58 <+31>: movl $0x0,-0x10(%ebp)
0xb7edbb5f <+38>: jmp 0xb7edbbae <volk_32fc_32f_multiply_32fc_a_generic+117>
0xb7edbb61 <+40>: mov -0x8(%ebp),%eax
0xb7edbb64 <+43>: mov (%eax),%ecx
0xb7edbb66 <+45>: mov 0x4(%eax),%edx
0xb7edbb69 <+48>: mov %ecx,%eax
0xb7edbb6b <+50>: mov %eax,-0x14(%ebp)
0xb7edbb6e <+53>: flds -0x14(%ebp)
0xb7edbb71 <+56>: mov -0xc(%ebp),%eax
=> 0xb7edbb74 <+59>: flds (%eax)
0xb7edbb76 <+61>: fmulp %st,%st(1)
0xb7edbb78 <+63>: mov %edx,-0x14(%ebp)
0xb7edbb7b <+66>: flds -0x14(%ebp)
0xb7edbb7e <+69>: mov -0xc(%ebp),%eax
0xb7edbb81 <+72>: flds (%eax)
0xb7edbb83 <+74>: fmulp %st,%st(1)
0xb7edbb85 <+76>: fxch %st(1)
0xb7edbb87 <+78>: fstps -0x14(%ebp)
0xb7edbb8a <+81>: mov -0x14(%ebp),%ecx
0xb7edbb8d <+84>: fstps -0x14(%ebp)
0xb7edbb90 <+87>: mov -0x14(%ebp),%edx
0xb7edbb93 <+90>: mov -0x4(%ebp),%eax
0xb7edbb96 <+93>: mov %ecx,(%eax)
0xb7edbb98 <+95>: mov %edx,0x4(%eax)
0xb7edbb9b <+98>: addl $0x8,-0x4(%ebp)
0xb7edbb9f <+102>: addl $0x8,-0x8(%ebp)
0xb7edbba3 <+106>: addl $0x4,-0xc(%ebp)
0xb7edbba7 <+110>: addl $0x4,-0xc(%ebp)
0xb7edbbab <+114>: incl -0x10(%ebp)
0xb7edbbae <+117>: mov -0x10(%ebp),%eax
0xb7edbbb1 <+120>: cmp 0x14(%ebp),%eax
0xb7edbbb4 <+123>: jb 0xb7edbb61 <volk_32fc_32f_multiply_32fc_a_generic+40>
0xb7edbbb6 <+125>: leave
0xb7edbbb7 <+126>: ret
End of assembler dump.

Cheers,

Fred

Volk_profile ran to completion. I am using the git source tree updated
just before I did the run. I commented out line 38 of volk_profile.cc
as you suggested and ran volk_profile under gdb. The output is in the
attached text file. I have also attached the generated volk_config from
~/.volk/volk_config.

I noted from running gnuradio-companion version 3.5.1 (which works)
that when I use a multiply block, this message from Python is generated:

./top_block.py

gr_fir_fff: using 3DNow!

but volk_profile does not seem to recognize the 3DNow! processor
extensions (produces sse2 and sse3 messages on the Intel Atom 32 bit
machine).

Hope this helps! Let me know if you want me to try anything else. I’ll
let you know how things turn out on the other machine as well.

Cheers,

Fred

Well, at least for the moment, I have time (looking for employment,
etc.). I was considering trying it on another AMD 32 bit machine that I
have sitting here with the same version of Slackware on it. It may take
me a day or two to bring it up to the same level of software as this
one. It’s a uniprocessor motherboard but the same vintage of processor.

I am rebuilding gnuradio with your suggestions tonight and will let you
know what gives when it is finished. I’m also trying to get some RF
hardware finished (my real area of expertise) so I will work this in
between the rest of things. Shouldn’t be a problem though.

Cheers,

Fred

On Sun, Mar 18, 2012 at 8:00 PM, Frederick Stevens wrote:

> Volk_profile ran to completion. I am using the git source tree updated
> just before I did the run. I commented out line 38 of volk_profile.cc as
> you suggested and ran volk_profile under gdb. The output is in the
> attached text file. I have also attached the generated volk_config from
> ~/.volk/volk_config.

Thanks. Strange that it’s just that kernel, then. Can you put in some debug
lines that will print out the size of the buffers being used and the
‘number’ variable in volk_32fc_x2_multiply_32fc_a when the crash occurs? I
just want to see if the loop is trying to go beyond the bounds of the
arrays.
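
The check Tom describes comes down to comparing the last byte each pointer touches against the size of its buffer. The sketch below does that arithmetic for one array; `first_out_of_bounds` is a hypothetical helper and the numbers in the note are illustrative. One hedged observation motivates the `stride_bytes` parameter: in the disassembly Fred posted in this thread, the float input appears to be indexed with a scale of 8 (`flds (%edi,%eax,8)`) in the optimized build, and its pointer slot appears to be advanced by 4 twice per iteration in the debug build. That is only a reading of the dumps, not a confirmed diagnosis.

```cpp
#include <cstddef>

// Hypothetical helper: returns the first loop index whose access would read
// past the end of a buffer, or num_points if every access stays in bounds.
//   buffer_bytes : size of the allocation
//   stride_bytes : how far the pointer advances per iteration
//   access_bytes : how many bytes each iteration reads at the pointer
std::size_t first_out_of_bounds(std::size_t num_points,
                                std::size_t buffer_bytes,
                                std::size_t stride_bytes,
                                std::size_t access_bytes) {
    for (std::size_t i = 0; i < num_points; ++i) {
        if (i * stride_bytes + access_bytes > buffer_bytes) {
            return i;
        }
    }
    return num_points;
}
```

With 204600 four-byte floats (818400 bytes), a stride of 4 stays in bounds for the whole loop, while a stride of 8 would first step outside the buffer at index 102300. An out-of-bounds read only faults once it reaches an unmapped page, so the index at which a crash is observed need not match that value exactly.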

> I noted from running gnuradio-companion version 3.5.1, (which works) that
> when I use a multiply block, this message from python is generated:
>
> ./top_block.py
>
> gr_fir_fff: using 3DNow!
>
> but volk_profile does not seem to recognize the 3DNow! processor
> extensions (produces sse2 and sse3 messages on the Intel Atom 32 bit
> machine).

Yeah, that’s fine. Without a 3DNow! kernel, Volk will just fall back on the
generic implementation. The thought being that the generic version will
work for everyone. So we need to figure out why that’s not true for
your…
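
The fallback described here can be pictured as a lookup that prefers an architecture-specific implementation and otherwise returns the generic one. This is a sketch of the idea only: `Impl`, `pick_impl`, and the feature strings are invented for illustration and do not reflect VOLK's actual dispatch code.

```cpp
#include <set>
#include <string>
#include <vector>

// Invented for illustration: one candidate implementation of a kernel.
struct Impl {
    std::string name;            // e.g. "sse3" or "generic"
    std::string needed_feature;  // CPU feature required; empty means "runs anywhere"
};

// Return the name of the first candidate whose required feature the CPU
// reports, falling back to the first candidate with no requirement.
// Candidates are assumed to be ordered from most to least preferred.
std::string pick_impl(const std::vector<Impl>& candidates,
                      const std::set<std::string>& cpu_features) {
    std::string fallback;
    for (const Impl& impl : candidates) {
        if (impl.needed_feature.empty()) {
            if (fallback.empty()) {
                fallback = impl.name;
            }
        } else if (cpu_features.count(impl.needed_feature) > 0) {
            return impl.name;
        }
    }
    return fallback;
}
```

In this model a CPU that reports only a feature no candidate asks for (such as 3DNow! here) ends up with the generic entry, which matches the "Using Volk machine: generic" line in Fred's output.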

> Hope this helps! Let me know if you want me to try anything else. I’ll
> let you know how things turn out on the other machine as well.
>
> Cheers,
>
> Fred

Thanks.

Tom

Tom,

See the attached file. I am running volk_profile now. If this is what
you need then that is great otherwise I will keep working on this with
whatever suggestions you have.

Cheers,

Fred

I’ll give it a try (more than willing), but you will have to point me to
the files where I should insert the debug print statements and tell me
which buffers to use, as I am very new to C++ and know only enough C to be
dangerous if left to myself. I usually have my nose in some assembler and
used to do a lot of Forth. I do make judicious use of man pages and HOWTO
help, though.

Cheers,

Fred

On Mon, Mar 19, 2012 at 12:04 PM, Frederick Stevens wrote:

That’ll be a good start. We’ll see if that tells us anything.

Thanks,
Tom

Tom,

New run using my simple “trace”. See attached files.

Cheers,

Fred