Hi folks,
Firstly, thanks very much for your response, Nick. You’ve answered my
main question: how to recover a completely bricked N210 in future.
There are two reasons why I’m hesitant to blame UHD: I ran the update
while I was (accidentally but foolishly!) streaming data from the
USRP, albeit at its minimum rate; and we have modified the UHD host
code, the FPGA and the ZPU firmware. I’m pretty sure the fault was at
least partially due to network congestion from the streaming, or the
USRP being unable to do two things at once, rather than to any part of
the ‘vanilla’ UHD. I also can’t rule out a problem with our
firmware/bitstream, though updates have been 100% successful since.
With this in mind, the details are:
OS: Windows XP SP2 32-bit
UHD: modified version, compiled in WinXP/MSVC 10 from source obtained
through git on 22 Feb 2011.
FPGA/Firmware: as above; firmware compiled in Ubuntu 10.10, FPGA
bitstream compiled in Xilinx ISE 12.4 Windows and Linux. The same host
files and firmware have worked fine before and since. I have tested
our host modifications in Linux with both standalone UHD and Gnuradio,
and Windows with standalone UHD and LabVIEW (using a custom C
interface to the UHD library).
Ethernet card: not available now, I can obtain info if necessary
Network environment: when the problem occurred the N210 was directly
connected to the host PC. Since recovery the N210 is on a gigabit LAN
with no problems.
Circumstances of error: Our N210 has the LFRX and LFTX. I had been
streaming at the minimum rate (~200 ksps) when I ran:
    python usrp_n2xx_net_burner.py --fw=<firmware_location> --fpga=<bitstream_location>
No other options were enabled. The
program hung during writing of the FPGA image, and I realised that
streaming was still on and meaningful data was still being received. I
forcibly closed the streaming program, left the burner alone for
several minutes then ctrl-c’ed out. There were several traceback
messages which I unfortunately did not record. I immediately reran the
same command, and it appeared to succeed. I rebooted and no lights at
all came on except the yellow Ethernet light, which took a few seconds
to switch on - I tried safe mode as well, with the same result.
I connected a Xilinx platform cable and read the FPGA status bits from
within iMPACT; I did not record them but I recall that all the
programming-related indicator bits were off, as though the FPGA was
not being programmed at all upon power-up. The CRC error indicator
bits were also off. I then programmed the FPGA with a .bit file, as
you described, at which point it booted successfully - all the
programming indicator bits had also switched on. I reran
usrp_n2xx_net_burner as above, but the problem was still present
after rebooting. I tried this several times with different
combinations of FPGA/firmware images, including the most recent
vanilla UHD - none worked. I then ran usrp_n2xx_net_burner again with
the --overwrite-safe flag and my own images; after rebooting all was
well.
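
As a side note on the .bit file used for the JTAG step: Nick's guide describes it as the .bin file with a header. A minimal Python sketch of reading that header is below, e.g. to confirm which design and part a file was built for before programming. The layout it assumes (a length-prefixed preamble, then fields keyed 'a' to 'd' with 2-byte lengths, then 'e' with a 4-byte length and the raw bitstream) is the commonly described one and is an assumption here, not something taken from this thread; parse_bit_header is a hypothetical helper name.

```python
import struct


def parse_bit_header(blob):
    """Split a .bit image into (fields, raw_bitstream).

    Assumes the commonly described layout; treat it as a sketch only.
    """
    pos = 0
    (preamble_len,) = struct.unpack_from(">H", blob, pos)
    pos += 2 + preamble_len   # skip the fixed preamble
    pos += 2                  # skip the 2-byte word that precedes field 'a'
    fields = {}
    while pos < len(blob):
        key = chr(blob[pos])
        pos += 1
        if key == "e":
            # Field 'e' carries a 4-byte length followed by the bitstream.
            (n,) = struct.unpack_from(">I", blob, pos)
            pos += 4
            return fields, blob[pos:pos + n]
        # Fields 'a'..'d' are 2-byte-length-prefixed, NUL-terminated strings.
        (n,) = struct.unpack_from(">H", blob, pos)
        pos += 2
        fields[key] = blob[pos:pos + n].rstrip(b"\x00").decode("ascii", "replace")
        pos += n
    raise ValueError("no 'e' (bitstream data) field found")
```

Under that assumed layout, fields 'a' through 'd' would hold the design name, part name, build date and build time, and the returned bytes would correspond to the .bin contents.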
I do not understand why the FPGA did not seem to be configured from
flash until the safe image was re-burnt; I have not studied the N210
schematics or the FPGA/flash chip datasheets, so I don’t know in detail
how the FPGA boots.
I’m still debugging our firmware/bitstream, so I’ve been flashing the
N210 regularly — I’m careful to avoid simultaneous streaming and
network congestion, and the problem has not recurred. I’m a bit afraid
to recreate the problem, but if it occurs again I’ll try to establish
exactly what combination of factors causes it. For the time being,
however, I’ll chalk it up to the simultaneous streaming and ctrl-c’ing
(and cosmic rays of course!).
Thank you again for the detailed recovery guide. Since we have the
host, firmware and FPGA toolchains running, I have access to .bit and
.ihx files for our design.
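
Since the .ihx file is what gets sent over the serial link during recovery, it may be worth checking it for damage first. Below is a minimal Python sketch that verifies each Intel HEX record's byte count and checksum (all record bytes, checksum included, must sum to zero modulo 256). It only checks record integrity, not whether the firmware is correct; check_ihex_line and check_ihex_text are hypothetical helper names.

```python
def check_ihex_line(line):
    """Return True if one Intel HEX record has a consistent length and checksum."""
    line = line.strip()
    if not line.startswith(":"):
        return False
    try:
        raw = bytes.fromhex(line[1:])
    except ValueError:
        return False
    # Record = count (1) + address (2) + type (1) + data (count) + checksum (1).
    if len(raw) < 5 or len(raw) != raw[0] + 5:
        return False
    # All bytes, checksum included, must sum to zero modulo 256.
    return sum(raw) % 256 == 0


def check_ihex_text(text):
    """Return the 1-based numbers of non-blank lines that fail the record check."""
    return [n for n, line in enumerate(text.splitlines(), 1)
            if line.strip() and not check_ihex_line(line)]
```

An empty list from check_ihex_text(open(path).read()) would mean every record passed; any line numbers returned point at records to inspect.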
Also, I noticed that iMPACT offered to write the flash by loading a
temporary bitstream into the FPGA to facilitate JTAG->flash
communication, but it seemed to need a lot of configuration
settings. Out of curiosity, is this feasible? (It would be useful to
rewrite the flash in one go from iMPACT - not that I dislike RS232 or
anything.)
Best regards, sorry about the wall of text. Hope the detail helps.
Vlad
---------- Forwarded message ----------
From: Nick F. [email protected]
Date: 20 April 2011 10:29
Subject: Re: [Discuss-gnuradio] Bricking and recovery of N210
To: Vladimir Negnevitsky [email protected]
Cc: [email protected]
On Wed, 2011-04-20 at 10:13 +1000, Vladimir Negnevitsky wrote:
> images worked fine and FLASH burns have worked since then.
This is something that’s strangely come up three times this week on the
list, under similar circumstances, despite several months of (mostly)
trouble-free updates. We’ve been so far unable to replicate it here or
deduce why it’s happening, and I’ve got an N210 here which I’ve been
continuously updating for several hours without incident. Either it’s
cosmic rays, or something else we haven’t found yet. Can you please send
me the following information:
- OS and version
- UHD host code version
- FPGA/firmware versions
- Ethernet card model
- Any hub or switch in between the PC and the N210?
- What exactly was the behavior of the net burner app? Did it crash, or
stall? What arguments did you invoke it with?
> I have a feeling I was very lucky. Since I only reprogrammed the FPGA,
> it would have searched the FLASH for working ZPU firmware, which must
> have been intact. My question is, is there any technique to recover
> the N210 in the event that both the FPGA and ZPU firmware are corrupt?
> I’ve seen it alluded to in old posts where the CPU and FPGA firmware
> were accidentally written into each other’s locations. The technique
> would be very useful if a similar crash happens again and it does
> overwrite the CPU firmware this time. I’d also like to hear others’
> success (or failure) stories.
You’re right about the mechanism of recovery. In the future, if the
updater crashes, locks up, or otherwise behaves anomalously, do the
following before rebooting:
Re-load the safe firmware and FPGA image with the --overwrite-safe
option of the N210 firmware updater (NEVER USE THIS OPTION OTHERWISE).
Use the latest N210 images from the UHD download site for this step. It
should then be safe to reboot.
For future reference, if you (or anyone else) manage to brick your N210
using the firmware updater, we’ll be happy to issue an RMA to have it
recovered here. If you’d like to recover it yourself, it’s really not
that hard, provided you have a Xilinx Platform JTAG cable and a USB to
RS232 (3.3V logic level ONLY) adapter. Here’s how to bootstrap it:
1. JTAG program FPGA with n210.bit (email me for it, it’s just the .bin
   file with a header, but iMPACT requires it). The FPGA bootloader will
   start and won’t find software, so it falls back to an Intel HEX prompt.
   If your firmware hasn’t been erased, it will boot the firmware, the LEDs
   on the front panel will go through their startup sequence, and you can
   skip step 2.
2. Build the N210 firmware from the latest UHD master (or I can email it
   to you). Connect the USB-RS232 adapter (be SURE it’s 3.3V logic level
   output!) to J305; the silkscreen labels the pinout. Run the program
   firmware/zpu/bin/uart_ihex_ram_loader.py <path-to-.IHX-file>. The IHX
   file will be named usrp2p_txrx_uhd.ihx. The RAM loader should say
   "USRP2 - found" and start loading. When it finishes, it will load the
   program.
3. At this point the device is up and running like a regular USRP N210,
   and you should be able to write firmware and FPGA images to it. Use the
   --overwrite-safe option of the N210 firmware update program to write
   safe FPGA and firmware images first, then load production FPGA and
   firmware images. Use the latest N210 images from the UHD download site.
   After loading all four images, go ahead and reset the device. It should
   now be operating normally.
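
The ordering in the last step (safe images first, then production images, then reset) can be sketched as a small Python helper that only builds the two burner command lines and does not run them. It uses only the options that appear in this thread (--fw, --fpga, --overwrite-safe); any option for selecting the device is deliberately left out, and the image file names in the usage note are placeholders, not real file names.

```python
BURNER = "usrp_n2xx_net_burner.py"


def burner_command(fw, fpga, overwrite_safe=False):
    """Build one net burner invocation as an argument list (not executed)."""
    cmd = ["python", BURNER, "--fw=" + fw, "--fpga=" + fpga]
    if overwrite_safe:
        # Only meant for recovery; see the warning earlier in this message.
        cmd.append("--overwrite-safe")
    return cmd


def recovery_plan(safe_fw, safe_fpga, prod_fw, prod_fpga):
    """Return the two burns in order: safe images first, then production."""
    return [
        burner_command(safe_fw, safe_fpga, overwrite_safe=True),
        burner_command(prod_fw, prod_fpga),
    ]
```

For example, recovery_plan("safe_fw.bin", "safe_fpga.bin", "fw.bin", "fpga.bin") yields two argument lists that could be reviewed and then run one after the other, with the device reset only after both have succeeded.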
We apologize for any inconvenience, and we’ll find an explanation for
the issue as soon as possible.
–n