Everything you need to know about broadcom hardware (Part 1)
http://forum.openwrt.org/viewtopic.php?pid=64086
Inside pretty much any home router or access point you'll find the following:
- flash chip (2M, 4M or somewhat rarely 8M)
- ram (4x the amount of flash)
- cpu (mips; provided by a broadcom 47xx or 5352)
- 6 port vlan managed switch (adm6996l, or more commonly the broadcom "roboswitch")
- wifi (broadcom 43xx based)
Chances are that almost all of that functionality will come from one or two Broadcom chips. The ram and flash are the exception.
Depending on the device you could have as little as 2/8 (flash/ram, in MB) or as much as 8/32, but by far the most common combination is 4/16, probably with an Intel flash chip.
The flash chip can be represented as a large block of contiguous space:
Code:[ start of flash ...... end of flash ]
There is no ROM to boot from; at power up the CPU begins executing the code at the very start of flash. Luckily this isn't the firmware or we'd be in real danger every time we reflashed. Boot is actually handled by a section of code we tend to refer to as the boot loader. In Broadcom devices this is CFE -- "Common Firmware Environment"; think of it like the BIOS in your computer.
(Note - wrt54g v1.x hardware actually used another boot loader called "PMON"; it wasn't until the wrt54g v2.0 that they switched to CFE. Both provide the exact same functionality.)
Code:[ CFE ] [ firmware ....... ] [ NVRAM ]
(there are no actual partitions, just hard coded locations)
The job of the boot loader is to initialize the memory and other hardware and then begin booting the firmware. In most cases there's a recovery mechanism that allows you to reflash the firmware so that a bad flash doesn't render the device useless. CFE does this through the use of a TFTP server; this can be triggered by the firmware failing its checksum, by the boot_wait variable or via CFE's serial console command line.
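The three recovery triggers just listed can be summarized in a few lines. This is only a sketch of the logic described in the text, not CFE's actual code; the function names and parameters are hypothetical.

```python
def recovery_reasons(checksum_ok, boot_wait, console_request=False):
    """List the reasons (if any) the boot loader would offer TFTP recovery.

    All three parameters are hypothetical stand-ins for state that the real
    boot loader reads from flash, NVRAM and the serial console.
    """
    reasons = []
    if not checksum_ok:
        reasons.append("firmware failed its checksum")
    if boot_wait:
        reasons.append("boot_wait is set")
    if console_request:
        reasons.append("requested from the serial console")
    return reasons


def tftp_server_starts(checksum_ok, boot_wait, console_request=False):
    """True if at least one recovery trigger applies."""
    return bool(recovery_reasons(checksum_ok, boot_wait, console_request))
```

For example, `tftp_server_starts(checksum_ok=True, boot_wait=False)` is false: a healthy firmware with boot_wait off simply boots, which is why a bad configuration on such a device is harder to recover from.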
If you dig into the "firmware" section you'll find a trx. A trx is just an encapsulation, which looks something like this -
Code:[ HDR0 ][ length ][ crc32 ][ flags ][ pointers ][ data ... ]
"HDR0" is a magic value to indicate a trx header; the rest are 4-byte unsigned values followed by the actual contents. In short, it's a block of data with a length and a checksum. So, our flash usage actually looks something like this:
Code:[ CFE ][ trx containing firmware ][ NVRAM ]
Except that the firmware is generally pretty small and doesn't use the entire space between CFE and NVRAM:
Code:[ CFE ][ trx firmware ][ unused ][ NVRAM ]
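To make the header layout concrete, here is a minimal Python sketch that builds and checks a trx-style container. Several details are assumptions not stated above: little-endian byte order, exactly three "pointers", a checksum covering everything after the crc32 field, and a CRC32 stored without the final bit inversion. The names `build_trx` and `parse_trx` are made up for this example.

```python
import struct
import zlib

TRX_MAGIC = b"HDR0"
# Assumed layout: magic, length, crc32, flags, three pointers (28 bytes),
# every field after the magic being a little-endian 32-bit unsigned value.
HEADER = struct.Struct("<4sIII3I")
CRC_START = 12  # assumption: checksum covers everything after the crc32 field


def trx_crc(payload):
    # Assumption: a standard CRC32 stored without the final inversion.
    return ~zlib.crc32(payload) & 0xFFFFFFFF


def build_trx(data, flags=0):
    """Wrap data in a trx-style header (first pointer = start of data)."""
    pointers = (HEADER.size, 0, 0)
    length = HEADER.size + len(data)
    tail = struct.pack("<I3I", flags, *pointers) + data
    return struct.pack("<4sII", TRX_MAGIC, length, trx_crc(tail)) + tail


def parse_trx(image):
    """Decode the header and verify the checksum over the stated length."""
    magic, length, crc, flags, *pointers = HEADER.unpack_from(image)
    if magic != TRX_MAGIC:
        raise ValueError("not a trx image")
    return {
        "length": length,
        "flags": flags,
        "pointers": pointers,
        "crc_ok": trx_crc(image[CRC_START:length]) == crc,
    }
```

Running `parse_trx(build_trx(b"kernel"))` reports a length of 34 bytes (28 of header plus 6 of data) and a passing checksum; changing any byte after the crc32 field makes `crc_ok` false, which is the property the boot loader relies on.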
(Note: the .bin files are nothing more than the generic trx file with an additional header prepended to identify the model. The model information gets verified by the vendor's upgrade utilities and the remaining data -- the trx -- gets written to the flash. When upgrading from within OpenWrt remember to use the trx file.)
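Since a .bin is described as a vendor header followed by the trx, recovering the trx amounts to skipping ahead to the "HDR0" magic. A rough sketch follows; the function name is hypothetical, and it assumes the magic does not also occur by coincidence inside the vendor header.

```python
def extract_trx(bin_image):
    """Return the trx portion of a vendor .bin image.

    Assumption: the first occurrence of b"HDR0" marks the start of the trx.
    """
    start = bin_image.find(b"HDR0")
    if start < 0:
        raise ValueError("no trx magic found")
    return bin_image[start:]
```

For instance, `extract_trx(b"VENDOR-HEADER" + b"HDR0payload")` returns `b"HDR0payload"`. The vendor header used here is a placeholder, not a real model identifier.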
So what exactly is the firmware?
The boot loader really has no concept of filesystems; it pretty much assumes that the start of the trx data section is executable code. So, at the very start of our firmware is the kernel. But just putting a kernel directly onto flash is quite boring and consumes a lot of space, so we compress the kernel with a heavy compression known as LZMA. Now the start of firmware is code for an LZMA decompressor:
Code:[lzma decompress][lzma compressed kernel]
Now, the boot loader boots into an LZMA program which decompresses the kernel into memory and executes it. It adds a second to the bootup time, but it saves a large chunk of flash space. (And if that wasn't amusing enough, it turns out the boot loader does know gzip compression, so we gzip compressed the LZMA decompression program.)
Immediately following the kernel is the filesystem. We use squashfs for this because it's a highly compressed readonly filesystem -- remember that altering the contents of the trx in any way would invalidate the crc, so we put our writable data in a jffs2 partition outside the trx. This means that our firmware looks like this:
Code:[trx (gzip'd lzma decompress)(lzma'd kernel)(squashfs filesystem)]
And the entire flash usage looks like this -
Code:[CFE][trx (gz'd lzma)(lzma'd kernel)(squashfs)][ jffs2 filesystem ][NVRAM]
That's about as tight as we can possibly pack things into flash.
Why squashfs+jffs2?
System bootup is as follows -
- kernel boots from squashfs and runs /etc/preinit
- /etc/preinit runs /sbin/mount_root
- mount_root mounts the jffs2 partition (/jffs) and combines it with the squashfs partition (/rom) to create a new virtual root filesystem (/)
- bootup continues with /sbin/init
Both squashfs and jffs2 are compressed filesystems using LZMA for the compression. Squashfs is a readonly filesystem while jffs2 is a writable filesystem with journaling and wear leveling. Since squashfs is a readonly filesystem, it doesn't need to align the data, allowing it to pack the files tighter for 20-30% savings over a jffs2 filesystem.
Our job when writing the firmware is to put as much common functionality on squashfs while not wasting space with unwanted features. Additional features can always be installed onto jffs2 by the user. The use of mini_fo means that the filesystem is presented as one large writable filesystem to the user with no visible boundary between squashfs and jffs2 -- files are simply copied to jffs2 when they're written.
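The copy-on-write behaviour described above can be modelled with two dictionaries. This is a toy illustration of the idea, not how mini_fo is implemented; the class and method names are invented for the example.

```python
class OverlayRoot:
    """Toy model of a writable layer stacked on a read-only one."""

    def __init__(self, rom):
        self.rom = dict(rom)  # read-only squashfs contents (/rom)
        self.jffs = {}        # writable jffs2 contents (/jffs)

    def read(self, path):
        # The writable layer wins; otherwise fall through to squashfs.
        if path in self.jffs:
            return self.jffs[path]
        return self.rom[path]

    def write(self, path, data):
        # Writes never touch squashfs; the file lands on jffs2.
        self.jffs[path] = data

    def restore(self, path):
        # Dropping the jffs2 copy exposes the original squashfs version.
        self.jffs.pop(path, None)
```

With `root = OverlayRoot({"/etc/banner": "stock"})`, calling `root.write("/etc/banner", "custom")` changes what `root.read` returns while leaving `root.rom` untouched, and `root.restore("/etc/banner")` brings the stock file back.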
It's not all without side effects however -
The fact that we pack things so tightly in flash means that if the firmware ever changes, the size and location of the jffs2 partition also changes, potentially wiping out a large chunk of jffs2 data and corrupting the filesystem. To deal with this, we've implemented a policy that after each reflash the jffs2 data is reformatted. The trick to doing that is a special value, 0xdeadc0de; when this value appears in a jffs2 partition, everything from that point to the end of the partition is wiped. So, hidden at the end of the firmware images, is the value 0xdeadc0de, positioned such that it becomes the start of the jffs2 partition.
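The marker rule can be sketched as a scan over the partition contents. This is an illustration only: the byte order of the marker on flash and the function name are assumptions, and erased flash is modelled as 0xFF bytes.

```python
MARKER = bytes.fromhex("deadc0de")  # assumption: stored in this byte order


def wipe_from_marker(partition, erased=0xFF):
    """Blank everything from the first marker to the end of the partition.

    partition is a bytearray standing in for the jffs2 area of flash.
    Returns True if a marker was found and the tail was wiped.
    """
    pos = partition.find(MARKER)
    if pos < 0:
        return False
    partition[pos:] = bytes([erased]) * (len(partition) - pos)
    return True
```

After a reflash the marker sits at the very start of the partition, so the whole area is blanked and the stale data from the previous install is never interpreted as a filesystem.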
The fact we use a combination of compressed and partially readonly filesystems also has an interesting effect on package management. In particular, you need to be careful what packages you update. While the ipkg util is more than happy to install an updated package on jffs2, it's unable to remove the original package from squashfs; the end result is that you slowly start using more and more space until the jffs2 partition is filled. The ipkg util really has no idea how much space is available on the jffs2 partition since it's compressed, and so it will blindly keep going until the ipkg system crashes -- at that point you have so little space you probably can't even use ipkg to remove anything.
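A back-of-the-envelope model shows why upgrading packages that ship in squashfs only ever costs space. The package names and sizes below are made up, compression is ignored, and the function is purely illustrative.

```python
def flash_usage(squashfs_pkgs, upgrades):
    """Return (squashfs_kb, jffs2_kb) after applying upgrades.

    squashfs_pkgs: dict of package name -> size in KB inside squashfs
    upgrades: iterable of (package name, new size in KB) installed to jffs2
    """
    # squashfs is read-only, so its usage never shrinks.
    squashfs_kb = sum(squashfs_pkgs.values())
    jffs2_pkgs = {}
    for name, new_size in upgrades:
        # A newer copy replaces an older jffs2 copy, never the squashfs one.
        jffs2_pkgs[name] = new_size
    return squashfs_kb, sum(jffs2_pkgs.values())
```

With a hypothetical image of `{"busybox": 300, "dropbear": 150}`, upgrading busybox to a 310 KB build gives `(450, 310)`: the old 300 KB copy is still occupying squashfs, and the whole new copy has been added on jffs2.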
Can we switch the filesystem to be entirely jffs2?
Yes, it's technically possible, but a bit of a mess to actually pull off. The firmware has to be loaded as a trx file, which means that you have to put the jffs2 data inside of the trx. But, as I said above, the trx has a checksum, meaning that if you ever change that data, you invalidate the checksum. The solution is that you install with the jffs2 data contained within the trx, and then change the trx boundaries at runtime. The end result is a single jffs2 partition for the root filesystem. Why someone would want to do it is beyond me; it takes more space, and while it would allow you to upgrade the contents of the filesystem you would still be unable to replace the kernel (outside of the filesystem), meaning that it's not a seamless upgrade between releases. Having squashfs gives you a failsafe mechanism where you can always ignore the jffs2 partition and boot directly off squashfs, or restore files to their original squashfs versions.
I used to have a trick where I could convert a squashfs install to a jffs2 install at runtime by copying all the data from squashfs onto the jffs2 partition and changing the partition boundaries. I never really had much use for the util -- not to mention it required a rather large flash to store both squashfs and jffs2 copies of the root during the transition -- so support for it was dropped.
As for the proper ways to recover a "bricked" router -
failsafe -
OpenWrt has a builtin failsafe mode which will attempt to bypass almost all configuration in favor of hardcoded defaults, resulting in a router that boots up as 192.168.1.1 with few if any services running. From this state you can telnet in and fix any problems you may have with the filesystem or configuration.
boot_wait -
The single best thing you can do is have boot_wait set, meaning that all you have to do is TFTP a new firmware. At one time the reflashing instructions included an exploit for the Linksys firmware that set the boot_wait variable; as time progressed and Linksys eventually fixed the bug (after several failed attempts) we found that people were flashing to other firmwares for the sole purpose of setting boot_wait so they could reflash to OpenWrt. We figured this was somewhat pointless and altered the instructions to indicate that you could safely reflash to OpenWrt without setting boot_wait.
JTAG -
It's one of those amazingly useful things that allows you to recover from pretty much anything that doesn't involve a hardware failure. While the JTAG can technically be used to watch every instruction and register as the system boots, the recovery software only uses it for DMA access to the flash chip, making it somewhat of a blind recovery mechanism.
The biggest mistake people seem to make with JTAG is the "wipe everything and reload CFE" approach; they either can't find the correct CFE version after wiping the device, or they reflash with a CFE which is incompatible with their device. You should always try to use the CFE version that came with the device rather than attempting to replace it with some random CFE you found on the internet.
Second mistake - embedded within CFE is a set of NVRAM defaults to be used if the NVRAM partition is missing. This means that in most cases you can just wipe everything but CFE and it'll happily boot, recreate NVRAM and start waiting for a firmware via TFTP. In some cases however, the embedded defaults (in the CFE shipped with the device) don't match the actual hardware and CFE will fail to boot. This is why we have the warnings not to wipe NVRAM. To recover from this situation you need either the original NVRAM contents, or a version of CFE with the correct defaults.
Serial -
Serial consoles are great, there's just one problem - the routers run on 3.3v and a normal PC serial port puts out +/-12v, easily frying a router. This means that a level shifter such as a max233 is required, and adding the ICs and caps required is beyond the ability of most users -- luckily there's a shortcut. Most cellphones are either USB or 3.3v serial, so the data cable for a 3.3v cellphone can be used to make an easy and professional looking serial console connection. You only need to identify and connect 4 wires (vcc, rx, tx, gnd) -- and if your cable uses a pl2303 you can skip the vcc connection.
Serial console allows you to interact with the CFE command line, watch the kernel boot and get console access to Linux. This is probably the only way you'll ever get any meaningful feedback about the device boot up.
LEDs -
Most people assume the LEDs on the front are deterministic, and that if they tell you which LEDs are lit you can instantly tell whether the hardware is working or where it crashed during bootup. This unfortunately isn't the slightest bit true.
- Power LED. The biggest mistake people make here is "my power led is blinking, what does that mean?". There's an assumption that if the LED is blinking there must be software turning the LED on and off, and that it must mean something. The blinking is actually done in hardware; software only has the ability to set the LED to "on" or "blink" -- it defaults to blink on power up and isn't set to on until after the firmware boots. If the LED is on then you know the firmware booted; blinking really doesn't tell you much.
- Switch LEDs. The second common mistake is "the switch still works". Of course the switch still works, it's a separate piece of hardware and the LEDs are wired directly to it. The only useful bit of information you can get is "all the switch LEDs are lit". When the switch chip is reset, all of the ports will light up (even if no devices are connected) for about a second; this happens at power up and again as the firmware boots and reprograms the switch. If they stay lit, you're either a moron for not noticing the ports are actually in use, or someone has broken/shorted the switch chip. You can also notice reboot loops by watching for the switch reset.
- Diag/DMZ LED. Controlled by OpenWrt (diag module) to indicate bootup.
- Wifi. Controlled by the wifi driver; trivia - the wifi driver can also reset the power led in certain situations.
....
Stupid things people do -
Pin shorting -
In the past we used to suggest that people short a few pins of the flash; when CFE booted and attempted to perform the CRC32 there would be a flash read error which would change the outcome of the CRC and the resulting failure would force CFE into recovery mode. It's a great trick, but over the years we've learned that people are idiots and will take that as an invitation to poke, mangle and short just about every pin on the device based on some irrational belief that if they find the right pin everything will magically work again. You do not want someone paranoid at the thought of breaking the device scraping up every single electrical connection on the device -- it never ends well, and generally results in the flash chip or the router being damaged in the process.
- frying a chip (worst case)
- lifting/breaking electrical connections
- permanently shorting (best case)
The best case is that they simply bent a pin and you can easily bend it back - providing you can find it.
Depending on which pins are shorted/broken, it may be possible to access CFE but not the rest of the flash, meaning CFE boots fine but can't read or write the firmware. This can be confirmed by JTAG.
Wrong CFE version -
Loading the wrong CFE version can also lead to devices which boot into CFE but are unable to write to the flash, or are unable to initialize the networking.
And yes, there are actually a few obscure versions that require the firmware to be named "code.bin" or a specific port to be used. Unfortunately nobody can remember exactly which devices, leading to all sorts of superstition.
This article comes from the ChinaUnix blog; to view the original, see: http://blog.chinaunix.net/u2/63099/showart_506567.html