NAND Dump Analysis, Bit Errors Fixing with ECC, UBI Image Analysis, and Firmware Extraction Demystified
by cawan (cawan[at]ieee.org or chuiyewleong[at]hotmail.com)
on 25/05/2023
1 - Introduction
2 - NAND Dump Analysis
3 - Bit Errors Fixing with ECC
4 - UBI Image Analysis
5 - Firmware Extraction
6 - Conclusion
1 - Introduction
This paper describes how a NAND dump can be processed from a hacker's point of view to obtain all the files included in the dump file. For each step of the process, the applied method is explained in detail together with an example. The NAND dump covered in depth is a physical NAND dump, which is the dump file obtained from a universal programmer. A dump file obtained from a bootloader such as u-boot is what I call a logical NAND dump. For a logical NAND dump, the correctness of the data is ensured by the Flash Translation Layer (FTL). In other words, the FTL does all the bit error fixing with the Error Correcting Code (ECC) for you. For a physical NAND dump, however, the data comes along with the ECC, and you are on your own to guess how the ECC is used to ensure the correctness of the data. If bit errors exist, the ECC should be used to fix them accordingly. But it is not easy to guess how the ECC is associated with the data, and if that association is not known, it is impossible to use the ECC to fix bit errors in the data. So it is necessary to perform a thorough and systematic NAND dump analysis to uncover this undocumented association between ECC and data. Uncovering it by blind brute forcing is not a good idea. Instead, by making use of the results of a thorough analysis, blind brute forcing can be turned into guided brute forcing, which maximizes the chance of recovering the association between ECC and data.
Once the bit errors in the data are fixed and the ECCs are removed, the NAND dump is transformed from physical into logical, and it is ready for the actual firmware image analysis. The real-case scenario of this paper deals with a UBI image, and its analysis is discussed in considerable detail. Based on the substantial knowledge gained from the UBI image analysis, a creative approach is proposed to recover the file system and extract all the files hosted inside it. It is important to note that the entire process discussed in this paper cannot be replicated with automated tools such as binwalk or unblob. Besides, the entire analysis process is demonstrated manually, step by step, to make sure everything is explained clearly. Without wasting more time on talk, let's get started with the actual NAND dump analysis in detail.
2 - NAND Dump Analysis
First of all, let's start with a little bit of fundamentals. A NAND flash comprises many so-called "page"s of a certain size, and a group of a certain number of "page"s makes up a "block". The sample NAND dump used for the demonstration was obtained from an actual NAND chip with part number MT29F2G08ABAEAWP, so this chip is used as the example to illustrate the hacking-related technical specification. For MT29F2G08ABAEAWP, the size of a "page" is 2048+64=2112 bytes, a group of 64 "page"s makes up a "block", and 2048 "block"s make up the entire storage of the NAND flash, which therefore contains 2048*64=131072 "page"s.
For each 2112-byte "page", the first 2048 bytes are data and the remaining 64 bytes are the spare area, which hosts the ECC or some kind of vendor-specific metadata. In some literature, the spare area is also known as Out Of Band (OOB). As an overview of the sample NAND dump in hex mode, for the first "page", 0x0000 to 0x07ff is the data portion and 0x0800 to 0x083f is the spare area or OOB portion, as shown below.
cawan% hexdump -C -n 2112 ./MT29F2G08ABAEAWP@TSOP48.BIN
00000000 20 54 56 4e 00 02 00 00 a0 ac 00 00 ff ff ff ff | TVN............|
00000010 55 aa 55 aa 2e 00 00 00 20 02 00 b0 00 00 00 01 |U.U..... .......|
00000020 64 02 00 b0 18 00 00 c0 20 02 00 b0 18 00 00 01 |d....... .......|
00000030 aa 55 aa 55 01 00 00 00 aa 55 aa 55 01 00 00 00 |.U.U.....U.U....|
00000040 28 18 00 b0 4a d8 dc 53 08 18 00 b0 14 80 00 00 |(...J..S........|
00000050 aa 55 aa 55 01 00 00 00 aa 55 aa 55 01 00 00 00 |.U.U.....U.U....|
00000060 aa 55 aa 55 01 00 00 00 00 18 00 b0 76 04 03 00 |.U.U........v...|
00000070 aa 55 aa 55 01 00 00 00 04 18 00 b0 21 00 00 00 |.U.U........!...|
00000080 aa 55 aa 55 01 00 00 00 04 18 00 b0 23 00 00 00 |.U.U........#...|
00000090 aa 55 aa 55 01 00 00 00 aa 55 aa 55 01 00 00 00 |.U.U.....U.U....|
000000a0 aa 55 aa 55 01 00 00 00 04 18 00 b0 27 00 00 00 |.U.U........'...|
000000b0 aa 55 aa 55 01 00 00 00 aa 55 aa 55 01 00 00 00 |.U.U.....U.U....|
000000c0 aa 55 aa 55 01 00 00 00 20 18 00 b0 00 00 00 00 |.U.U.... .......|
000000d0 24 18 00 b0 00 00 00 00 1c 18 00 b0 00 40 00 00 |$............@..|
000000e0 18 18 00 b0 32 03 00 00 10 18 00 b0 06 00 00 00 |....2...........|
000000f0 04 18 00 b0 27 00 00 00 aa 55 aa 55 01 00 00 00 |....'....U.U....|
00000100 aa 55 aa 55 01 00 00 00 aa 55 aa 55 01 00 00 00 |.U.U.....U.U....|
00000110 04 18 00 b0 2b 00 00 00 04 18 00 b0 2b 00 00 00 |....+.......+...|
00000120 04 18 00 b0 2b 00 00 00 18 18 00 b0 32 02 00 00 |....+.......2...|
00000130 1c 18 00 b0 81 47 00 00 1c 18 00 b0 01 44 00 00 |.....G.......D..|
00000140 04 18 00 b0 20 00 00 00 34 18 00 b0 20 88 88 00 |.... ...4... ...|
00000150 aa 55 aa 55 01 00 00 00 18 02 00 b0 08 00 00 00 |.U.U............|
00000160 60 31 00 b8 00 80 00 00 a0 31 00 b8 00 80 00 00 |`1.......1......|
00000170 2c 02 00 b0 00 01 00 00 2c 02 00 b0 00 01 00 00 |,.......,.......|
00000180 2c 02 00 b0 00 01 00 00 00 00 00 00 00 00 00 00 |,...............|
00000190 13 00 00 ea 14 f0 9f e5 10 f0 9f e5 0c f0 9f e5 |................|
000001a0 08 f0 9f e5 04 f0 9f e5 00 f0 9f e5 04 f0 1f e5 |................|
000001b0 20 03 00 00 78 56 34 12 78 56 34 12 78 56 34 12 | ...xV4.xV4.xV4.|
000001c0 78 56 34 12 78 56 34 12 78 56 34 12 78 56 34 12 |xV4.xV4.xV4.xV4.|
000001d0 00 02 00 00 a0 ac 00 00 80 b5 00 00 a0 ac 00 00 |................|
000001e0 de c0 ad 0b 00 00 0f e1 1f 00 c0 e3 d3 00 80 e3 |................|
000001f0 00 f0 29 e1 bc d0 9f e5 07 d0 cd e3 00 00 a0 e3 |..).............|
00000200 70 05 00 eb 00 40 a0 e1 01 50 a0 e1 02 60 a0 e1 |p....@...P...`..|
00000210 04 d0 a0 e1 8c 00 4f e2 00 90 46 e0 06 00 50 e1 |......O...F...P.|
00000220 06 00 00 0a 06 10 a0 e1 5c 30 1f e5 03 20 80 e0 |........\0... ..|
00000230 00 06 b0 e8 00 06 a1 e8 02 00 50 e1 fb ff ff 3a |..........P....:|
00000240 74 00 9f e5 74 10 9f e5 00 20 a0 e3 01 00 50 e1 |t...t.... ....P.|
00000250 02 00 00 2a 00 20 80 e5 04 00 80 e2 fa ff ff ea |...*. ..........|
00000260 00 00 9f e5 00 f0 a0 e1 54 06 00 00 a0 ac 00 00 |........T.......|
00000270 a0 ac 00 00 a0 ac 00 00 00 00 a0 e3 17 0f 07 ee |................|
00000280 17 0f 08 ee 10 0f 11 ee 23 0c c0 e3 87 00 c0 e3 |........#.......|
00000290 02 00 80 e3 01 0a 80 e3 10 0f 01 ee 0e c0 a0 e1 |................|
000002a0 0a 00 00 eb 0c e0 a0 e1 0e f0 a0 e1 00 00 a0 e1 |................|
000002b0 e8 d0 1f e5 fe ff ff eb 00 80 00 bc a0 ae 00 00 |................|
000002c0 80 b7 00 00 00 00 a0 e1 00 00 a0 e1 00 00 a0 e1 |................|
000002d0 68 00 9f e5 00 10 e0 e3 00 10 80 e5 00 00 0f e1 |h...............|
000002e0 c0 00 80 e3 00 f0 21 e1 54 00 9f e5 54 10 9f e5 |......!.T...T...|
000002f0 00 10 80 e5 50 00 9f e5 50 10 9f e5 00 10 80 e5 |....P...P.......|
00000300 4c 00 9f e5 05 14 a0 e3 00 10 80 e5 44 00 9f e5 |L...........D...|
00000310 44 10 9f e5 00 10 80 e5 03 2a a0 e3 01 20 52 e2 |D........*... R.|
00000320 fd ff ff 1a 20 00 9f e5 30 10 9f e5 00 10 80 e5 |.... ...0.......|
00000330 01 2b a0 e3 01 20 52 e2 fd ff ff 1a 0e f0 a0 e1 |.+... R.........|
00000340 24 21 00 b8 04 10 00 b0 84 00 04 40 04 02 00 b0 |$!.........@....|
00000350 ff 0f 00 00 08 02 00 b0 0c 02 00 b0 24 4f 00 00 |............$O..|
00000360 fc 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000370 00 00 51 e3 1f 00 00 0a 01 30 a0 e3 00 20 a0 e3 |..Q......0... ..|
00000380 01 00 50 e1 19 00 00 3a 01 02 51 e3 00 00 51 31 |..P....:..Q...Q1|
00000390 01 12 a0 31 03 32 a0 31 fa ff ff 3a 02 01 51 e3 |...1.2.1...:..Q.|
000003a0 00 00 51 31 81 10 a0 31 83 30 a0 31 fa ff ff 3a |..Q1...1.0.1...:|
000003b0 01 00 50 e1 01 00 40 20 03 20 82 21 a1 00 50 e1 |..P...@ . .!..P.|
000003c0 a1 00 40 20 a3 20 82 21 21 01 50 e1 21 01 40 20 |..@ . .!!.P.!.@ |
000003d0 23 21 82 21 a1 01 50 e1 a1 01 40 20 a3 21 82 21 |#!.!..P...@ .!.!|
000003e0 00 00 50 e3 23 32 b0 11 21 12 a0 11 ef ff ff 1a |..P.#2..!.......|
000003f0 02 00 a0 e1 0e f0 a0 e1 04 e0 2d e5 c9 1c 00 eb |..........-.....|
00000400 00 00 a0 e3 00 80 bd e8 03 50 2d e9 d7 ff ff eb |.........P-.....|
00000410 06 50 bd e8 90 02 03 e0 03 10 41 e0 0e f0 a0 e1 |.P........A.....|
00000420 03 50 2d e9 09 00 00 eb 06 50 bd e8 90 02 03 e0 |.P-......P......|
00000430 03 10 41 e0 0e f0 a0 e1 00 00 a0 e1 00 00 a0 e1 |..A.............|
00000440 00 00 a0 e1 00 00 a0 e1 00 00 a0 e1 00 00 a0 e1 |................|
00000450 00 00 51 e3 01 c0 20 e0 42 00 00 0a 00 10 61 42 |..Q... .B.....aB|
00000460 01 20 51 e2 27 00 00 0a 00 30 b0 e1 00 30 60 42 |. Q.'....0...0`B|
00000470 01 00 53 e1 26 00 00 9a 02 00 11 e1 28 00 00 0a |..S.&.......(...|
00000480 0e 02 11 e3 81 11 a0 01 08 20 a0 03 01 20 a0 13 |......... ... ..|
00000490 01 02 51 e3 03 00 51 31 01 12 a0 31 02 22 a0 31 |..Q...Q1...1.".1|
000004a0 fa ff ff 3a 02 01 51 e3 03 00 51 31 81 10 a0 31 |...:..Q...Q1...1|
000004b0 82 20 a0 31 fa ff ff 3a 00 00 a0 e3 01 00 53 e1 |. .1...:......S.|
000004c0 01 30 43 20 02 00 80 21 a1 00 53 e1 a1 30 43 20 |.0C ...!..S..0C |
000004d0 a2 00 80 21 21 01 53 e1 21 31 43 20 22 01 80 21 |...!!.S.!1C "..!|
000004e0 a1 01 53 e1 a1 31 43 20 a2 01 80 21 00 00 53 e3 |..S..1C ...!..S.|
000004f0 22 22 b0 11 21 12 a0 11 ef ff ff 1a 00 00 5c e3 |""..!.........\.|
00000500 00 00 60 42 0e f0 a0 e1 00 00 3c e1 00 00 60 42 |..`B......<...`B|
00000510 0e f0 a0 e1 00 00 a0 33 cc 0f a0 01 01 00 80 03 |.......3........|
00000520 0e f0 a0 e1 01 08 51 e3 21 18 a0 21 10 20 a0 23 |......Q.!..!. .#|
00000530 00 20 a0 33 01 0c 51 e3 21 14 a0 21 08 20 82 22 |. .3..Q.!..!. ."|
00000540 10 00 51 e3 21 12 a0 21 04 20 82 22 04 00 51 e3 |..Q.!..!. ."..Q.|
00000550 03 20 82 82 a1 20 82 90 00 00 5c e3 33 02 a0 e1 |. ... ....\.3...|
00000560 00 00 60 42 0e f0 a0 e1 04 e0 2d e5 6d 1c 00 eb |..`B......-.m...|
00000570 00 00 a0 e3 04 f0 9d e4 00 00 a0 e1 00 00 a0 e1 |................|
00000580 00 00 a0 e1 00 00 a0 e1 00 00 a0 e1 00 00 a0 e1 |................|
00000590 20 30 52 e2 20 c0 62 e2 30 02 a0 41 31 03 a0 51 | 0R. .b.0..A1..Q|
000005a0 11 0c 80 41 31 12 a0 e1 0e f0 a0 e1 20 30 52 e2 |...A1....... 0R.|
000005b0 20 c0 62 e2 11 12 a0 41 10 13 a0 51 30 1c 81 41 | .b....A...Q0..A|
000005c0 10 02 a0 e1 0e f0 a0 e1 20 30 52 e2 20 c0 62 e2 |........ 0R. .b.|
000005d0 30 02 a0 41 51 03 a0 51 11 0c 80 41 51 12 a0 e1 |0..AQ..Q...AQ...|
000005e0 0e f0 a0 e1 2d de 4d e2 00 40 a0 e3 6c 31 9f e5 |....-.M..@..l1..|
000005f0 0d 00 a0 e1 00 30 8d e5 04 30 8d e5 1c 40 8d e5 |.....0...0...@..|
00000600 bc d2 8d e5 30 40 8d e5 50 40 8d e5 d1 01 00 eb |....0@..P@......|
00000610 1c 30 9d e5 04 00 53 e1 02 00 00 0a 04 10 a0 e1 |.0....S.........|
00000620 8a 0f 8d e2 33 ff 2f e1 8a 0f 8d e2 01 10 a0 e3 |....3./.........|
00000630 ca 1a 00 eb 00 00 50 e3 46 00 00 1a 70 04 00 eb |......P.F...p...|
00000640 38 42 9d e5 3c 52 9d e5 04 00 a0 e1 05 10 a0 e1 |8B..<R..........|
00000650 46 ff ff eb 04 10 a0 e1 0e a6 a0 e3 00 b0 a0 e1 |F...............|
00000660 0a 08 a0 e3 41 ff ff eb 04 10 a0 e1 00 70 a0 e1 |....A........p..|
00000670 ec 00 9f e5 3d ff ff eb 04 10 a0 e1 00 90 a0 e1 |....=...........|
00000680 0a 08 a0 e3 5f ff ff eb 01 00 a0 e1 05 10 a0 e1 |...._...........|
00000690 36 ff ff eb 00 60 a0 e1 24 00 00 ea 3c 12 9d e5 |6....`..$...<...|
000006a0 38 02 9d e5 31 ff ff eb 8a 4f 8d e2 bc 52 9d e5 |8...1....O...R..|
000006b0 50 10 a0 e3 00 20 a0 e3 90 07 03 e0 04 00 a0 e1 |P.... ..........|
000006c0 0f e0 a0 e1 34 f0 95 e5 04 00 a0 e1 0f e0 a0 e1 |....4...........|
000006d0 08 f0 95 e5 ff 00 50 e3 01 90 89 12 12 00 00 1a |......P.........|
000006e0 0e 00 00 ea bc 42 9d e5 38 02 9d e5 d4 51 94 e5 |.....B..8....Q..|
000006f0 3c 12 9d e5 00 00 55 e3 05 00 00 0a 1b ff ff eb |<.....U.........|
00000700 04 10 a0 e1 0a 20 a0 e1 90 67 23 e0 8a 0f 8d e2 |..... ...g#.....|
00000710 35 ff 2f e1 3c 32 9d e5 01 60 86 e2 03 a0 8a e0 |5./.<2...`......|
00000720 0b 00 56 e1 ee ff ff 3a 00 60 a0 e3 01 70 87 e2 |..V....:.`...p..|
00000730 09 00 57 e1 d8 ff ff 9a 1c 30 9d e5 00 00 53 e3 |..W......0....S.|
00000740 02 00 00 0a 8a 0f 8d e2 00 10 e0 e3 33 ff 2f e1 |............3./.|
00000750 0e 36 a0 e3 33 ff 2f e1 2d de 8d e2 1e ff 2f e1 |.6..3./.-...../.|
00000760 00 d0 00 b0 ff cf 11 00 f0 40 2d e9 02 60 d3 e5 |.........@-..`..|
00000770 00 40 d3 e5 00 00 d2 e5 01 c0 d3 e5 02 50 d2 e5 |.@...........P..|
00000780 01 30 d2 e5 00 40 24 e0 03 c0 2c e0 05 60 26 e0 |.0...@$...,..`&.|
00000790 ff 00 04 e2 06 30 8c e1 03 30 90 e1 01 70 a0 e1 |.....0...0...p..|
000007a0 03 00 a0 01 f0 80 bd 08 ac 50 a0 e1 0c 30 25 e0 |.........P...0%.|
000007b0 55 30 03 e2 55 00 53 e3 28 00 00 1a a0 30 20 e0 |U0..U.S.(....0 .|
000007c0 55 30 03 e2 55 00 53 e3 24 00 00 1a a6 30 26 e0 |U0..U.S.$....0&.|
000007d0 54 30 03 e2 54 00 53 e3 20 00 00 1a 80 20 a0 e1 |T0..T.S. .... ..|
000007e0 00 31 a0 e1 20 30 03 e2 40 20 02 e2 03 20 82 e1 |.1.. 0..@ ... ..|
000007f0 80 10 04 e2 80 31 a0 e1 01 20 82 e1 10 30 03 e2 |.....1... ...0..|
00000800 ff ff 00 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000810 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000820 f6 89 f7 79 e5 60 c9 e0 d6 e3 ed cb 9c b0 f9 f0 |...y.`..........|
00000830 1f da d4 a4 9c d4 1b e0 e0 90 cc 85 d8 d2 e2 80 |................|
00000840
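Instead of combining hexdump -n with tail to reach a given "page", a raw page can be read directly by seeking to its offset. The following is a minimal sketch that assumes the 2112-byte layout just described. read_page, split_page and is_blank are hypothetical helper names, and "blank" here simply means every byte reads 0xFF (erased). The demo uses an in-memory fake dump, so it runs without the .BIN file.

```python
import io

DATA_SIZE = 2048             # data bytes per "page"
PAGE_SIZE = DATA_SIZE + 64   # plus 64 OOB bytes = 2112 raw bytes

def read_page(f, index: int) -> bytes:
    """Return raw "page" number `index` (0-based) from an open binary file."""
    f.seek(index * PAGE_SIZE)
    return f.read(PAGE_SIZE)

def split_page(raw: bytes):
    """Split a raw page into (data portion, OOB portion)."""
    return raw[:DATA_SIZE], raw[DATA_SIZE:]

def is_blank(raw: bytes) -> bool:
    """An erased page reads back as all 0xFF."""
    return raw == b"\xff" * len(raw)

# Fake two-page dump: page 0 has data 0x20 and OOB 0xAA, page 1 is erased.
fake_dump = io.BytesIO(b"\x20" * DATA_SIZE + b"\xaa" * 64 + b"\xff" * PAGE_SIZE)
data, oob = split_page(read_page(fake_dump, 0))
print(len(data), len(oob), is_blank(read_page(fake_dump, 1)))   # 2048 64 True
```

With the real file, `with open("MT29F2G08ABAEAWP@TSOP48.BIN", "rb") as f: data, oob = split_page(read_page(f, 1))` would give the OOB of the second "page".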
This sample NAND dump is in fact a physical NAND dump from a real industrial product. As mentioned earlier, this sample is used as a real-case scenario to illustrate each step of the analysis process until the full file system is extracted and recovered. Let's start with the DumpFlash tool and try to identify the ID codes of the NAND chip. However, it fails, as the output below shows. This might happen because the ID codes are missing from the NAND dump or have been changed to something strange.
cawan% python2.7 dumpflash.py -i ./MT29F2G08ABAEAWP@TSOP48.BIN
PageSize: 0x200
OOBSize: 0x10
PagePerBlock: 0x20
BlockSize: 0x4000
RawPageSize: 0x210
FileSize: 0x10800000
PageCount: 0x84000
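So, just forget about the false output generated by DumpFlash (for example, a PageSize of 0x200 and an OOBSize of 0x10, whereas the datasheet specifies 2048 and 64 bytes), and go back to the technical specification provided by the datasheet of MT29F2G08ABAEAWP.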
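The datasheet numbers can be cross-checked against the DumpFlash output with a little arithmetic. The following is only a sanity check, not a reconstruction of how DumpFlash works. It assumes the dump is a plain concatenation of 2112-byte pages in block order, and page_offset is a hypothetical helper.

The arithmetic shows two things. First, the datasheet geometry accounts for the file size exactly (2048*64*2112 = 0x10800000). Second, the file size alone cannot settle the geometry: 2112 is exactly 4*528, so the dump also divides evenly into DumpFlash's 0x210-byte raw pages. Incidentally, DumpFlash's PageCount of 0x84000 equals FileSize / PageSize (0x10800000 / 0x200), not FileSize / RawPageSize.

```python
# Datasheet geometry of MT29F2G08ABAEAWP (assumed to map 1:1 onto the dump).
DATA_SIZE, OOB_SIZE = 2048, 64
PAGE_SIZE = DATA_SIZE + OOB_SIZE          # 2112 raw bytes per "page"
PAGES_PER_BLOCK, BLOCK_COUNT = 64, 2048
TOTAL_SIZE = BLOCK_COUNT * PAGES_PER_BLOCK * PAGE_SIZE

def page_offset(block: int, page: int = 0) -> int:
    """File offset of `page` inside `block`, assuming pages are stored in order."""
    return (block * PAGES_PER_BLOCK + page) * PAGE_SIZE

# Values printed by DumpFlash above.
FILE_SIZE, DF_PAGE, DF_RAW_PAGE, DF_PAGE_COUNT = 0x10800000, 0x200, 0x210, 0x84000

print(TOTAL_SIZE == FILE_SIZE)                     # True
print(divmod(FILE_SIZE, PAGE_SIZE))                # (131072, 0)
print(divmod(FILE_SIZE, DF_RAW_PAGE))              # (524288, 0)
print(FILE_SIZE // DF_PAGE == DF_PAGE_COUNT)       # True
print(hex(page_offset(0, 1) + DATA_SIZE), hex(page_offset(1)))   # 0x1040 0x21000
```

The last two offsets are where the OOB of the second "page" and the first "page" of the second block start. They match the hexdump excerpts that follow.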
Let's have a brief look at the 64-byte OOB of the first "page" in particular.
00000800 ff ff 00 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000810 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000820 f6 89 f7 79 e5 60 c9 e0 d6 e3 ed cb 9c b0 f9 f0 |...y.`..........|
00000830 1f da d4 a4 9c d4 1b e0 e0 90 cc 85 d8 d2 e2 80 |................|
From this, two assumptions can be made. First, the first 32 bytes of the OOB might be a constant. Second, the second 32 bytes might be ECCs. Let's verify whether the first assumption is a fact or a mistake by checking the OOB of the second "page", as shown below.
cawan% hexdump -v -C -n $((2112*2)) ./MT29F2G08ABAEAWP@TSOP48.BIN | tail -n 5
00001040 ff ff 00 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00001050 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00001060 8f ce f4 8b 1c 26 38 00 bd 61 a0 c7 48 c4 d3 60 |.....&8..a..H..`|
00001070 d2 1b 46 ab 53 8f 41 f0 8d 18 2b 3b 8d 54 21 50 |..F.S.A...+;.T!P|
Yes, it seems unchanged. How about the third "page" then?
cawan% hexdump -v -C -n $((2112*3)) ./MT29F2G08ABAEAWP@TSOP48.BIN | tail -n 5
00001880 ff ff 00 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00001890 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
000018a0 01 8b bb 0a bb 54 88 50 7e 0e b9 9a c2 7b bd 40 |.....T.P~....{.@|
000018b0 dd 63 cb 9a e3 5a bc 70 65 ca 16 7a 50 dc 60 e0 |.c...Z.pe..zP.`.|
Still unchanged. How about the first "page" of the next block then?
cawan% hexdump -C -v -n $((2112*64+2112)) ./MT29F2G08ABAEAWP@TSOP48.BIN | \
tail -n 5
00021800 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00021810 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00021820 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00021830 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
Well, this is a blank page, which should be ignored. Grabbing a few samples and drawing a conclusion from them is really not a good idea, so let's check it properly.
############################### check_const.py ###############################
# Count the non-blank pages whose first 32 OOB bytes differ from the
# suspected constant (ff ff 00 00 followed by 28 bytes of ff).
input_file = open("MT29F2G08ABAEAWP@TSOP48.BIN", "rb")
suspect_const = \
    b'\xff\xff\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff' + \
    b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
blank = \
    b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff' + \
    b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
page_count = 0
diff_count = 0
while 1:
    data = input_file.read(2112)          # one raw page: 2048 data + 64 OOB
    if len(data) == 0:
        break
    oob_first_32_bytes = data[2048:2048+32]
    page_count += 1
    # Skip blank (erased) pages, compare all the others with the constant
    if len(data) == 2112 and oob_first_32_bytes != blank:
        if oob_first_32_bytes != suspect_const:
            diff_count += 1
input_file.close()
print("diff_count: %d page_count: %d\n" % (diff_count, page_count))
##################################### end ####################################
The output is,
cawan% python3.8 check_const.py
diff_count: 0 page_count: 131072
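check_const.py only counts mismatches against one guessed value. A slightly more informative variant tallies every distinct 32-byte OOB prefix, so that an unexpected second pattern, or a bit flip inside the "constant", would show up with its own count. The following is a hypothetical helper that uses the same blank test as check_const.py, demonstrated on a synthetic dump rather than the real file.

```python
import io
from collections import Counter

PAGE_SIZE, DATA_SIZE = 2112, 2048

def oob_prefix_histogram(f, prefix_len: int = 32) -> Counter:
    """Count each distinct OOB prefix over all complete pages with a non-blank prefix."""
    counts = Counter()
    while True:
        raw = f.read(PAGE_SIZE)
        if len(raw) < PAGE_SIZE:                 # EOF (or a truncated tail)
            break
        prefix = raw[DATA_SIZE:DATA_SIZE + prefix_len]
        if prefix == b"\xff" * prefix_len:       # same blank test as check_const.py
            continue
        counts[prefix] += 1
    return counts

def fake_page(oob: bytes) -> bytes:
    """Synthetic raw page: 2048 zero data bytes followed by a 64-byte `oob`."""
    return bytes(DATA_SIZE) + oob

const = b"\xff\xff\x00\x00" + b"\xff" * 28
flipped = b"\xfe" + const[1:]                    # one bit flipped in the first byte
fake_dump = io.BytesIO(fake_page(const + bytes(32)) * 3
                       + b"\xff" * PAGE_SIZE
                       + fake_page(flipped + bytes(32)))
hist = oob_prefix_histogram(fake_dump)
for prefix, n in hist.most_common():
    print(prefix.hex(), n)
```

Against the real dump, a single entry whose count equals the number of non-blank pages is the same result as diff_count: 0.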
So, it is convincing enough to say that the first 32 bytes of the OOB are constant for all the non-blank "page"s. Next, let's verify the second assumption, i.e. whether the second 32 bytes of the OOB are ECCs or not. The ECC-suspected portion of the OOB for the first 4 "page"s is shown below.
00000820 f6 89 f7 79 e5 60 c9 e0 d6 e3 ed cb 9c b0 f9 f0 |...y.`..........|
00000830 1f da d4 a4 9c d4 1b e0 e0 90 cc 85 d8 d2 e2 80 |................|
00001060 8f ce f4 8b 1c 26 38 00 bd 61 a0 c7 48 c4 d3 60 |.....&8..a..H..`|
00001070 d2 1b 46 ab 53 8f 41 f0 8d 18 2b 3b 8d 54 21 50 |..F.S.A...+;.T!P|
000018a0 01 8b bb 0a bb 54 88 50 7e 0e b9 9a c2 7b bd 40 |.....T.P~....{.@|
000018b0 dd 63 cb 9a e3 5a bc 70 65 ca 16 7a 50 dc 60 e0 |.c...Z.pe..zP.`.|
000020e0 43 a9 36 70 be b0 5e 90 1c 4f c1 ad 19 54 4d 20 |C.6p..^..O...TM |
000020f0 b8 6a 20 ba 32 c2 74 80 76 73 45 10 64 3e 38 c0 |.j .2.t.vsE.d>8.|
The output looks positive, and it provides extra information about how the system implementation uses the ECC-suspected portion of the OOB. For each "page", it seems the 32-byte ECC-suspected portion can be divided into four 8-byte ECCs, because the last 4 bits of each 8-byte suspected ECC are always zero, as shown below.
f6 89 f7 79 e5 60 c9 e0
d6 e3 ed cb 9c b0 f9 f0
1f da d4 a4 9c d4 1b e0
e0 90 cc 85 d8 d2 e2 80
8f ce f4 8b 1c 26 38 00
bd 61 a0 c7 48 c4 d3 60
d2 1b 46 ab 53 8f 41 f0
8d 18 2b 3b 8d 54 21 50
01 8b bb 0a bb 54 88 50
7e 0e b9 9a c2 7b bd 40
dd 63 cb 9a e3 5a bc 70
65 ca 16 7a 50 dc 60 e0
43 a9 36 70 be b0 5e 90
1c 4f c1 ad 19 54 4d 20
b8 6a 20 ba 32 c2 74 80
76 73 45 10 64 3e 38 c0
                      ^
                      0
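The same observation can be checked mechanically on the sixteen values listed above. The following is a hypothetical helper, not one of the original scripts. It splits each 32-byte ECC-suspected field into four 8-byte chunks and prints the low nibble of each chunk's last byte. The hex strings are copied from the hexdump excerpts of the first four "page"s.

```python
# ECC-suspected OOB bytes (OOB offsets 32..63) of the first four "page"s,
# copied from the hexdump excerpts above, one 8-byte chunk per string.
ecc_fields = [
    "f689f779e560c9e0" "d6e3edcb9cb0f9f0" "1fdad4a49cd41be0" "e090cc85d8d2e280",
    "8fcef48b1c263800" "bd61a0c748c4d360" "d21b46ab538f41f0" "8d182b3b8d542150",
    "018bbb0abb548850" "7e0eb99ac27bbd40" "dd63cb9ae35abc70" "65ca167a50dc60e0",
    "43a93670beb05e90" "1c4fc1ad19544d20" "b86a20ba32c27480" "76734510643e38c0",
]

def chunks(field: bytes, size: int = 8):
    """Split a byte string into consecutive `size`-byte pieces."""
    return [field[i:i + size] for i in range(0, len(field), size)]

for page, hexstr in enumerate(ecc_fields):
    parts = chunks(bytes.fromhex(hexstr))
    # low nibble of the last byte of each 8-byte chunk
    print(page, [p[-1] & 0x0F for p in parts])
```

Every line prints [0, 0, 0, 0], while the high nibbles of those same bytes vary (e0, f0, 00, 60, ...), so the zeros are confined to the last 4 bits.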
Since a "page" comprises four ECCs, it is reasonable to deduce that the 2048-byte data portion of a "page" can be divided into four 512-byte "sub-page"s. Each "sub-page" is protected by its respective ECC, in sequence, as shown below.
f6 89 f7 79 e5 60 c9 e0 <- ECC of the 1st "sub-page" in 1st "page"
d6 e3 ed cb 9c b0 f9 f0 <- ECC of the 2nd "sub-page" in 1st "page"
1f da d4 a4 9c d4 1b e0 <- ECC of the 3rd "sub-page" in 1st "page"
e0 90 cc 85 d8 d2 e2 80 <- ECC of the 4th "sub-page" in 1st "page"
8f ce f4 8b 1c 26 38 00 <- ECC of the 1st "sub-page" in 2nd "page"
bd 61 a0 c7 48 c4 d3 60 <- ECC of the 2nd "sub-page" in 2nd "page"
d2 1b 46 ab 53 8f 41 f0 <- ECC of the 3rd "sub-page" in 2nd "page"
8d 18 2b 3b 8d 54 21 50 <- ECC of the 4th "sub-page" in 2nd "page"
01 8b bb 0a bb 54 88 50 <- ECC of the 1st "sub-page" in 3rd "page"
7e 0e b9 9a c2 7b bd 40 <- ECC of the 2nd "sub-page" in 3rd "page"
dd 63 cb 9a e3 5a bc 70 <- ECC of the 3rd "sub-page" in 3rd "page"
65 ca 16 7a 50 dc 60 e0 <- ECC of the 4th "sub-page" in 3rd "page"
43 a9 36 70 be b0 5e 90 <- ECC of the 1st "sub-page" in 4th "page"
1c 4f c1 ad 19 54 4d 20 <- ECC of the 2nd "sub-page" in 4th "page"
b8 6a 20 ba 32 c2 74 80 <- ECC of the 3rd "sub-page" in 4th "page"
76 73 45 10 64 3e 38 c0 <- ECC of the 4th "sub-page" in 4th "page"
                      ^
                      0
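Under this hypothesis, every 512-byte "sub-page" has a fixed partner ECC inside the same raw page. The hypothetical helper below returns both slices for sub-page i (0 to 3). Its offsets assume the layout deduced so far: data at 0..2047, the 32-byte constant at 2048..2079, and the four 8-byte ECCs at 2080..2111. It is demonstrated on a synthetic page whose bytes encode their own index.

```python
DATA_SIZE, SUBPAGE_SIZE = 2048, 512
OOB_CONST_SIZE, ECC_SIZE = 32, 8

def subpage(raw_page: bytes, i: int):
    """Return (data, ecc) of sub-page i of a raw 2112-byte page (hypothesis)."""
    if not 0 <= i < DATA_SIZE // SUBPAGE_SIZE:
        raise IndexError("a 2048-byte page holds four 512-byte sub-pages")
    data = raw_page[i * SUBPAGE_SIZE:(i + 1) * SUBPAGE_SIZE]
    ecc_start = DATA_SIZE + OOB_CONST_SIZE + i * ECC_SIZE
    return data, raw_page[ecc_start:ecc_start + ECC_SIZE]

# Synthetic page: sub-page k is filled with byte k, its ECC with byte 0x10+k.
raw = (b"".join(bytes([k]) * SUBPAGE_SIZE for k in range(4))
       + b"\xff" * OOB_CONST_SIZE
       + b"".join(bytes([0x10 + k]) * ECC_SIZE for k in range(4)))
for i in range(4):
    d, e = subpage(raw, i)
    print(i, d[0], e.hex())
```

For the first "page" of the dump, the ECC of sub-page 0 therefore starts at 2048+32 = 0x820, which is the first ECC row in the listing above.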
Since the last 4 bits of each ECC are zero, the length of the ECC might be 8*8-4=60 bits. As a side note, ECC length is normally expressed in bits. Let's confirm that all the ECCs are 60 bits in size by checking that the last 4 bits of each of them are always zero.
########################### check_ecc_last_4bit.py ###########################
# For every non-blank page, check that the low nibble of the last byte of
# each of the four 8-byte suspected ECCs is zero.
input_file = open("MT29F2G08ABAEAWP@TSOP48.BIN", "rb")
suspect_const = \
    b'\xff\xff\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff' + \
    b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
blank = \
    b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff' + \
    b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
masking = b'\x00\x00\x00\x00\x00\x00\x00\x0f'   # keeps only the last 4 bits
page_count = 0
diff_count = 0
while 1:
    data = input_file.read(2112)
    if len(data) == 0:
        break
    oob_1st_32_bytes = data[2048:2048+32]
    oob_2nd_32_bytes = data[2048+32:2048+64]
    page_count += 1
    if len(data) == 2112 and oob_1st_32_bytes != blank:
        for i in range(4):
            # AND the i-th suspected ECC with the mask byte by byte
            last_4_bits = bytes([a & b for a, b in \
                zip(oob_2nd_32_bytes[i*8:i*8+8], masking)])
            if last_4_bits[7] != 0:
                diff_count += 1
input_file.close()
print("diff_count: %d page_count: %d\n" % (diff_count, page_count))
##################################### end ####################################
The output is,
cawan% python3.8 check_ecc_last_4bit.py
diff_count: 0 page_count: 131072
With such a convincing result, it is reasonable to say that the ECC length
is 60 bits.
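Since the padding nibble sits at the end, the 60 ECC bits appear to be stored left-aligned (MSB first) in each 8-byte field. The following sketch converts between the stored 8 bytes and a 60-bit integer. The helper names are hypothetical. Whether the actual BCH implementation reads the bits in this order is still open at this point, so treat the bit order as an assumption.

```python
ECC_BITS, FIELD_BYTES = 60, 8
PAD_BITS = FIELD_BYTES * 8 - ECC_BITS        # 8*8 - 60 = 4 trailing bits

def field_to_int(field: bytes) -> int:
    """8 stored bytes -> 60-bit value, assuming MSB-first with trailing padding."""
    value = int.from_bytes(field, "big")
    if value & ((1 << PAD_BITS) - 1):
        raise ValueError("padding nibble is not zero")
    return value >> PAD_BITS

def int_to_field(value: int) -> bytes:
    """60-bit value -> 8 stored bytes, appending the 4 zero padding bits."""
    if value >> ECC_BITS:
        raise ValueError("value does not fit in 60 bits")
    return (value << PAD_BITS).to_bytes(FIELD_BYTES, "big")

ecc = bytes.fromhex("f689f779e560c9e0")      # 1st ECC of the 1st "page"
v = field_to_int(ecc)
print(hex(v), v.bit_length(), int_to_field(v) == ecc)   # 0xf689f779e560c9e 60 True
```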
Now, let's take a brief hacker's overview of ECC algorithms. In general, three types of implementation are commonly used: Hamming, Reed-Solomon (RS), and Binary BCH. However, because a Hamming code can only correct a single bit error, and an RS code requires more redundancy for a given error correction capability, Binary BCH is the most widely used modern ECC implementation. Thus, Binary BCH is assumed to be the ECC implementation here. In addition, some special characteristics of Binary BCH can help to further identify the ECC implementation. The first characteristic is that for data of all zeros, regardless of its size, the respective Binary BCH ECC is also all zeros. Let's show it by example using bchlib. To be clear, all the parameters are just for demo at this stage; the actual parameters will be derived from the analysis step by step. Let's go ahead with the first characteristic.
############################## test_bchlib_01.py #############################
import bchlib
BCH_POLYNOMIAL = 8219
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
data = bytearray(b'\x00'*512)
ecc = bch.encode(data)
for i in ecc:
    print("%X" % i, end='')
print("")
##################################### end ####################################
bchlib is used for the Binary BCH encoding and decoding tasks. Two parameters have to be specified to make it work: BCH_POLYNOMIAL and BCH_BITS. BCH_POLYNOMIAL is the primitive polynomial to be used, and BCH_BITS is the maximum number of bit errors in the data that the ECC can correct. All the details about these two parameters will be discussed in the coming section on the Binary BCH implementation, as they are crucial to uncovering the association between ECC and data. Now, let's take a first glance at bchlib and study the first characteristic of Binary BCH. The output of test_bchlib_01.py is shown below.
cawan% python3.8 test_bchlib_01.py
0000000
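The BCH encoded output of 512 bytes of zeros is indeed all zeros. Note that the ECC is 7 bytes long here, not 3.5 bytes: "%X" prints each zero byte as a single "0", so the seven zero bytes show up as "0000000".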
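Why 7 bytes? The demo polynomial 8219 is 0x201B, i.e. x^13 + x^4 + x^3 + x + 1, so it defines GF(2^13) and m = 13. For a t-error-correcting binary BCH code over GF(2^m), the parity part has at most m*t bits. With BCH_BITS = 4 that is 52 bits, which need ceil(52/8) = 7 bytes and leave the last 4 bits of the last byte unused. This is similar to the trailing zero nibble of D7EC33C6695380 and of the NAND ECCs.

The arithmetic and the printing detail are sketched below without bchlib. The m*t figure is the usual upper bound and is assumed, not read from bchlib.

```python
import math

BCH_POLYNOMIAL = 8219   # demo value from test_bchlib_01.py (0x201B)
BCH_BITS = 4            # demo value: correct up to t = 4 bit errors

m = BCH_POLYNOMIAL.bit_length() - 1      # degree of the primitive polynomial
ecc_bits = m * BCH_BITS                  # upper bound on the parity length
ecc_bytes = math.ceil(ecc_bits / 8)
unused_bits = ecc_bytes * 8 - ecc_bits
print(hex(BCH_POLYNOMIAL), m, ecc_bits, ecc_bytes, unused_bits)  # 0x201b 13 52 7 4

zero_ecc = bytes(ecc_bytes)
print("".join("%X" % b for b in zero_ecc))   # 0000000  (one digit per byte)
print(zero_ecc.hex().upper())                # 00000000000000
```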
How about 512 bytes of 0xFF then? Let's check.
############################## test_bchlib_02.py #############################
import bchlib
BCH_POLYNOMIAL = 8219
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
data = bytearray(b'\xFF'*512)
ecc = bch.encode(data)
for i in ecc:
    print("%X" % i, end='')
print("")
##################################### end ####################################
The output is,
cawan% python3.8 test_bchlib_02.py
D7EC33C6695380
The output is not all 0xFF, and that makes sense. Otherwise, if 512 bytes of 0xFF were BCH encoded as 7 bytes of 0xFF, it would be hard to differentiate them from a blank "page". Now, let's proceed to the second characteristic, which is about zero padding. The question now is: what happens if 32 bytes of zeros are appended to the 512 bytes of 0xFF? Let's check it.
############################## test_bchlib_03.py #############################
import bchlib
BCH_POLYNOMIAL = 8219
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
data = bytearray(b'\xFF'*512 + b'\x00'*32)
ecc = bch.encode(data)
for i in ecc:
    print("%X" % i, end='')
print("")
##################################### end ####################################
The output is,
cawan% python3.8 test_bchlib_03.py
BCE3B0AE479EB0
Well, it seems the zero-padded data has a different BCH encoded output from the non-padded data, provided the data is not all zeros. However, this is not the case for an inherent BCH encoder, which generates exactly the same output for both zero-padded and non-padded data. Since such a characteristic can cause some kind of discrepancy, it should be avoided. A common approach to overcoming this issue is to reverse the bit order of the entire data right before it gets BCH encoded. So it is reasonable to assume bchlib follows such an approach, but how can this be verified? Well, under this assumption, for the data of 512 bytes of 0xFF followed by 32 bytes of zeros, the data actually BCH encoded by bchlib is in fact 32 bytes of zeros prepended to the 512 bytes of 0xFF. If this is the case, the BCH encoded output of the zero-prepended data should be the same as that of the non-prepended data. Let's verify it.
############################## test_bchlib_04.py #############################
import bchlib
BCH_POLYNOMIAL = 8219
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
data1 = bytearray(b'\x00'*32 + b'\xFF'*512)
ecc1 = bch.encode(data1)
data2 = bytearray(b'\xFF'*512)
ecc2 = bch.encode(data2)
print("Zeros Prepended:")
for i in ecc1:
    print("%X" % i, end='')
print("")
print("Nothing Prepended:")
for i in ecc2:
    print("%X" % i, end='')
print("")
##################################### end ####################################
As expected, both BCH encoded outputs are exactly the same, as shown below.
cawan% python3.8 test_bchlib_04.py
Zeros Prepended:
D7EC33C6695380
Nothing Prepended:
D7EC33C6695380
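All three behaviours seen so far can be reproduced without bchlib: all-zero input gives all-zero ECC, appended zeros change the ECC, and prepended zeros do not. A plain polynomial remainder over GF(2) is enough, because that remainder is what a systematic cyclic encoder such as BCH computes for its parity.

The toy below is a sketch, not bchlib and not a real BCH encoder. It feeds the data bits MSB first, so the first bit is the highest-order coefficient. For simplicity it divides by the demo polynomial 8219 itself, not by a proper BCH generator polynomial.

```python
def remainder(data: bytes, poly: int) -> int:
    """Return data(x) * x^r mod poly(x) over GF(2), r = deg(poly), bits MSB first.

    This is the shift-register (LFSR) form of systematic cyclic encoding."""
    r = poly.bit_length() - 1
    mask = (1 << r) - 1
    taps = poly & mask               # poly without its x^r term
    reg = 0
    for byte in data:
        for shift in range(7, -1, -1):
            feedback = ((reg >> (r - 1)) & 1) ^ ((byte >> shift) & 1)
            reg = (reg << 1) & mask
            if feedback:
                reg ^= taps
    return reg

POLY = 8219                          # toy divisor: the demo polynomial, not a BCH g(x)
ff = b"\xff" * 512
print(remainder(bytes(512), POLY))                              # 0
print(remainder(bytes(32) + ff, POLY) == remainder(ff, POLY))   # True
print(remainder(ff + bytes(32), POLY) == remainder(ff, POLY))   # False
```

Leading zero bits leave the register at zero, so prepending zeros cannot change the result. Appending zeros multiplies the message polynomial by a power of x, which does change the remainder. This matches the bit-order-reversal view above.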
One important point should be noted here. If the input data is bit-order reversed, the BCH encoded output should be in bit-order reversed form too. Thanks to bchlib for implementing this in its default mode. Now another question arises: is it possible to keep the original bit order of the input data being BCH encoded? Yes. It is possible by first reversing the bit order of the input data before passing it to the bchlib encoder, and then reversing the bit order of the BCH encoded output accordingly. Let's show it by example.
############################## test_bchlib_05.py #############################
import bchlib
BCH_POLYNOMIAL = 8219
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
data = bytearray(b'\xFF'*511 + b'\xAA')
# Reverse the bit order of the whole input: reverse the bits inside each
# byte, then reverse the byte order
data_reverse_bit = b''
for i in range(0, len(data)):
    data_reverse_bit += bytes([int("{:08b}".format(data[i])[::-1], 2)])
data_reverse_bit = data_reverse_bit[::-1]
ecc = bch.encode(data_reverse_bit)
# Undo the reversal on the ECC in the same way
ecc_reverse_bit = b''
for i in range(0, len(ecc)):
    ecc_reverse_bit += bytes([int("{:08b}".format(ecc[i])[::-1], 2)])
ecc_reverse_bit = ecc_reverse_bit[::-1]
for i in ecc_reverse_bit:
    print("%X" % i, end='')
print("")
##################################### end ####################################
In test_bchlib_05.py, the last byte of the 512-byte data input is purposely changed from 0xFF to 0xAA to avoid symmetry in the data (0b11111111 after bit order reversing is still 0b11111111). Now, let's see the output.
cawan% python3.8 test_bchlib_05.py
72FFA2590ECDB
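There is a caveat when comparing outputs such as 72FFA2590ECDB. The scripts print each byte with "%X", which drops leading zeros, so a 7-byte ECC can appear with only 13 hex digits, and different byte strings can print identically. bytes.hex() (or "%02X") keeps a fixed two digits per byte. The two byte strings below are made up purely to show the ambiguity.

```python
def print_like_scripts(b: bytes) -> str:
    """Mimic the test scripts: '%X' per byte, no zero padding."""
    return "".join("%X" % x for x in b)

a = bytes([0x07, 0x2F])      # made-up example values
b = bytes([0x72, 0x0F])
print(print_like_scripts(a), print_like_scripts(b))   # 72F 72F
print(a.hex().upper(), b.hex().upper())               # 072F 720F
print(len("72FFA2590ECDB"))                           # 13: one byte lost a digit
```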
So, if everything is correct, appending 32 bytes of zeros to this 512-byte data input and BCH encoding the result the same way should also produce 72FFA2590ECDB. Let's verify it.
############################## test_bchlib_06.py #############################
import bchlib
BCH_POLYNOMIAL = 8219
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
data = bytearray(b'\xFF'*511 + b'\xAA' + b'\x00'*32)   # same data + 32 zeros
# Reverse the bit order of the whole input (bits per byte, then byte order)
data_reverse_bit = b''
for i in range(0, len(data)):
    data_reverse_bit += bytes([int("{:08b}".format(data[i])[::-1], 2)])
data_reverse_bit = data_reverse_bit[::-1]
ecc = bch.encode(data_reverse_bit)
# Undo the reversal on the ECC in the same way
ecc_reverse_bit = b''
for i in range(0, len(ecc)):
    ecc_reverse_bit += bytes([int("{:08b}".format(ecc[i])[::-1], 2)])
ecc_reverse_bit = ecc_reverse_bit[::-1]
for i in ecc_reverse_bit:
    print("%X" % i, end='')
print("")
##################################### end ####################################
Perfect, the output is exactly as expected, as shown below.
cawan% python3.8 test_bchlib_06.py
72FFA2590ECDB
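The two loops in test_bchlib_05.py and test_bchlib_06.py reverse the bits of every byte and then reverse the byte order, which together reverse the whole bit stream. Below is a compact equivalent; reverse_bitstream is a hypothetical name. It also makes the padding argument explicit: zeros appended to the original data become zeros prepended to the reversed data, which is why test_bchlib_06.py reproduces the output of test_bchlib_05.py.

```python
def reverse_bits_then_bytes(data: bytes) -> bytes:
    """The two-step approach of test_bchlib_05.py / test_bchlib_06.py."""
    out = bytes(int("{:08b}".format(x)[::-1], 2) for x in data)
    return out[::-1]

def reverse_bitstream(data: bytes) -> bytes:
    """Reverse the order of all 8*len(data) bits in a single step."""
    if not data:
        return b""
    n = len(data) * 8
    bits = format(int.from_bytes(data, "big"), "0%db" % n)
    return int(bits[::-1], 2).to_bytes(len(data), "big")

data = b"\xff" * 511 + b"\xaa"
zeros = bytes(32)
print(reverse_bitstream(data) == reverse_bits_then_bytes(data))            # True
print(reverse_bitstream(data + zeros) == zeros + reverse_bitstream(data))  # True
print(reverse_bitstream(reverse_bitstream(data)) == data)                  # True
```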
That's enough for the "first glance" at bchlib and some characteristics of Binary BCH. To summarize the lessons of the "first glance" from a hacker's perspective, one should be clear about two points. First, a data input of all zeros generates an all-zeros output. Second, a data input padded with zeros of any size generates the same output as the data input without the zeros. Back to the NAND dump, these two points suggest a test. If a 60-bit BCH encoded ECC exists somewhere in all-zeros form, the 512 bytes of data in the respective "sub-page" should be all zeros too. If so, it means the data being BCH encoded has either no padding or all-zeros padding. If not, it means the padding being added is not all zeros. Sounds confusing? Let's grab a "sub-page" in the NAND dump whose BCH encoded ECC is all zeros, since an example makes it clearer.
########################### check_all_zeros_ecc.py ###########################
# Find the first non-blank page in which one of the four 8-byte ECCs is all
# zeros, and print its (0-based) page number and file offset.
input_file = open("MT29F2G08ABAEAWP@TSOP48.BIN", "rb")
oob_const = \
    b'\xff\xff\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff' + \
    b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
zeros_ecc = b'\x00\x00\x00\x00\x00\x00\x00\x00'
page_cnt = 0
positive_cnt = 0
while 1:
    data = input_file.read(2112)
    if len(data) == 0:
        break
    oob_1st_32_bytes = data[2048:2048+32]
    oob_2nd_32_bytes = data[2048+32:2048+32+32]
    if len(data) == 2112 and oob_1st_32_bytes == oob_const:
        for i in range(0, 4):
            ecc = oob_2nd_32_bytes[i*8:i*8+8]   # ECC of the i-th sub-page
            if ecc == zeros_ecc:
                positive_cnt += 1
                print("Page Num: %d, Address: 0x%X" % (page_cnt, page_cnt*2112))
                break
    if positive_cnt == 1:                        # stop at the first hit
        break
    page_cnt += 1
input_file.close()
print("Completed")
##################################### end ####################################
Let's see whether any "page" meets the condition; if so, the script shows the "page" number and address of the first item found. The output is shown below.
cawan% python3.8 check_all_zeros_ecc.py
Page Num: 256, Address: 0x84000
Completed
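check_all_zeros_ecc.py stops at the first hit and does not say which sub-page matched. To carry the two points above one step further, each hit could be classified by whether its 512-byte data is also all zeros. The following is a hypothetical sketch, assuming the sub-page/ECC layout deduced earlier, demonstrated on a synthetic page rather than the real one.

```python
DATA_SIZE, SUBPAGE_SIZE = 2048, 512
ECC_BASE, ECC_SIZE = 2048 + 32, 8         # ECCs follow the 32-byte OOB constant

def zero_ecc_subpages(raw_page: bytes):
    """Yield (sub-page index, data_is_all_zero) for every all-zero ECC."""
    for i in range(DATA_SIZE // SUBPAGE_SIZE):
        ecc = raw_page[ECC_BASE + i * ECC_SIZE:ECC_BASE + (i + 1) * ECC_SIZE]
        if ecc == bytes(ECC_SIZE):
            data = raw_page[i * SUBPAGE_SIZE:(i + 1) * SUBPAGE_SIZE]
            yield i, data == bytes(SUBPAGE_SIZE)

# Synthetic page: sub-page 0 is zeros, sub-page 1 is text, both with zero ECC.
data = (bytes(512) + b"baudrate=115200\x00".ljust(512, b"\x00")
        + b"\x11" * 1024)
ecc = bytes(8) + bytes(8) + b"\x22" * 16
page = data + b"\xff\xff\x00\x00" + b"\xff" * 28 + ecc
print(list(zero_ecc_subpages(page)))   # [(0, True), (1, False)]
```

In the reasoning above, a hit reported as (i, False) is the interesting case.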
Nice, the first found item is at address 0x84000. Let's display the full
"page" in hex view.
cawan% hexdump -C -v -n $((0x84000+2112)) MT29F2G08ABAEAWP@TSOP48.BIN \
| tail -n $((0x840/16+1))
00084000 76 3d f5 33 62 61 75 64 72 61 74 65 3d 31 31 35 |v=.3baudrate=115|
00084010 32 30 30 00 62 6f 6f 74 61 72 67 73 3d 6d 65 6d |200.bootargs=mem|
00084020 3d 36 34 4d 20 63 6f 6e 73 6f 6c 65 3d 74 74 79 |=64M console=tty|
00084030 53 30 2c 31 31 35 32 30 30 20 75 62 69 2e 6d 74 |S0,115200 ubi.mt|
00084040 64 3d 32 20 72 6f 6f 74 3d 75 62 69 30 3a 75 62 |d=2 root=ubi0:ub|
00084050 69 66 73 20 72 77 20 72 6f 6f 74 66 73 74 79 70 |ifs rw rootfstyp|
00084060 65 3d 75 62 69 66 73 20 69 6e 69 74 3d 2f 6c 69 |e=ubifs init=/li|
00084070 6e 75 78 72 63 00 62 6f 6f 74 63 6d 64 3d 6e 62 |nuxrc.bootcmd=nb|
00084080 6f 6f 74 2e 65 20 30 78 37 46 43 30 20 30 20 30 |oot.e 0x7FC0 0 0|
00084090 78 32 30 30 30 30 30 3b 20 62 6f 6f 74 6d 20 30 |x200000; bootm 0|
000840a0 78 37 46 43 30 00 62 6f 6f 74 64 65 6c 61 79 3d |x7FC0.bootdelay=|
000840b0 31 00 65 74 68 61 63 74 3d 65 6d 61 63 00 65 74 |1.ethact=emac.et|
000840c0 68 61 64 64 72 3d 30 30 3a 30 30 3a 30 30 3a 31 |haddr=00:00:00:1|
000840d0 31 3a 36 36 3a 38 38 00 69 70 61 64 64 72 3d 31 |1:66:88.ipaddr=1|
000840e0 39 32 2e 31 36 38 2e 38 2e 32 30 33 00 6d 74 64 |92.168.8.203.mtd|
000840f0 70 61 72 74 73 3d 6d 74 64 70 61 72 74 73 3d 6e |parts=mtdparts=n|
00084100 61 6e 64 30 3a 32 6d 28 75 2d 62 6f 6f 74 29 2c |and0:2m(u-boot),|
00084110 34 6d 28 6b 65 72 6e 65 6c 29 2c 31 36 6d 28 75 |4m(kernel),16m(u|
00084120 62 69 66 73 29 2c 33 32 6d 28 61 70 70 6c 69 63 |bifs),32m(applic|
00084130 61 74 69 6f 6e 29 2c 33 32 6d 28 62 61 63 6b 75 |ation),32m(backu|
00084140 70 29 2c 2d 28 64 61 74 61 29 00 6e 65 74 6d 61 |p),-(data).netma|
00084150 73 6b 3d 32 35 35 2e 32 35 35 2e 30 2e 30 00 72 |sk=255.255.0.0.r|
00084160 6f 6f 74 76 65 72 3d 4c 59 30 43 2d 30 36 30 31 |ootver=LY0C-0601|
00084170 2d 52 54 30 30 2d 48 30 53 30 2d 32 31 30 31 32 |-RT00-H0S0-21012|
00084180 37 2d 30 30 00 73 65 72 76 65 72 69 70 3d 31 39 |7-00.serverip=19|
00084190 32 2e 31 36 38 2e 38 2e 34 00 73 74 64 65 72 72 |2.168.8.4.stderr|
000841a0 3d 73 65 72 69 61 6c 00 73 74 64 69 6e 3d 73 65 |=serial.stdin=se|
000841b0 72 69 61 6c 00 73 74 64 6f 75 74 3d 73 65 72 69 |rial.stdout=seri|
000841c0 61 6c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |al..............|
000841d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000841e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000841f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084220 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084260 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084270 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084290 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000842a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000842b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000842c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000842d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000842e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000842f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084300 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084310 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084320 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084330 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084340 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084350 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084360 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084370 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084380 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084390 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000843a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000843b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000843c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000843d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000843e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000843f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084400 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084410 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084420 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084430 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084440 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084450 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084460 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084470 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084480 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084490 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000844a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000844b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000844c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000844d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000844e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000844f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084500 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084510 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084520 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084530 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084540 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084550 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084560 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084570 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084580 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084590 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000845a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000845b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000845c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000845d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000845e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000845f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084600 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084610 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084620 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084630 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084640 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084650 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084660 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084670 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084680 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084690 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000846a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000846b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000846c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000846d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000846e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000846f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084700 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084710 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084720 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084730 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084740 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084750 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084760 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084770 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084780 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084790 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000847a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000847b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000847c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000847d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000847e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000847f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084800 ff ff 00 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00084810 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00084820 3b 8d c6 e5 19 b2 24 50 00 00 00 00 00 00 00 00 |;.....$P........|
00084830 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084840
So, what can be deduced from this "page"? Well, it is almost certain that
the 2048-byte data portion of a "page" is divided into four parts of 512
bytes each, which I named "sub-page" at the start of this article. In
this "page", the first "sub-page" runs from 0x84000 to 0x841ff and
contains non-zero data, with a BCH encoded ECC of 3b8dc6e519b22450. The
following three "sub-page" contain all-zeros data, each with an all-zeros
BCH encoded ECC. In other words, the 512 bytes of zeros in each of these
three "sub-page" are either BCH encoded directly, or padded with a
certain number of zeros ONLY, in order to generate an all-zeros ECC.
Hence, once the other BCH encoding parameters are unveiled step by step
in the following discussion, recovering the secret association between
ECC and data becomes straightforward. So, the second, third, and fourth
"sub-page" of a "page" are clear now, and the same usually holds for all
the other "page". However, the padding scheme of the first "sub-page"
remains uncertain, unless a "page" with four all-zeros ECCs can be
found. Let's try it.
####################### check_all_zeros_in_all_ecc.py ########################
# Look for a "page" whose four ECCs (the whole second half of the OOB) are
# all zeros, so that the padding of the first "sub-page" can be observed.
input_file = open("MT29F2G08ABAEAWP@TSOP48.BIN", "rb")
oob_const = \
b'\xff\xff\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff' + \
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
zeros_ecc = \
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' + \
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
page_cnt = 0
while 1:
    data = input_file.read(2112)
    if len(data) == 0:
        break
    oob_1st_32_bytes = data[2048:2048+32]
    oob_2nd_32_bytes = data[2048+32:2048+32+32]
    if len(data) == 2112 and oob_1st_32_bytes == oob_const:
        if oob_2nd_32_bytes[0:32] == zeros_ecc[0:32]:   # all four ECCs zero
            print("Page Num: %d, Address: 0x%X" % (page_cnt,
                  page_cnt*2112))
            break
    page_cnt += 1
input_file.close()
print("Completed")
##################################### end ####################################
Let's search for such a "page". However, the output is unexpected, as
shown below.
cawan% python3.8 check_all_zeros_in_all_ecc.py
Completed
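The "page" layout deduced so far can be captured in a small helper. This is a sketch that assumes the layout observed in the hex view above: four 512-byte "sub-page", then 32 bytes of OOB metadata, then four 8-byte ECCs in the same order as the "sub-page". The names split_page and zero_ecc_flags are hypothetical.

```python
PAGE_SIZE = 2112          # 2048 bytes of data + 64 bytes of OOB
DATA_SIZE = 2048
SUBPAGE_SIZE = 512
META_SIZE = 32            # first half of the OOB (ff ff 00 00 ff ...)
ECC_SIZE = 8              # 60-bit BCH parity stored in 8 bytes


def split_page(page: bytes):
    """Split one raw page into (four sub-pages, OOB metadata, four ECCs)."""
    if len(page) != PAGE_SIZE:
        raise ValueError("expected %d bytes, got %d" % (PAGE_SIZE, len(page)))
    subpages = [page[i * SUBPAGE_SIZE:(i + 1) * SUBPAGE_SIZE] for i in range(4)]
    meta = page[DATA_SIZE:DATA_SIZE + META_SIZE]
    ecc_base = DATA_SIZE + META_SIZE
    eccs = [page[ecc_base + i * ECC_SIZE:ecc_base + (i + 1) * ECC_SIZE]
            for i in range(4)]
    return subpages, meta, eccs


def zero_ecc_flags(page: bytes):
    """True for each sub-page whose stored ECC is all zeros."""
    _, _, eccs = split_page(page)
    return [ecc == bytes(ECC_SIZE) for ecc in eccs]
```

Reading the dump in 2112-byte chunks and calling zero_ecc_flags on each chunk gives the same information that check_all_zeros_ecc.py and check_all_zeros_in_all_ecc.py extract by hand. For the "page" at 0x84000 it would report [False, True, True, True].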
Anyhow, let's leave the unsolved part for now; we will come back to it
later. Now, let's have a brief overview of the Binary BCH implementation,
yes, solely from a hacker's perspective, not an academic one.
In general, a BCH codec needs a primitive polynomial in order to derive
the generator polynomial used for code generation. The Galois Field
order, i.e. the degree m of GF(2^m), determines which primitive
polynomials (those of degree m) can be used by the BCH codec. A
polynomial can be represented as an integer, i.e. in binary bit form.
The set bits of the integer mark the non-zero coefficients of the
corresponding powers of x in the selected primitive polynomial. Sounds
confusing? Let's have an example.
0x201B
   |
   V
0b0010000000011011
   |
   V
0b  0  0  1  0  0  0  0  0  0  0  0  1  1  0  1  1
    ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^
    |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
   15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
The hex value 0x201B can be represented in binary bit form as
0b0010000000011011. Each set bit in this number marks a non-zero
coefficient of the corresponding power of x in the primitive polynomial.
For 0x201B, bit-0, bit-1, bit-3, bit-4, and bit-13 are set. So, the
primitive polynomial is
x^13 + x^4 + x^3 + x^1 + 1
Yes, each set bit position selects a power of x, and the highest set bit
position defines the degree of the primitive polynomial. Again, for
0x201B, the degree is 13. Most of the time, the degree is denoted m and
represents the Galois Field order GF(2^m), so for 0x201B it can be
expressed as m=13.
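The bit-to-term mapping can be sketched in a few lines. poly_terms and degree are hypothetical helper names, and the output follows the x^1 notation used above.

```python
def poly_terms(poly: int) -> str:
    """Render an integer-encoded GF(2) polynomial, highest power first."""
    terms = []
    for bit in range(poly.bit_length() - 1, -1, -1):
        if (poly >> bit) & 1:              # set bit -> coefficient 1 for x^bit
            terms.append("1" if bit == 0 else "x^%d" % bit)
    return " + ".join(terms)


def degree(poly: int) -> int:
    """Degree = position of the highest set bit (m for a primitive polynomial)."""
    return poly.bit_length() - 1


if __name__ == "__main__":
    print(poly_terms(0x201B), "| m =", degree(0x201B))
    # x^13 + x^4 + x^3 + x^1 + 1 | m = 13
```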
To protect data of a given size in bits, that size should be less than
2^m; strictly, the whole codeword, i.e. the data plus the parity bits,
must fit within n = 2^m - 1 bits. For example, to protect data of 512
bytes, the data length in bits is 512*8=4096. This number is normally
denoted k, so it is more appropriate to write k=4096. So, 2^m should be
greater than 4096, then m should be greater than log(4096)/log(2)=12, and
m should be at least 13 (with m=12, n = 4095 bits cannot even hold the
data alone). Again, since the m of 0x201B is 13, it is suitable for
protecting data of 512 bytes in size. What is 0x201B in decimal? It is
8219; sounds familiar? Yes, it was used in the "first glance" bchlib
section to define the variable BCH_POLYNOMIAL.
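The size constraint can be turned into a tiny search. This sketch assumes the usual binary BCH length bound, namely that k data bits plus m*t parity bits must fit in n = 2^m - 1 bits. min_gf_order is a hypothetical name, not a bchlib function.

```python
def min_gf_order(k_bits: int, t: int) -> int:
    """Smallest m with k_bits + m*t <= 2^m - 1 (binary BCH length bound)."""
    m = 2
    while (1 << m) - 1 < k_bits + m * t:
        m += 1
    return m


if __name__ == "__main__":
    print(min_gf_order(512 * 8, 4))    # 13 for a 512-byte "sub-page"
    print(min_gf_order(1024 * 8, 4))   # 14 if the chunk were 1024 bytes
```

The second call illustrates why m is tied to the chunk size. Doubling the protected chunk pushes the codeword past 8191 bits, so one more bit of field order is needed.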
When talking about data protection, one must talk about protection
strength. The protection strength tells how many bit errors the data can
tolerate while still being recoverable to the correct state. The
strength is normally denoted t. So, when someone mentions t=4, it means
the ECC can correct up to 4 bits of error. Alright, m, k, and t are clear
now. Let's proceed to the length of the ECC, more commonly called the
number of parity bits. For binary BCH, the number of parity bits is at
most m*t, and for the configurations discussed here it is exactly m*t.
Thus, given m=13, k=4096, and t=4, since 2^m=2^13=8192 is greater than
k=4096, it is perfectly appropriate to generate a BCH encoded ECC of
m*t=13*4=52 parity bits. Remember the ECC size found in the NAND dump
analysis in the previous part? Yes, it is 60 bits (8 bytes minus the last
4 bits of zeros). Well, the boring stuff is getting interesting now.
Let's see what can be deduced from this little clue. The data to be
protected is 512 bytes, i.e. 4096 bits. m should be at least 13, giving
2^m=2^13=8192, which is sufficient to protect 4096 bits of data. As the
number of parity bits is 60, its factors are 1, 2, 3, 4, 5, 6, 10, 12,
15, 20, 30, and 60. Given m*t=60 and m>=13, the possible combinations of
(m, t) are (15, 4), (20, 3), (30, 2), and (60, 1). Since t=4 is common in
the majority of BCH ECC implementations, the combination of m=15 and t=4
is the most probable. The other three combinations, (20, 3), (30, 2), and
(60, 1), are not only unrealistic, but also severe overkill. At this
stage, assuming m=15 and t=4, which primitive polynomial should be
selected? Let's refer to the primitive polynomial list in [4]. For
degree 15, the candidates are shown below.
x^15 + x^1 + 1
x^15 + x^4 + 1
x^15 + x^7 + 1
x^15 + x^7 + x^6 + x^3 + x^2 + x^1 + 1
x^15 + x^10 + x^5 + x^1 + 1
x^15 + x^10 + x^5 + x^4 + 1
x^15 + x^10 + x^5 + x^4 + x^2 + x^1 + 1
x^15 + x^10 + x^9 + x^7 + x^5 + x^3 + 1
x^15 + x^10 + x^9 + x^8 + x^5 + x^3 + 1
x^15 + x^11 + x^7 + x^6 + x^2 + x^1 + 1
x^15 + x^12 + x^3 + x^1 + 1
x^15 + x^12 + x^5 + x^4 + x^3 + x^2 + 1
x^15 + x^12 + x^11 + x^8 + x^7 + x^6 + x^4 + x^2 + 1
x^15 + x^14 + x^13 + x^12 + x^11 + x^10 + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + 1
Well, the first candidate should be tried first, since it is the
simplest one (a trinomial with the lowest-order middle term); the other
candidates remain as fallbacks for the guided brute force. It is
x^15 + x^1 + 1
As mentioned earlier, the polynomial can be represented in binary bit
form as
0b1000000000000011
In hex, it is 0x8003; in decimal, it is 32771. So, getting back to
bchlib, BCH_POLYNOMIAL and BCH_BITS should be set to 32771 and 4,
respectively.
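The guided narrowing above can be replayed in code. This is a minimal sketch that assumes the 60-bit parity size observed in the dump and the m >= 13 bound for a 4096-bit "sub-page". candidate_m_t is a hypothetical helper. The sketch also confirms the decimal values handed to bchlib.

```python
def candidate_m_t(parity_bits: int, m_min: int):
    """All (m, t) pairs with m * t == parity_bits and m >= m_min."""
    return [(m, parity_bits // m)
            for m in range(m_min, parity_bits + 1)
            if parity_bits % m == 0]


if __name__ == "__main__":
    print(candidate_m_t(60, 13))   # [(15, 4), (20, 3), (30, 2), (60, 1)]
    print(0x8003, 0x201B)          # 32771 8219
```

Each surviving (m, t) pair multiplied by the number of degree-m primitive polynomials gives the size of the guided search space. That is tiny compared with a blind brute force.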
Now, taking the data input as it is, without reversing its bit order
first, let's try BCH encoding without any padding for the first "page".
###################### bch_encoding_without_padding.py #######################
# Re-encode each 512-byte "sub-page" of the first "page" as-is (no padding)
# and compare the result with the ECC stored in the OOB. Uses the
# bchlib.BCH(polynomial, t) constructor as in the earlier bchlib example.
import bchlib
import binascii
BCH_POLYNOMIAL = 32771      # 0x8003, i.e. x^15 + x^1 + 1
BCH_BITS = 4                # t = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
input_file = open("./MT29F2G08ABAEAWP@TSOP48.BIN", "rb")
page = input_file.read(2112)
ECC = page[2048+32:2048+32+32]          # 4 x 8-byte ECC in the OOB
for i in range(0, 4):
    ecc_generated = bch.encode(page[i*512:i*512+512])
    print("\nSub-page: %d" % i)
    print("ECC Ori:", end=' ')
    print(ECC[i*8:i*8+8].hex().upper())
    print("ECC Generated:", end=' ')
    print(ecc_generated.hex().upper())
    if ECC[i*8:i*8+8] == ecc_generated:
        print("Match !")
    else:
        print("Wrong !")
print("\nCompleted")
##################################### end ####################################
The output is shown below.
cawan% python3.8 bch_encoding_without_padding.py
Sub-page: 0
ECC Ori: F689F779E560C9E0
ECC Generated: 8DE136AAF3E03F90
Wrong !
Sub-page: 1
ECC Ori: D6E3EDCB9CB0F9F0
ECC Generated: 6C6CF320EFAD8660
Wrong !
Sub-page: 2
ECC Ori: 1FDAD4A49CD41BE0
ECC Generated: 1058EAC213313D70
Wrong !
Sub-page: 3
ECC Ori: E090CC85D8D2E280
ECC Generated: B36A94B537E14BA0
Wrong !
Completed
None of the four "sub-page" generates the correct ECC. So, each
"sub-page" is probably padded with a certain number of zeros before
being BCH encoded. Let's try BCH encoding with the "sub-page" padded
with 1 to 32 bytes of zeros.
#################### bch_encoding_with_zeros_padding.py ######################
# Same as above, but append 1 to 32 bytes of zeros to each "sub-page" before
# encoding, and report the padding length that reproduces the stored ECC.
import bchlib
import binascii
BCH_POLYNOMIAL = 32771
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
input_file = open("./MT29F2G08ABAEAWP@TSOP48.BIN", "rb")
page = input_file.read(2112)
ECC = page[2048+32:2048+32+32]
found_flag = 0
for i in range(0, 4):
    print("\nSub-page: %d" % i)
    print("ECC Ori:", end=' ')
    print(ECC[i*8:i*8+8].hex().upper())
    for j in range(1, 33):
        padding = b'\x00'*j
        ecc_generated = bch.encode(page[i*512:i*512+512]+padding)
        if ECC[i*8:i*8+8] == ecc_generated:
            print("ECC Generated:", end=' ')
            print(ecc_generated.hex().upper())
            print("Match !", end=' ')
            print("Zeros padded number: %d" % j)
            found_flag = 1
            break
    if found_flag == 0:
        print("Wrong !")
    found_flag = 0                      # reset for the next "sub-page"
print("\nCompleted")
##################################### end ####################################
Let's run the check. Aha, the output is interesting, as shown below.
cawan% python3.8 bch_encoding_with_zeros_padding.py
Sub-page: 0
ECC Ori: F689F779E560C9E0
Wrong !
Sub-page: 1
ECC Ori: D6E3EDCB9CB0F9F0
ECC Generated: D6E3EDCB9CB0F9F0
Match ! Zeros padded number: 24
Sub-page: 2
ECC Ori: 1FDAD4A49CD41BE0
ECC Generated: 1FDAD4A49CD41BE0
Match ! Zeros padded number: 24
Sub-page: 3
ECC Ori: E090CC85D8D2E280
ECC Generated: E090CC85D8D2E280
Match ! Zeros padded number: 24
Completed
So, of the four "sub-page" in a "page", all but the first, i.e. the
second, third, and fourth "sub-page", are padded with 24 bytes of zeros
before being BCH encoded to generate the correct ECC. However, the first
"sub-page" is still cryptic and needs a bit of tweaking. Since the other
"sub-page" are padded with 24 bytes of zeros, it is very likely that the
first "sub-page" is padded with 24 bytes that are not all zeros. It
should be some kind of "metadata" describing the "page" itself. Remember
the first 32 bytes of the OOB? Let's check them again.
cawan% hexdump -C -v -n $((2112-32)) MT29F2G08ABAEAWP@TSOP48.BIN | tail -n 3
00000800 ff ff 00 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000810 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000820
The two bytes of zeros at 0x802 and 0x803 are a little bit strange. So,
is it possible that the first few bytes of the 24 bytes of zero padding
are replaced by bytes from here? Let's try replacing the 24 bytes of zero
padding byte by byte, until the entire 24 bytes of padding become
ffff0000ffffffffffffffffffffffffffffffffffffffff
Let's try it.
####################### bch_encoding_of_1st_subpage.py #######################
# Overwrite the 24 bytes of zero padding of the first "sub-page", one byte at
# a time, with the first 24 bytes of the OOB pattern, until the ECC matches.
import bchlib
import binascii
BCH_POLYNOMIAL = 32771
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
input_file = open("./MT29F2G08ABAEAWP@TSOP48.BIN", "rb")
page = input_file.read(2112)
subpage = page[0:512]
ECC = page[2048+32:2048+32+8]            # ECC of the first "sub-page" only
paddingx = \
b'\xFF\xFF\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF' + \
b'\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF'
padding0 = \
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' + \
b'\x00\x00\x00\x00\x00\x00\x00\x00'
data_input = subpage + padding0
data_input = bytearray(data_input)
for i in range(0, 24):
    data_input[512+i] = paddingx[i]      # take one more byte from the OOB
    ecc_generated = bch.encode(bytes(data_input))
    if ecc_generated == ECC:
        print("Match !")
        print("Padding:", end=' ')
        print(data_input[512:].hex().upper())
        break
print("\nCompleted")
##################################### end ####################################
Let's run it. Bingo, the padding pattern is found, as shown below.
cawan% python3.8 bch_encoding_of_1st_subpage.py
Match !
Padding: FFFF00000000000000000000000000000000000000000000
Completed
3 - Bit Errors Fixing with ECC
Perfect. Now, the secret association between ECC and data is fully
unveiled. In conclusion, the first "sub-page" of a "page" has to be
padded with 24 bytes, comprising 2 bytes of 0xFF followed by 22 bytes of
zeros, before being BCH encoded to generate the correct ECC. For the
second, third, and fourth "sub-page", 24 bytes of all-zeros padding are
needed to generate the correct ECC. So, by doing the BCH decoding in the
same manner on every "page" of the entire NAND dump, all the bit errors
get fixed, as long as there are no more than t=4 of them per "sub-page".
After that, the 64-byte OOB of each "page" is removed, generating a new
NAND dump of contiguous, error-free data "page" by "page", which I name
cawan_output.bin, as shown below.
####################### NAND_dump_fix_bit_erros_ecc.py #######################
# BCH-decode every "sub-page" of the raw dump with the padding scheme found
# above, then write only the (corrected) 2048 data bytes of each "page".
import bchlib
BCH_POLYNOMIAL = 32771
BCH_BITS = 4
input_file = open("./MT29F2G08ABAEAWP@TSOP48.BIN", "rb")
output_file = open("./cawan_output.bin", "wb")
pad_sub0 = \
b'\xFF\xFF\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' + \
b'\x00\x00\x00\x00\x00\x00\x00\x00'
pad_subx = \
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' + \
b'\x00\x00\x00\x00\x00\x00\x00\x00'
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
count = 0
error_cnt = 0
while 1:
    page = input_file.read(2112)
    if len(page) != 2112:
        break
    for i in range(0, 4):
        data, ecc = page[512*i:512*i+512], page[2048+32+i*8:2048+32+i*8+8]
        if i == 0:
            data_padded = data + pad_sub0     # FF FF + 22 bytes of zeros
        else:
            data_padded = data + pad_subx     # 24 bytes of zeros
        data_padded = bytearray(data_padded)
        bitflips = bch.decode_inplace(data_padded, ecc)
        if bitflips == 0:                     # no bit errors
            output_file.write(data_padded[:512])
        elif bitflips > 0:                    # bit errors corrected in place
            error_cnt += 1
            output_file.write(data_padded[:512])
        elif bitflips == -1:                  # decoding failed, written as read
            output_file.write(data_padded[:512])
    count += 1
input_file.close()
output_file.close()
print("Sub-page with error count: %d\n" % error_cnt)
print("Completed.")
##################################### end ####################################
Well, 20 "sub-page" with bit errors have been fixed with the ECC, as
shown below.
cawan% python3.8 NAND_dump_fix_bit_erros_ecc.py
Sub-page with error count: 20
Completed.
Armed with knowledge, any suitable common tool can be weaponized for
hacking purposes. Don't be silly and stubbornly believe that a
proprietary, special, commercial, or even automated tool will work as
expected without any knowledge of the field. So, the firmware is ready
now; let's proceed to the firmware analysis.
4 - UBI Image Analysis
As a common approach, let's begin with binwalk and hope for a gold
strike, or money growing on trees, or both. Let's look at the binwalk
output shown below.
cawan% binwalk cawan_output.bin
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
963584        0xEB400         CRC32 polynomial table, little endian
966688        0xEC020         CRC32 polynomial table, little endian
970868        0xED074         LZO compressed data
2097152       0x200000        uImage header, header size: ...
2097216       0x200040        Linux kernel ARM boot executable zImage ...
2115956       0x204974        gzip compressed data, maximum compression, ...
6291456       0x600000        UBI erase count header, version: 1, ...
It looks interesting. As stated in the title of this article, only the
UBI image is going to be analyzed. The full description of the UBI
header detected at address 0x600000 is shown below.
UBI erase count header, version: 1, EC: 0x1, VID header offset: 0x800, data offset: 0x1000
The header really makes sense: the UBI magic at 0x600000, version 1, and
an erase count of 1, which means the NAND flash is new, or has at least
just been reformatted. After that, the volume ID header is 0x800, or 2048
in decimal, away from 0x600000, which is a common approach for NAND
flash. One important thing to emphasize here: the newly generated NAND
dump is a logical NAND dump with the OOB removed, so each "page" is 2048
bytes in size. So, locating the volume ID header one "page" away from
the UBI header is really a common approach. Then, the actual data is
0x1000, or 4096 in decimal, away from 0x600000; in other words, it is
another "page" away from the volume ID header. This is also a common
approach for NAND flash. So, is there a free lunch? Let's try to extract
it with binwalk by passing in the well-known parameters, -Me. The lengthy
output seems convincing. Let's get into the directory hosting the
extracted files, as shown below.
cawan% cd _cawan_output.bin.extracted
cawan% ls
204974  _204974.extracted  600000.ubi  ED074.lzo  ubifs-root
Since a ubifs-root directory was generated, let's get into it.
cawan% cd ubifs-root
cawan% ls
1941946494 3823591600
Two more directories are found. Let's check each of them with the tree
command.
cawan% tree -L 2 1941946494
1941946494
└── ubifs
    ├── bin
    ├── dev
    ├── etc
    ├── home
    ├── lib
    ├── linuxrc -> bin/busybox
    ├── mnt
    ├── proc
    ├── root
    ├── sbin
    ├── sys
    ├── tmp
    ├── usr
    ├── var
    └── work

15 directories, 1 file
cawan% tree -L 3 3823591600
3823591600
└── app

1 directory, 0 files
Well, it seems the file system is extracted in the directory 1941946494.
However, 3823591600 contains nothing but an empty app directory. Let's go
further.
cawan% cd 1941946494
cawan% cd ubifs
cawan% ls
bin  dev  etc  home  lib  linuxrc  mnt  proc  root  sbin  sys  tmp  usr  var  work
cawan% cd etc
cawan% ls
fstab    HOSTNAME    inittab   pointercal  profile~  ts.conf
group    inetd.conf  networks  ppp         services  vsftpd.conf
gshadow  init.d      passwd    profile     shadow
cawan% cat fstab
cawan% ls -la fstab
-rw-rw-r-- 1 user user 186 Mar 30  2015 fstab
cawan% cat fstab | xxd
00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000b0: 0000 0000 0000 0000 0000 ..........
Well, something must have gone wrong with the file system extraction. It
seems the free lunch is not really free. Should we dig into binwalk to
find the reason? Don't get into mischief; that is really not the right
track for a hardcore hacker. When it comes to analysis, each step of the
entire process should be strictly under control, trackable, and
explainable, and that applies to firmware analysis too. Let's start from
the beginning with dd again and carve the UBI image out manually.
cawan% dd if=./cawan_output.bin of=./ubi.bin bs=1 skip=$((0x600000))
262144000+0 records in
262144000+0 records out
262144000 bytes (262 MB, 250 MiB) copied, 281.069 s, 933 kB/s
cawan% file ubi.bin
ubi.bin: UBI image, version 1
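The dd run above uses bs=1, i.e. one byte per read and write, which is why it is slow. Since 0x600000 is a multiple of 2048, a larger block size such as bs=2048 skip=$((0x600000/2048)) would produce the same file. The same carve can also be sketched in Python with chunked copying. carve_stream and carve are hypothetical helper names. The paths in the comment are the ones used in this article.

```python
import io
import shutil


def carve_stream(src, dst, offset: int, chunk: int = 1 << 20) -> int:
    """Copy src[offset:] to dst in chunk-sized pieces; return bytes copied."""
    src.seek(offset)
    start = dst.tell()
    shutil.copyfileobj(src, dst, chunk)
    return dst.tell() - start


def carve(src_path: str, dst_path: str, offset: int) -> int:
    """File-path wrapper, e.g. carve("./cawan_output.bin", "./ubi.bin", 0x600000)."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        return carve_stream(src, dst, offset)


if __name__ == "__main__":
    # Tiny in-memory demo of the same operation.
    out = io.BytesIO()
    print(carve_stream(io.BytesIO(b"0123456789"), out, 4), out.getvalue())
    # 6 b'456789'
```

Copying in 1 MiB pieces keeps memory use flat while avoiding the per-byte system call overhead of bs=1.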
It really takes a while to generate ubi.bin. Now,
let's verify the UBI
header, volume ID header, and the start of data in hex view.
cawan% hexdump -C -n $((2048*3)) ./600000.ubi
00000000 55 42 49 23 01 00 00 00 00 00 00 00 00 00 00 01 |UBI#............|
00000010 00 00 08 00 00 00 10 00 73 bf c0 7e 00 00 00 00 |........s..~....|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 01 9f 6b b3 |..............k.|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000800 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
00001800
Let's interpret the UBI header with its data structure as shown below.
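struct ubi_ec_hdr {
    __be32  magic;              /* 0x00: "UBI#" */
    __u8    version;            /* 0x04 */
    __u8    padding1[3];        /* 0x05 */
    __be64  ec;                 /* 0x08: erase counter */
    __be32  vid_hdr_offset;     /* 0x10 */
    __be32  data_offset;        /* 0x14 */
    __be32  image_seq;          /* 0x18 */
    __u8    padding2[32];       /* 0x1C */
    __be32  hdr_crc;            /* 0x3C */
};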
The header magic is "UBI#", 4 bytes in size, followed by the version number, 1, which is 1 byte in size. After 3 bytes of padding comes the so-called Erase Counter, abbreviated as ec, which indicates how many times the block has been erased. A little bit of background knowledge is needed here, which might not be hacker friendly.
NAND flash storage has a limited lifespan. Every erase operation on the same place in the flash wears it out a little, and once the limit is reached, that place becomes useless. UBI works on the NAND flash storage in units of "blocks", and each "block" comprises a number of "pages". For the case of MT29F2G08ABAEAWP, a "block" comprises 64 "pages", and each "page" is 2048 bytes in size. So, it is crucial to monitor the used count of every "block" in order to avoid data loss. Hence, when the used count of a "block" reaches a certain threshold, the entire data in the "block" has to be relocated to another "block" which is in good condition. Since relocating physical "blocks" changes their order or sequence, some kind of abstraction is needed to manage the physical "blocks" in a logical way. By keeping the order or sequence of the logical "blocks" at a high level, each logical "block" can be remapped to the appropriate physical "block" as needed. Such an abstraction is formally known as wear-leveling. Well, the so-called used count is identical to the erase count in UBI, or the wear count in wear-leveling. UBI is responsible for providing such a wear-leveling mechanism by managing the logical "blocks" in the most appropriate way.
Let's get back to the 8-byte ec item of the UBI header. An ec of 1 means the block has been formatted (erased) 1 time. After the ec comes the 4-byte volume ID header offset, counted from the beginning of the UBI header. It is 0x800, which is exactly 1 "page" (2048 bytes). The volume ID header offset is followed by the 4-byte data offset, 0x1000, which is another 1 "page" after the volume ID header. Next to the data offset are 4 bytes representing the image sequence number (0x73BFC07E in this block), which identifies the UBI image that the respective UBI block belongs to. In this dump, each UBI image carries a single UBIFS, so the image sequence number also tells which UBIFS the block belongs to for file system construction. So, the UBIFS is indeed the actual file system that a hacker should focus on. After that, there are 32 bytes of padding, and at last comes the 4-byte CRC checksum of the UBI header (0x019F6BB3 here).
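The field walk-through above can be turned into a small decoder. This is a minimal sketch, not part of the original tooling: the struct format string mirrors struct ubi_ec_hdr, and the CRC convention is my assumption from reading the kernel sources (UBI seeds CRC32 with 0xFFFFFFFF and does not invert the result, which equals zlib's CRC32 with its final XOR undone), computed over the first 60 bytes, i.e. everything in front of hdr_crc.

```python
############################ parse_ec_hdr.py ############################
import struct
import zlib

# struct ubi_ec_hdr, big-endian: magic, version, padding1[3], ec,
# vid_hdr_offset, data_offset, image_seq, padding2[32], hdr_crc
EC_HDR_FMT = ">4sB3xQIII32xI"
EC_HDR_SIZE = struct.calcsize(EC_HDR_FMT)   # 64 bytes


def ubi_crc32(data):
    # Assumption: UBI's CRC32 is seeded with 0xFFFFFFFF and not inverted
    # at the end, i.e. zlib's standard CRC32 with its final XOR undone.
    return zlib.crc32(data) ^ 0xFFFFFFFF


def parse_ec_hdr(buf):
    """Decode the first 64 bytes of an erase block as a UBI EC header."""
    (magic, version, ec, vid_hdr_offset, data_offset,
     image_seq, hdr_crc) = struct.unpack(EC_HDR_FMT, buf[:EC_HDR_SIZE])
    return {
        "magic": magic,
        "version": version,
        "ec": ec,
        "vid_hdr_offset": vid_hdr_offset,
        "data_offset": data_offset,
        "image_seq": image_seq,
        "hdr_crc": hdr_crc,
        # the CRC covers every byte in front of the hdr_crc field
        "crc_ok": ubi_crc32(buf[:EC_HDR_SIZE - 4]) == hdr_crc,
    }
#################################### end ####################################
```

For example, parse_ec_hdr(open("./600000.ubi", "rb").read(64)) on a block like the one dumped above should give vid_hdr_offset 0x800, data_offset 0x1000 and image_seq 0x73BFC07E (1941946494). If crc_ok comes back False on a genuine header, the CRC assumption is the first thing to revisit.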
Now, let's check how many UBIFS exist in the UBI image.
############################ check_ubifs_count.py ############################
# Walk the UBI image one physical erase block (64 pages x 2048 bytes) at a
# time and print every new image_seq value found in the UBI (EC) header.
BLOCK_SIZE = 2048 * 64
UBI_hdr = b'\x55\x42\x49\x23'    # "UBI#"

count = 0                        # number of erase blocks scanned
tmp_seq = b''
with open("./600000.ubi", "rb") as input_file:
    while True:
        block = input_file.read(BLOCK_SIZE)
        if len(block) != BLOCK_SIZE:
            break
        if block[0:4] == UBI_hdr:
            img_seq = block[24:28]    # image_seq sits at offset 0x18
            if img_seq != tmp_seq:
                print("0x" + img_seq.hex().upper(), end=' -> ')
                print("%d" % int(img_seq.hex(), 16))
                tmp_seq = img_seq
        count += 1
print("\nCompleted.")
#################################### end ####################################
The output is shown below.
cawan% python3.8 check_ubifs_count.py
0x73BFC07E -> 1941946494
0xE3E760B0 -> 3823591600
0x9F61AB77 -> 2673978231
0x49F558F2 -> 1240815858
Completed.
Sound familiar? Yes, definitely. 1941946494 and 3823591600 were used by binwalk to name the folders hosting the extracted files. How about the other two? Something definitely went wrong while binwalk was extracting the UBI image. Before proceeding further, let's try to estimate the size of the data in use in the UBI image. One thing to clarify first: whenever a UBI erase block is in use, it comes with a valid volume ID header, whose magic is "UBI!". Please note that the term "UBI erase block" here refers to the logical UBI block, formally known as a logical eraseblock (LEB).
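The volume ID header has its own fixed 64-byte big-endian layout. Below is a sketch of a decoder, assuming the field order of struct ubi_vid_hdr in ubi-media.h [5] as I read it. The fields that matter most here are vol_id, which tells which volume the block belongs to, lnum, its logical block number inside that volume, and sqnum, a global sequence number that UBI uses to pick the newer copy when two blocks claim the same lnum. The default vid_hdr_offset of 2048 is the 0x800 found in the UBI header above.

```python
############################ parse_vid_hdr.py ############################
import struct

# Field order assumed from struct ubi_vid_hdr in ubi-media.h [5]:
# magic, version, vol_type, copy_flag, compat, vol_id, lnum, padding1[4],
# data_size, used_ebs, data_pad, data_crc, padding2[4], sqnum,
# padding3[12], hdr_crc (all big-endian)
VID_HDR_FMT = ">4sBBBBII4xIIII4xQ12xI"
VID_HDR_SIZE = struct.calcsize(VID_HDR_FMT)   # 64 bytes
VID_HDR_MAGIC = b"UBI!"
VID_FIELDS = ("magic", "version", "vol_type", "copy_flag", "compat",
              "vol_id", "lnum", "data_size", "used_ebs", "data_pad",
              "data_crc", "sqnum", "hdr_crc")


def parse_vid_hdr(block, vid_hdr_offset=2048):
    """Return the VID header of one erase block as a dict, or None when
    there is no "UBI!" magic at vid_hdr_offset (the block is not in use)."""
    raw = block[vid_hdr_offset:vid_hdr_offset + VID_HDR_SIZE]
    if len(raw) < VID_HDR_SIZE or raw[:4] != VID_HDR_MAGIC:
        return None
    return dict(zip(VID_FIELDS, struct.unpack(VID_HDR_FMT, raw)))
#################################### end ####################################
```

A None result means the block carries no "UBI!" magic at that offset, which is the same condition check_data_inuse.py below uses to skip a block.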
############################ check_data_inuse.py #############################
# Count the physical erase blocks that carry both a UBI header ("UBI#") and
# a volume ID header ("UBI!") one page later; only those blocks are in use.
BLOCK_SIZE = 2048 * 64
UBI_hdr = b'\x55\x42\x49\x23'    # "UBI#"
VID_hdr = b'\x55\x42\x49\x21'    # "UBI!"

data_inuse = 0
with open("./600000.ubi", "rb") as input_file:
    while True:
        block = input_file.read(BLOCK_SIZE)
        if len(block) != BLOCK_SIZE:
            break
        if block[0:4] == UBI_hdr and block[2048:2048+4] == VID_hdr:
            data_inuse += BLOCK_SIZE
print("Data size in use: %d" % data_inuse)
print("\nCompleted.")
#################################### end ####################################
The output is shown below.
cawan% python3.8 check_data_inuse.py
Data size in use: 40239104
Completed.
Nice, it is about 40 MB in size (40239104 bytes, i.e. 307 erase blocks of 131072 bytes), including some extra space which is hard to estimate precisely. Now, it is time to talk about how to extract the UBIFS from the UBI image. As it is a matter of re-arranging the UBI erase blocks according to their image_seq number, there is no harm in trying a well-known toolkit, UBI Reader. Let's see the result.
cawan% ubireader_extract_images ubi.bin
cawan% ls
cawan_output.bin ubi.bin ubifs-root
cawan% cd ubifs-root
cawan% ls
ubi.bin
cawan% cd ubi.bin
cawan% ls
img-1240815858_vol-data.ubifs
img-2673978231_vol-backup.ubifs
img-1941946494_vol-ubifs.ubifs
img-3823591600_vol-app.ubifs
cawan% ls -la
total 145212
drwxrwxr-x 2 user user 4096 May 29 16:46 .
drwxrwxr-x 3 user user 4096 May 29 16:46 ..
-rw-rw-r-- 1 user user 100438016 May 29 16:46 img-1240815858_vol-data.ubifs
-rw-rw-r-- 1 user user 11935744 May 29 16:46 img-1941946494_vol-ubifs.ubifs
-rw-rw-r-- 1 user user 27299840 May 29 16:46 img-2673978231_vol-backup.ubifs
-rw-rw-r-- 1 user user 9015296 May 29 16:46 img-3823591600_vol-app.ubifs
Cool. No error prompt at all, and 4 UBIFS images got extracted. Remember that the estimated size of the data in use is about 40 MB? Since img-1240815858_vol-data.ubifs alone is 100438016 bytes (about 100 MB), it is reasonable to assume that something is wrong with it. The other 3 UBIFS images should be in good condition, because their total size, 48250880 bytes (about 48 MB), is in the same ballpark as the estimate.
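UBI Reader did the re-arranging for us, but the idea is simple enough to sketch. The function below illustrates the technique and is not UBI Reader's implementation. It groups the in-use erase blocks by image_seq and vol_id, then places each block's data at lnum * LEB size, keeping the copy with the highest sqnum when a logical block appears twice. The header offsets come from the UBI header and volume ID header layouts in [5]. Filling unmapped LEBs with 0xFF is an assumption (that is how erased flash reads). CRCs and static volumes get no special handling, and UBI's internal layout volume simply shows up as its own vol_id.

```python
############################ ubi_regroup.py ############################
import struct
from collections import defaultdict

PEB_SIZE = 2048 * 64                 # one MT29F2G08ABAEAWP erase block
EC_MAGIC, VID_MAGIC = b"UBI#", b"UBI!"


def regroup(image):
    """Rebuild every volume of every UBI image found in a raw dump."""
    # best[(image_seq, vol_id)][lnum] = (sqnum, leb_data)
    best = defaultdict(dict)
    for off in range(0, len(image) - PEB_SIZE + 1, PEB_SIZE):
        peb = image[off:off + PEB_SIZE]
        if peb[:4] != EC_MAGIC:
            continue
        # vid_hdr_offset, data_offset, image_seq at offsets 16, 20, 24
        vid_off, data_off, image_seq = struct.unpack_from(">III", peb, 16)
        if peb[vid_off:vid_off + 4] != VID_MAGIC:
            continue                  # erase block not in use
        # vol_id and lnum at +8/+12, sqnum at +40 of the VID header
        vol_id, lnum = struct.unpack_from(">II", peb, vid_off + 8)
        (sqnum,) = struct.unpack_from(">Q", peb, vid_off + 40)
        slot = best[(image_seq, vol_id)]
        if lnum not in slot or sqnum > slot[lnum][0]:
            slot[lnum] = (sqnum, peb[data_off:])
    volumes = {}
    for key, lebs in best.items():
        leb_size = len(next(iter(lebs.values()))[1])
        # unmapped LEBs are assumed to read back as erased flash (0xFF)
        vol = bytearray(b"\xff" * (leb_size * (max(lebs) + 1)))
        for lnum, (_, data) in lebs.items():
            vol[lnum * leb_size:(lnum + 1) * leb_size] = data
        volumes[key] = vol
    return volumes
#################################### end ####################################
```

For example, vols = regroup(open("./600000.ubi", "rb").read()), followed by printing len(v) // 126976 for each key, gives the number of LEBs per rebuilt volume, which can then be compared with the sizes of the .ubifs files listed above.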
Let's use the UBI Reader toolkit again, this time to extract files from the UBIFS images, starting with img-1941946494_vol-ubifs.ubifs, as shown below.
cawan% ubireader_extract_files img-1941946494_vol-ubifs.ubifs
Extracting files to: ubifs-root
decompress Warn: LZO Error: EResult.LookbehindOverrun
_process_reg_file Warn: inode num:693 path:<...> :can't concat NoneType to bytearray
decompress Warn: LZO Error: EResult.InputOverrun
_process_reg_file Warn: inode num:592 path:<...> :can't concat NoneType to bytearray
decompress Warn: LZO Error: EResult.LookbehindOverrun
_process_reg_file Warn: inode num:587 path:<...> :can't concat NoneType to bytearray
decompress Warn: LZO Error: EResult.InputOverrun
...
...
...
cawan% ls
img-1240815858_vol-data.ubifs
img-2673978231_vol-backup.ubifs ubifs-root
img-1941946494_vol-ubifs.ubifs
img-3823591600_vol-app.ubifs
cawan% cd ubifs-root
cawan% ls
bin dev etc home lib linuxrc mnt proc root sbin sys tmp usr \
var work
After a huge number of error prompts, a file system seems to be generated. Is it the same thing as what binwalk generated earlier? Let's check.
cawan% cd etc
cawan% cat fstab
cawan% ls -la fstab
-rw-rw-r-- 1 user user 186 Mar 30 2015 fstab
cawan% cat fstab | xxd
00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000b0: 0000 0000 0000 0000 0000 ..........
Damn, it is indeed the same thing. It seems that ubireader_extract_files is unable to fully interpret the UBIFS and generate correct files. How about the other two UBIFS images? Let's check.
cawan% ubireader_extract_files img-2673978231_vol-backup.ubifs
Extracting files to: ubifs-root
index Fatal: LEB: 110 at 13998336, Node size smaller than expected.
cawan% ubireader_extract_files img-3823591600_vol-app.ubifs
Extracting files to: ubifs-root
index Fatal: LEB: 58 at 7461120, Node size smaller than expected.
Sorry, fatal errors this time, and nothing is generated. Since this NAND dump comes from a real device which is fully functional, and all the bit errors have been fixed, the UBIFS images should work. So, let's take another route: emulate the NAND chip with nandsim so that it works together with MTD. In most of the hacking literature, when nandsim comes up, the standard conventional approach is to dd the entire UBI image into the MTD device emulated by nandsim, modprobe the ubi driver with some parameters, and let the ubi driver deal with the UBI image blob on its own. Let's put a few words of comment about this. As mentioned earlier, UBI erase blocks exist for the wear-leveling implementation in the UBI layer. Since the UBI erase blocks are logical, they are normally not in sequence physically, which is the case for this NAND dump. So, instead of relying on the UBI driver to do the extra work of block remapping, which might have a high chance of causing errors in emulation mode, it is better to pre-process the UBI image offline by using ubireader_extract_images first. The output of ubireader_extract_images is already in UBIFS form, which is the actual file system, just like squashfs, jffs2, yaffs2, or CRAMFS. In other words, by dealing with UBIFS directly, the chance of getting errors is minimized.
Anyway, there is no harm in trying the standard conventional approach first. Let's try to grab the low-hanging fruit. In order to emulate a NAND chip, one needs to know the ID codes of the chip. According to the datasheet of MT29F2G08ABAEAWP [1], the first 4 ID bytes are 0x2c, 0xda, 0x90, and 0x95. With such info, it is ready for nandsim.
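Before loading nandsim, it helps to see what these ID bytes encode, since the emulated geometry follows from them. The sketch below assumes the legacy (non-ONFI) extended-ID bit layout that Linux's NAND core decodes for many older SLC chips: bits 1:0 of the 4th byte give the page size, bit 2 the OOB bytes per 512 bytes, bits 5:4 the block size and bit 6 the bus width. The one-entry device-size table (0xDA = 256 MiB, 8-bit) is taken from my reading of Linux's nand_ids list and is also an assumption.

```python
############################ nand_id.py ############################
# Assumption: device ID 0xDA = 256 MiB, 8-bit, 3.3 V (Linux nand_ids table)
CHIP_SIZE_BY_DEV_ID = {0xDA: 256 * 1024 * 1024}


def decode_ext_id(fourth):
    """Decode the 4th ID byte with the legacy extended-ID bit fields."""
    page_size = 1024 << (fourth & 0x03)                            # bits 1:0
    oob_size = (8 << ((fourth >> 2) & 0x01)) * (page_size // 512)  # bit 2
    block_size = (64 * 1024) << ((fourth >> 4) & 0x03)             # bits 5:4
    bus_width = 16 if (fourth >> 6) & 0x01 else 8                  # bit 6
    return {"page_size": page_size, "oob_size": oob_size,
            "block_size": block_size, "bus_width": bus_width}


def describe(ids):
    """ids = [maker, device, 3rd, 4th] ID bytes as read from the chip."""
    info = decode_ext_id(ids[3])
    chip_size = CHIP_SIZE_BY_DEV_ID.get(ids[1])
    if chip_size is not None:
        info["chip_size"] = chip_size
        info["pages_per_block"] = info["block_size"] // info["page_size"]
        info["blocks"] = chip_size // info["block_size"]
    return info
#################################### end ####################################
```

For [0x2c, 0xda, 0x90, 0x95] this yields 2048-byte pages, 64 bytes of OOB per page, 128 KiB blocks of 64 pages, an 8-bit bus and 2048 blocks, which are the same figures mtdinfo reports for the emulated chip below.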
cawan% sudo modprobe nandsim first_id_byte=0x2c second_id_byte=0xda third_id_byte=0x90 fourth_id_byte=0x95
cawan% cat /proc/mtd
dev: size erasesize name
mtd0: 10000000 00020000 "NAND simulator partition 0"
cawan% sudo mtdinfo -a
Count of MTD devices: 1
Present MTD devices: mtd0
Sysfs interface supported: yes
mtd0
Name: NAND simulator partition 0
Type: nand
Eraseblock size: 131072 bytes, 128.0 KiB
Amount of eraseblocks: 2048 (268435456 bytes, 256.0 MiB)
Minimum input/output unit size: 2048 bytes
Sub-page size: 512 bytes
OOB size: 64 bytes
Character device major/minor: 90:0
Bad blocks are allowed: true
Device is writable: true
Since this is supposed to be the low-hanging fruit, just ignore the parameters shown for now. Now, let's dd the UBI image into /dev/mtd0.
cawan% sudo dd if=ubi.bin of=/dev/mtd0 bs=2048
128000+0 records in
128000+0 records out
262144000 bytes (262 MB, 250 MiB) copied, 2.7339 s, 95.9 MB/s
Done. Now, modprobe the ubi driver.
cawan% sudo modprobe ubi mtd=0,2048
modprobe: ERROR: could not insert 'ubi': Invalid argument
Sorry, the low-hanging fruit is in fact not so low for this NAND dump. Let's proceed in the proper way proposed earlier, starting again from the beginning: rmmod nandsim first, and then modprobe nandsim again.
cawan% sudo rmmod nandsim
cawan% sudo modprobe nandsim first_id_byte=0x2c second_id_byte=0xda third_id_byte=0x90 fourth_id_byte=0x95
Well, nothing special here. The output of mtdinfo -a is nothing special either, because it just reflects the parameters of MT29F2G08ABAEAWP. The only thing to make sure of is that /dev/mtd0 has been created. After that, use ubiformat with the correct parameters to bring up the emulated NAND flash as UBI, compatible with the UBI specification used in the NAND dump, as shown below.
cawan% sudo ubiformat -s 2048 -O 2048 /dev/mtd0
ubiformat: mtd0 (nand), size 268435456 bytes (256.0 MiB), \
2048 eraseblocks of 131072 bytes (128.0 KiB), min. I/O size 2048 bytes
libscan: scanning eraseblock 2047 -- 100 % complete
ubiformat: 2048 eraseblocks are supposedly empty
ubiformat: formatting eraseblock 2047 -- 100 % complete
Let's explain the two input parameters of ubiformat that are compulsory here. The -s option is the sub-page size, which is the minimum I/O unit used for UBI headers. Setting it to 2048 prevents UBI from dividing each 2048-byte page into smaller sub-pages. Next, the -O option is the volume ID header offset. Setting it to 2048 means the volume ID header starts 1 page, or 2048 bytes, away from the start of the UBI erase block. Please note that without specifying these two parameters with the correct figures, or by leaving everything at default, it will end up with errors in the following steps. Let's proceed further to modprobe the UBI driver.
cawan% sudo modprobe ubi
cawan%
No error prompt, so let's assume it succeeded. Now, use ubiattach to create a UBI device file that works with /dev/mtd0, as shown below.
cawan% sudo ubiattach -p /dev/mtd0 -O 2048
UBI device number 0, \
total 2048 LEBs (260046848 bytes, 248.0 MiB), \
available 2002 LEBs (254205952 bytes, 242.4 MiB), \
LEB size 126976 bytes (124.0 KiB)
Again, the -O 2048 input parameter is crucial to specify the volume ID header offset as 2048 bytes away from the start of the UBI eraseblock, just like for ubiformat. It is extremely important to make sure the Logical Eraseblock (LEB) size is 126976 bytes. Why? Because an eraseblock is 2048*64=131072 bytes in size, and after deducting 2 pages of 2048 bytes each (one for the UBI header and one for the volume ID header), the LEB size becomes 131072-2048-2048=126976. This matches the UBI image in the NAND dump, whose data offset is 0x1000, i.e. 2 pages into every erase block. Otherwise, it will end up with errors in the following step too.
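The same subtraction can be written out in general form. This sketch follows my understanding of how the UBI tools and the kernel derive the layout, so treat it as an assumption rather than a spec: the volume ID header offset defaults to the 64-byte UBI header rounded up to the sub-page size, and the data offset is the end of the 64-byte volume ID header rounded up to the minimum I/O unit. It also hints at why the defaults fail here: with nandsim's 512-byte sub-pages, the volume ID header would land at 512 and the LEB size would become 129024 bytes instead of 126976.

```python
############################ leb_layout.py ############################
UBI_EC_HDR_SIZE = 64     # struct ubi_ec_hdr
UBI_VID_HDR_SIZE = 64    # struct ubi_vid_hdr


def align_up(x, a):
    """Round x up to the next multiple of a."""
    return (x + a - 1) // a * a


def leb_layout(peb_size, min_io_size, subpage_size, vid_hdr_offset=None):
    """Return (vid_hdr_offset, data_offset, leb_size) for one erase block.
    Without an explicit -O, the VID header is assumed to follow the EC
    header, rounded up to the sub-page size."""
    if vid_hdr_offset is None:
        vid_hdr_offset = align_up(UBI_EC_HDR_SIZE, subpage_size)
    data_offset = align_up(vid_hdr_offset + UBI_VID_HDR_SIZE, min_io_size)
    return vid_hdr_offset, data_offset, peb_size - data_offset
#################################### end ####################################
```

leb_layout(131072, 2048, 2048, 2048) reproduces the figures above, (2048, 4096, 126976), while leb_layout(131072, 2048, 512) gives (512, 2048, 129024) for the nandsim defaults.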
A new UBI device file, /dev/ubi0, is created, and its details can be checked by using ubinfo, as shown below.
cawan% sudo ubinfo /dev/ubi0 -a
ubi0
Volumes count: 0
Logical eraseblock size: 126976 bytes, 124.0 KiB
Total amount of logical eraseblocks: 2048 (260046848 bytes, 248.0 MiB)
Amount of available logical eraseblocks: 2002 (254205952 bytes, 242.4 MiB)
Maximum count of volumes 128
Count of bad physical eraseblocks: 0
Count of reserved physical eraseblocks: 40
Current maximum erase counter value: 0
Minimum input/output unit size: 2048 bytes
Character device major/minor: 237:0
Now, a UBI environment with exactly the same specification as the UBI image in the NAND dump is ready. Let's create a volume with sufficient storage to host the UBIFS images created by ubireader_extract_images, as shown below.
cawan% sudo ubimkvol -N volume1 -s 50MiB /dev/ubi0
Volume ID 0, \
size 413 LEBs (52441088 bytes, 50.0 MiB), \
LEB size 126976 bytes (124.0 KiB), dynamic, name "volume1", alignment 1
Well, a new volume named "volume1", 50 MiB in size, has been created successfully by ubimkvol, together with a new device file, /dev/ubi0_0. Now, it is time to let volume1 host a UBIFS by using ubiupdatevol. Let's start with img-1941946494_vol-ubifs.ubifs first, as shown below.
cawan% ls -la
total 145216
drwxrwxr-x 3 user user 4096 May 30 01:40 .
drwxrwxr-x 3 user user 4096 May 29 16:46 ..
-rw-rw-r-- 1 user user 100438016 May 29 16:46 img-1240815858_vol-data.ubifs
-rw-rw-r-- 1 user user 11935744 May 29 16:46 img-1941946494_vol-ubifs.ubifs
-rw-rw-r-- 1 user user 27299840 May 29 16:46 img-2673978231_vol-backup.ubifs
-rw-rw-r-- 1 user user 9015296 May 29 16:46 img-3823591600_vol-app.ubifs
drwxrwxr-x 2 user user 4096 May 30 01:40 ubifs-root
cawan% sudo ubiupdatevol /dev/ubi0_0 img-1941946494_vol-ubifs.ubifs
cawan%
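Before mounting, it may be worth checking that each .ubifs image really was built for this geometry. A UBIFS image starts with a superblock node that records, among other things, min_io_size, leb_size and leb_cnt. The sketch below reads them. The node magic 0x06101831, the superblock node type 6 and the field order are taken from my reading of ubifs-media.h in the Linux kernel, so treat them as assumptions. The defaults of fits_volume are the 126976-byte LEB size and the 413 LEBs of volume1 shown above.

```python
############################ ubifs_sb.py ############################
import struct

UBIFS_NODE_MAGIC = 0x06101831
UBIFS_SB_NODE = 6
# Little-endian: 24-byte common header (magic, crc, sqnum, len, node_type,
# group_type, padding[2]) then the first superblock fields (padding[2],
# key_hash, key_fmt, flags, min_io_size, leb_size, leb_cnt, max_leb_cnt).
SB_FMT = "<IIQIBB2x2xBBIIIII"
SB_FIELDS = ("magic", "crc", "sqnum", "len", "node_type", "group_type",
             "key_hash", "key_fmt", "flags", "min_io_size", "leb_size",
             "leb_cnt", "max_leb_cnt")


def read_sb(path):
    """Read the superblock node at offset 0 of a .ubifs image."""
    with open(path, "rb") as f:
        raw = f.read(struct.calcsize(SB_FMT))
    sb = dict(zip(SB_FIELDS, struct.unpack(SB_FMT, raw)))
    if sb["magic"] != UBIFS_NODE_MAGIC or sb["node_type"] != UBIFS_SB_NODE:
        raise ValueError("%s does not start with a UBIFS superblock" % path)
    return sb


def fits_volume(sb, vol_leb_size=126976, vol_lebs=413):
    # same LEB size as the UBI volume, and room for all of its LEBs
    return sb["leb_size"] == vol_leb_size and sb["leb_cnt"] <= vol_lebs
#################################### end ####################################
```

As a first hint, all four .ubifs files listed above are whole multiples of 126976 bytes (94, 215, 71 and 791 LEBs for the ubifs, backup, app and data images respectively). The 100438016-byte data image is also larger than the 52441088 bytes of volume1, so it could not be written into that volume as-is anyway.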
5 - Firmware Extraction
Everything has worked perfectly without a single error so far. Let's see whether the not-so-low-hanging fruit is within reach now.
cawan% mkdir /tmp/nand
cawan% sudo mount -t ubifs /dev/ubi0_0 /tmp/nand
cawan% cd /tmp/nand
cawan% ls
bin dev etc home lib linuxrc mnt proc root sbin sys tmp usr \
var work
Hopefully this is not the same thing as what ubireader_extract_files generated in the previous section. Let's verify it.
cawan% cd etc
cawan% cat fstab
proc /proc proc defaults 0 0
none /var/shm shm defaults 0 0
sysfs /sys sysfs defaults 0 0
none /tmp tmpfs defaults 0 0
What an amazing moment. Let's try the other two UBIFS images.
cawan% sudo umount /tmp/nand
cawan% sudo ubiupdatevol /dev/ubi0_0 img-2673978231_vol-backup.ubifs
cawan% sudo mount -t ubifs /dev/ubi0_0 /tmp/nand
cawan% ls /tmp/nand
14x8.hzk dat.ini flat_backup libplat.so ParaAutoNet.db ParamUniq.db
acmet driver_gwzd.ko gsmMuxd lyzd ParamMeter.db ppp
check.ini factory icons.bmp manuf.xin ParamOther.db seting.ini
chs.bin filecheck libacmet.so metproto.so ParamTerm.db startup.sh
cawan% sudo umount /tmp/nand
cawan% sudo ubiupdatevol /dev/ubi0_0 img-3823591600_vol-app.ubifs
cawan% sudo mount -t ubifs /dev/ubi0_0 /tmp/nand
cawan% ls /tmp/nand
14x8.hzk driver_gwzd.ko libacmet.so manuf.xin startup.sh
check.ini filecheck libplat.so metproto.so tmt_info.log
chs.bin gsmMuxd lyzd ppp updateinfo.xin
dat.ini icons.bmp lyzd.xzip seting.ini
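To repeat the whole route, for instance on another dump of the same device, the steps can be collected into a small Python driver. This is only a convenience sketch. It reuses exactly the commands and flags shown above, adds a "cp -a" copy step into a hypothetical output directory (outdir), needs root via sudo, and defaults to a dry run that just prints each command.

```python
############################ nandsim_mount.py ############################
import subprocess

# One-time setup, same commands and flags as in sections 4 and 5
SETUP = [
    ["sudo", "modprobe", "nandsim", "first_id_byte=0x2c",
     "second_id_byte=0xda", "third_id_byte=0x90", "fourth_id_byte=0x95"],
    ["sudo", "ubiformat", "-s", "2048", "-O", "2048", "/dev/mtd0"],
    ["sudo", "modprobe", "ubi"],
    ["sudo", "ubiattach", "-p", "/dev/mtd0", "-O", "2048"],
    ["sudo", "ubimkvol", "-N", "volume1", "-s", "50MiB", "/dev/ubi0"],
    ["mkdir", "-p", "/tmp/nand"],
]


def image_cmds(image, outdir, mnt="/tmp/nand"):
    """Commands to load one .ubifs image into /dev/ubi0_0, mount it and copy
    its files out; outdir is a hypothetical destination directory."""
    return [
        ["mkdir", "-p", outdir],
        ["sudo", "ubiupdatevol", "/dev/ubi0_0", image],
        ["sudo", "mount", "-t", "ubifs", "/dev/ubi0_0", mnt],
        ["sudo", "cp", "-a", mnt + "/.", outdir],
        ["sudo", "umount", mnt],
    ]


def run(cmds, dry_run=True):
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)   # stop at the first failing step
#################################### end ####################################
```

Usage would be run(SETUP, dry_run=False) once, then run(image_cmds("img-1941946494_vol-ubifs.ubifs", "rootfs"), dry_run=False) for each of the three good images, with a different output directory each time.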
6 - Conclusion
So, in conclusion, the file systems hosted in three different UBIFS images have been fully extracted successfully.
Happy hacking, and keep hacking.
References:-
[1] MT29F2G08ABAEAWP Data Sheet, https://datasheet.lcsc.com/lcsc/1811032117_Micron-Tech-MT29F2G08ABAEAWP-E_C110895.pdf
[2] DumpFlash Tool, https://github.com/ohjeongwook/dumpflash
[3] python-bchlib, https://github.com/jkent/python-bchlib
[4] Primitive Polynomial List, https://www.partow.net/programming/polynomials/index.html
[5] UBI Header Structure, https://kernel.googlesource.com/pub/scm/linux/kernel/git/rw/mtd-utils/+/refs/heads/master/include/mtd/ubi-media.h
cawan's blog
Sunday, June 25, 2023
NAND Dump Analysis, Bit Errors Fixing with ECC, UBI Image Analysis, and Firmware Extraction Demystified