Sunday, June 25, 2023

NAND Dump Analysis, Bit Errors Fixing with ECC, UBI Image Analysis, and Firmware Extraction Demystified

by cawan (cawan[at]ieee.org or chuiyewleong[at]hotmail.com)

on 25/05/2023

1 - Introduction
2 - NAND Dump Analysis
3 - Bit Errors Fixing with ECC
4 - UBI Image Analysis
5 - Firmware Extraction
6 - Conclusion


1 - Introduction

This paper explains, from a hacker's point of view, how to process a NAND
dump and obtain all the files contained in it. Each step of the process is
explained in detail, together with an example. The focus is on the physical
NAND dump, which is the dump file obtained from a universal programmer. A
dump file obtained through a bootloader such as u-boot is what I call a
logical NAND dump. For a logical NAND dump, the correctness of the data is
ensured by the Flash Translation Layer (FTL). In other words, the FTL does
all the bit error fixing with Error Correcting Code (ECC) for you. A
physical NAND dump, however, comes with the ECC alongside the data, and you
are on your own to work out how the ECC is used to ensure the correctness
of the data. If bit errors exist, the ECC should be used to fix them. But
it is not easy to guess how the ECC is associated with the data, and
without knowing that association it is impossible to use the ECC to fix
bit errors. So, it is necessary to analyze the NAND dump thoroughly and
systematically to uncover the hidden association between ECC and data.
Brute forcing it blindly is not a good idea. Instead, the results of a
thorough analysis turn blind brute forcing into guided brute forcing,
which maximizes the chance of recovering the association.

Once the bit errors in the data are fixed and the ECCs are removed, the NAND
dump has been transformed from physical into logical, and it is ready for
the actual firmware image analysis. The real case in this paper deals with
a UBI image, and its analysis is discussed in considerable detail. Based on
the knowledge gained from the UBI image analysis, a creative approach is
proposed to recover the file system and extract all the files hosted in
it. It is important to note that the entire process discussed in this
paper cannot be replicated with automated tools such as binwalk or unblob.
Besides, the entire analysis is demonstrated manually, step by step, to
make sure everything is explained clearly. Without wasting more time in
mere talk, let's get started with the actual NAND dump analysis.


2 - NAND Dump Analysis

First of all, let's start with some fundamentals. A NAND flash comprises
many "pages" of a fixed size, and a fixed number of "pages" make up a
"block". The sample NAND dump used for the demonstration was obtained from
an actual NAND chip with part number MT29F2G08ABAEAWP, so that chip is used
to illustrate the relevant technical specification. For MT29F2G08ABAEAWP,
the size of a "page" is 2048+64=2112 bytes, 64 "pages" make up a "block",
and 2048 "blocks" make up the entire storage of the NAND flash, which
therefore contains 2048*64=131072 "pages". Within each 2112-byte "page",
the first 2048 bytes are data and the remaining 64 bytes are the spare
area, which hosts ECC or some kind of vendor specific metadata. The spare
area is also known as Out Of Band (OOB) in some literature. As an overview,
the first "page" of the sample NAND dump is shown in hex below: 0x0000 to
0x07ff is the data portion, and 0x0800 to 0x083f is the spare area or OOB
portion.

cawan% hexdump -C -n 2112 ./MT29F2G08ABAEAWP@TSOP48.BIN
00000000  20 54 56 4e 00 02 00 00  a0 ac 00 00 ff ff ff ff  | TVN............|
00000010  55 aa 55 aa 2e 00 00 00  20 02 00 b0 00 00 00 01  |U.U..... .......|
00000020  64 02 00 b0 18 00 00 c0  20 02 00 b0 18 00 00 01  |d....... .......|
00000030  aa 55 aa 55 01 00 00 00  aa 55 aa 55 01 00 00 00  |.U.U.....U.U....|
00000040  28 18 00 b0 4a d8 dc 53  08 18 00 b0 14 80 00 00  |(...J..S........|
00000050  aa 55 aa 55 01 00 00 00  aa 55 aa 55 01 00 00 00  |.U.U.....U.U....|
00000060  aa 55 aa 55 01 00 00 00  00 18 00 b0 76 04 03 00  |.U.U........v...|
00000070  aa 55 aa 55 01 00 00 00  04 18 00 b0 21 00 00 00  |.U.U........!...|
00000080  aa 55 aa 55 01 00 00 00  04 18 00 b0 23 00 00 00  |.U.U........#...|
00000090  aa 55 aa 55 01 00 00 00  aa 55 aa 55 01 00 00 00  |.U.U.....U.U....|
000000a0  aa 55 aa 55 01 00 00 00  04 18 00 b0 27 00 00 00  |.U.U........'...|
000000b0  aa 55 aa 55 01 00 00 00  aa 55 aa 55 01 00 00 00  |.U.U.....U.U....|
000000c0  aa 55 aa 55 01 00 00 00  20 18 00 b0 00 00 00 00  |.U.U.... .......|
000000d0  24 18 00 b0 00 00 00 00  1c 18 00 b0 00 40 00 00  |$............@..|
000000e0  18 18 00 b0 32 03 00 00  10 18 00 b0 06 00 00 00  |....2...........|
000000f0  04 18 00 b0 27 00 00 00  aa 55 aa 55 01 00 00 00  |....'....U.U....|
00000100  aa 55 aa 55 01 00 00 00  aa 55 aa 55 01 00 00 00  |.U.U.....U.U....|
00000110  04 18 00 b0 2b 00 00 00  04 18 00 b0 2b 00 00 00  |....+.......+...|
00000120  04 18 00 b0 2b 00 00 00  18 18 00 b0 32 02 00 00  |....+.......2...|
00000130  1c 18 00 b0 81 47 00 00  1c 18 00 b0 01 44 00 00  |.....G.......D..|
00000140  04 18 00 b0 20 00 00 00  34 18 00 b0 20 88 88 00  |.... ...4... ...|
00000150  aa 55 aa 55 01 00 00 00  18 02 00 b0 08 00 00 00  |.U.U............|
00000160  60 31 00 b8 00 80 00 00  a0 31 00 b8 00 80 00 00  |`1.......1......|
00000170  2c 02 00 b0 00 01 00 00  2c 02 00 b0 00 01 00 00  |,.......,.......|
00000180  2c 02 00 b0 00 01 00 00  00 00 00 00 00 00 00 00  |,...............|
00000190  13 00 00 ea 14 f0 9f e5  10 f0 9f e5 0c f0 9f e5  |................|
000001a0  08 f0 9f e5 04 f0 9f e5  00 f0 9f e5 04 f0 1f e5  |................|
000001b0  20 03 00 00 78 56 34 12  78 56 34 12 78 56 34 12  | ...xV4.xV4.xV4.|
000001c0  78 56 34 12 78 56 34 12  78 56 34 12 78 56 34 12  |xV4.xV4.xV4.xV4.|
000001d0  00 02 00 00 a0 ac 00 00  80 b5 00 00 a0 ac 00 00  |................|
000001e0  de c0 ad 0b 00 00 0f e1  1f 00 c0 e3 d3 00 80 e3  |................|
000001f0  00 f0 29 e1 bc d0 9f e5  07 d0 cd e3 00 00 a0 e3  |..).............|
00000200  70 05 00 eb 00 40 a0 e1  01 50 a0 e1 02 60 a0 e1  |p....@...P...`..|
00000210  04 d0 a0 e1 8c 00 4f e2  00 90 46 e0 06 00 50 e1  |......O...F...P.|
00000220  06 00 00 0a 06 10 a0 e1  5c 30 1f e5 03 20 80 e0  |........\0... ..|
00000230  00 06 b0 e8 00 06 a1 e8  02 00 50 e1 fb ff ff 3a  |..........P....:|
00000240  74 00 9f e5 74 10 9f e5  00 20 a0 e3 01 00 50 e1  |t...t.... ....P.|
00000250  02 00 00 2a 00 20 80 e5  04 00 80 e2 fa ff ff ea  |...*. ..........|
00000260  00 00 9f e5 00 f0 a0 e1  54 06 00 00 a0 ac 00 00  |........T.......|
00000270  a0 ac 00 00 a0 ac 00 00  00 00 a0 e3 17 0f 07 ee  |................|
00000280  17 0f 08 ee 10 0f 11 ee  23 0c c0 e3 87 00 c0 e3  |........#.......|
00000290  02 00 80 e3 01 0a 80 e3  10 0f 01 ee 0e c0 a0 e1  |................|
000002a0  0a 00 00 eb 0c e0 a0 e1  0e f0 a0 e1 00 00 a0 e1  |................|
000002b0  e8 d0 1f e5 fe ff ff eb  00 80 00 bc a0 ae 00 00  |................|
000002c0  80 b7 00 00 00 00 a0 e1  00 00 a0 e1 00 00 a0 e1  |................|
000002d0  68 00 9f e5 00 10 e0 e3  00 10 80 e5 00 00 0f e1  |h...............|
000002e0  c0 00 80 e3 00 f0 21 e1  54 00 9f e5 54 10 9f e5  |......!.T...T...|
000002f0  00 10 80 e5 50 00 9f e5  50 10 9f e5 00 10 80 e5  |....P...P.......|
00000300  4c 00 9f e5 05 14 a0 e3  00 10 80 e5 44 00 9f e5  |L...........D...|
00000310  44 10 9f e5 00 10 80 e5  03 2a a0 e3 01 20 52 e2  |D........*... R.|
00000320  fd ff ff 1a 20 00 9f e5  30 10 9f e5 00 10 80 e5  |.... ...0.......|
00000330  01 2b a0 e3 01 20 52 e2  fd ff ff 1a 0e f0 a0 e1  |.+... R.........|
00000340  24 21 00 b8 04 10 00 b0  84 00 04 40 04 02 00 b0  |$!.........@....|
00000350  ff 0f 00 00 08 02 00 b0  0c 02 00 b0 24 4f 00 00  |............$O..|
00000360  fc 0f 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000370  00 00 51 e3 1f 00 00 0a  01 30 a0 e3 00 20 a0 e3  |..Q......0... ..|
00000380  01 00 50 e1 19 00 00 3a  01 02 51 e3 00 00 51 31  |..P....:..Q...Q1|
00000390  01 12 a0 31 03 32 a0 31  fa ff ff 3a 02 01 51 e3  |...1.2.1...:..Q.|
000003a0  00 00 51 31 81 10 a0 31  83 30 a0 31 fa ff ff 3a  |..Q1...1.0.1...:|
000003b0  01 00 50 e1 01 00 40 20  03 20 82 21 a1 00 50 e1  |..P...@ . .!..P.|
000003c0  a1 00 40 20 a3 20 82 21  21 01 50 e1 21 01 40 20  |..@ . .!!.P.!.@ |
000003d0  23 21 82 21 a1 01 50 e1  a1 01 40 20 a3 21 82 21  |#!.!..P...@ .!.!|
000003e0  00 00 50 e3 23 32 b0 11  21 12 a0 11 ef ff ff 1a  |..P.#2..!.......|
000003f0  02 00 a0 e1 0e f0 a0 e1  04 e0 2d e5 c9 1c 00 eb  |..........-.....|
00000400  00 00 a0 e3 00 80 bd e8  03 50 2d e9 d7 ff ff eb  |.........P-.....|
00000410  06 50 bd e8 90 02 03 e0  03 10 41 e0 0e f0 a0 e1  |.P........A.....|
00000420  03 50 2d e9 09 00 00 eb  06 50 bd e8 90 02 03 e0  |.P-......P......|
00000430  03 10 41 e0 0e f0 a0 e1  00 00 a0 e1 00 00 a0 e1  |..A.............|
00000440  00 00 a0 e1 00 00 a0 e1  00 00 a0 e1 00 00 a0 e1  |................|
00000450  00 00 51 e3 01 c0 20 e0  42 00 00 0a 00 10 61 42  |..Q... .B.....aB|
00000460  01 20 51 e2 27 00 00 0a  00 30 b0 e1 00 30 60 42  |. Q.'....0...0`B|
00000470  01 00 53 e1 26 00 00 9a  02 00 11 e1 28 00 00 0a  |..S.&.......(...|
00000480  0e 02 11 e3 81 11 a0 01  08 20 a0 03 01 20 a0 13  |......... ... ..|
00000490  01 02 51 e3 03 00 51 31  01 12 a0 31 02 22 a0 31  |..Q...Q1...1.".1|
000004a0  fa ff ff 3a 02 01 51 e3  03 00 51 31 81 10 a0 31  |...:..Q...Q1...1|
000004b0  82 20 a0 31 fa ff ff 3a  00 00 a0 e3 01 00 53 e1  |. .1...:......S.|
000004c0  01 30 43 20 02 00 80 21  a1 00 53 e1 a1 30 43 20  |.0C ...!..S..0C |
000004d0  a2 00 80 21 21 01 53 e1  21 31 43 20 22 01 80 21  |...!!.S.!1C "..!|
000004e0  a1 01 53 e1 a1 31 43 20  a2 01 80 21 00 00 53 e3  |..S..1C ...!..S.|
000004f0  22 22 b0 11 21 12 a0 11  ef ff ff 1a 00 00 5c e3  |""..!.........\.|
00000500  00 00 60 42 0e f0 a0 e1  00 00 3c e1 00 00 60 42  |..`B......<...`B|
00000510  0e f0 a0 e1 00 00 a0 33  cc 0f a0 01 01 00 80 03  |.......3........|
00000520  0e f0 a0 e1 01 08 51 e3  21 18 a0 21 10 20 a0 23  |......Q.!..!. .#|
00000530  00 20 a0 33 01 0c 51 e3  21 14 a0 21 08 20 82 22  |. .3..Q.!..!. ."|
00000540  10 00 51 e3 21 12 a0 21  04 20 82 22 04 00 51 e3  |..Q.!..!. ."..Q.|
00000550  03 20 82 82 a1 20 82 90  00 00 5c e3 33 02 a0 e1  |. ... ....\.3...|
00000560  00 00 60 42 0e f0 a0 e1  04 e0 2d e5 6d 1c 00 eb  |..`B......-.m...|
00000570  00 00 a0 e3 04 f0 9d e4  00 00 a0 e1 00 00 a0 e1  |................|
00000580  00 00 a0 e1 00 00 a0 e1  00 00 a0 e1 00 00 a0 e1  |................|
00000590  20 30 52 e2 20 c0 62 e2  30 02 a0 41 31 03 a0 51  | 0R. .b.0..A1..Q|
000005a0  11 0c 80 41 31 12 a0 e1  0e f0 a0 e1 20 30 52 e2  |...A1....... 0R.|
000005b0  20 c0 62 e2 11 12 a0 41  10 13 a0 51 30 1c 81 41  | .b....A...Q0..A|
000005c0  10 02 a0 e1 0e f0 a0 e1  20 30 52 e2 20 c0 62 e2  |........ 0R. .b.|
000005d0  30 02 a0 41 51 03 a0 51  11 0c 80 41 51 12 a0 e1  |0..AQ..Q...AQ...|
000005e0  0e f0 a0 e1 2d de 4d e2  00 40 a0 e3 6c 31 9f e5  |....-.M..@..l1..|
000005f0  0d 00 a0 e1 00 30 8d e5  04 30 8d e5 1c 40 8d e5  |.....0...0...@..|
00000600  bc d2 8d e5 30 40 8d e5  50 40 8d e5 d1 01 00 eb  |....0@..P@......|
00000610  1c 30 9d e5 04 00 53 e1  02 00 00 0a 04 10 a0 e1  |.0....S.........|
00000620  8a 0f 8d e2 33 ff 2f e1  8a 0f 8d e2 01 10 a0 e3  |....3./.........|
00000630  ca 1a 00 eb 00 00 50 e3  46 00 00 1a 70 04 00 eb  |......P.F...p...|
00000640  38 42 9d e5 3c 52 9d e5  04 00 a0 e1 05 10 a0 e1  |8B..<R..........|
00000650  46 ff ff eb 04 10 a0 e1  0e a6 a0 e3 00 b0 a0 e1  |F...............|
00000660  0a 08 a0 e3 41 ff ff eb  04 10 a0 e1 00 70 a0 e1  |....A........p..|
00000670  ec 00 9f e5 3d ff ff eb  04 10 a0 e1 00 90 a0 e1  |....=...........|
00000680  0a 08 a0 e3 5f ff ff eb  01 00 a0 e1 05 10 a0 e1  |...._...........|
00000690  36 ff ff eb 00 60 a0 e1  24 00 00 ea 3c 12 9d e5  |6....`..$...<...|
000006a0  38 02 9d e5 31 ff ff eb  8a 4f 8d e2 bc 52 9d e5  |8...1....O...R..|
000006b0  50 10 a0 e3 00 20 a0 e3  90 07 03 e0 04 00 a0 e1  |P.... ..........|
000006c0  0f e0 a0 e1 34 f0 95 e5  04 00 a0 e1 0f e0 a0 e1  |....4...........|
000006d0  08 f0 95 e5 ff 00 50 e3  01 90 89 12 12 00 00 1a  |......P.........|
000006e0  0e 00 00 ea bc 42 9d e5  38 02 9d e5 d4 51 94 e5  |.....B..8....Q..|
000006f0  3c 12 9d e5 00 00 55 e3  05 00 00 0a 1b ff ff eb  |<.....U.........|
00000700  04 10 a0 e1 0a 20 a0 e1  90 67 23 e0 8a 0f 8d e2  |..... ...g#.....|
00000710  35 ff 2f e1 3c 32 9d e5  01 60 86 e2 03 a0 8a e0  |5./.<2...`......|
00000720  0b 00 56 e1 ee ff ff 3a  00 60 a0 e3 01 70 87 e2  |..V....:.`...p..|
00000730  09 00 57 e1 d8 ff ff 9a  1c 30 9d e5 00 00 53 e3  |..W......0....S.|
00000740  02 00 00 0a 8a 0f 8d e2  00 10 e0 e3 33 ff 2f e1  |............3./.|
00000750  0e 36 a0 e3 33 ff 2f e1  2d de 8d e2 1e ff 2f e1  |.6..3./.-...../.|
00000760  00 d0 00 b0 ff cf 11 00  f0 40 2d e9 02 60 d3 e5  |.........@-..`..|
00000770  00 40 d3 e5 00 00 d2 e5  01 c0 d3 e5 02 50 d2 e5  |.@...........P..|
00000780  01 30 d2 e5 00 40 24 e0  03 c0 2c e0 05 60 26 e0  |.0...@$...,..`&.|
00000790  ff 00 04 e2 06 30 8c e1  03 30 90 e1 01 70 a0 e1  |.....0...0...p..|
000007a0  03 00 a0 01 f0 80 bd 08  ac 50 a0 e1 0c 30 25 e0  |.........P...0%.|
000007b0  55 30 03 e2 55 00 53 e3  28 00 00 1a a0 30 20 e0  |U0..U.S.(....0 .|
000007c0  55 30 03 e2 55 00 53 e3  24 00 00 1a a6 30 26 e0  |U0..U.S.$....0&.|
000007d0  54 30 03 e2 54 00 53 e3  20 00 00 1a 80 20 a0 e1  |T0..T.S. .... ..|
000007e0  00 31 a0 e1 20 30 03 e2  40 20 02 e2 03 20 82 e1  |.1.. 0..@ ... ..|
000007f0  80 10 04 e2 80 31 a0 e1  01 20 82 e1 10 30 03 e2  |.....1... ...0..|
00000800  ff ff 00 00 ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000810  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000820  f6 89 f7 79 e5 60 c9 e0  d6 e3 ed cb 9c b0 f9 f0  |...y.`..........|
00000830  1f da d4 a4 9c d4 1b e0  e0 90 cc 85 d8 d2 e2 80  |................|
00000840
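
The geometry described above can be turned into a few offset helpers. This
is only a minimal sketch: the constants are the figures quoted above, while
the function names (page_offset, oob_offset, block_offset, split_page) are
hypothetical and introduced here purely for illustration.

```python
# Geometry figures quoted above for MT29F2G08ABAEAWP.
DATA_SIZE = 2048                        # data bytes per "page"
OOB_SIZE = 64                           # spare area (OOB) bytes per "page"
RAW_PAGE_SIZE = DATA_SIZE + OOB_SIZE    # 2112 bytes per "page" in the dump
PAGES_PER_BLOCK = 64
BLOCK_COUNT = 2048


def page_offset(page_num):
    """File offset of the first byte of a raw "page" in a physical dump."""
    return page_num * RAW_PAGE_SIZE


def oob_offset(page_num):
    """File offset of the OOB that follows the data of a "page"."""
    return page_offset(page_num) + DATA_SIZE


def block_offset(block_num):
    """File offset of the first "page" of a "block"."""
    return page_offset(block_num * PAGES_PER_BLOCK)


def split_page(raw_page):
    """Split one raw "page" into (data, oob)."""
    if len(raw_page) != RAW_PAGE_SIZE:
        raise ValueError("expected %d bytes" % RAW_PAGE_SIZE)
    return raw_page[:DATA_SIZE], raw_page[DATA_SIZE:]


print("OOB of page 0 starts at 0x%x" % oob_offset(0))
print("page 1 starts at 0x%x" % page_offset(1))
print("block 1 starts at 0x%x" % block_offset(1))
print("expected dump size: 0x%x" % page_offset(PAGES_PER_BLOCK * BLOCK_COUNT))
```

With these figures a complete physical dump should be 0x10800000 bytes
long, which is a quick sanity check before any deeper analysis.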

This sample NAND dump is in fact a physical NAND dump from a real industrial
product. As mentioned earlier, it is used as a real case to illustrate each
step of the analysis until the full file system is extracted and recovered.
Let's start with the DumpFlash tool and try to identify the ID codes of the
NAND chip. However, it fails, and the output is shown below. This might be
because the ID codes are missing from the NAND dump or have been changed to
something unexpected.

cawan% python2.7 dumpflash.py -i ./MT29F2G08ABAEAWP@TSOP48.BIN
PageSize: 0x200
OOBSize: 0x10
PagePerBlock: 0x20
BlockSize: 0x4000
RawPageSize: 0x210
FileSize: 0x10800000
PageCount: 0x84000

So, just ignore the incorrect output generated by DumpFlash and go back to
the technical specification provided by the datasheet of MT29F2G08ABAEAWP.
Let's have a brief look at the 64-byte OOB of the first "page" in
particular.

00000800  ff ff 00 00 ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000810  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000820  f6 89 f7 79 e5 60 c9 e0  d6 e3 ed cb 9c b0 f9 f0  |...y.`..........|
00000830  1f da d4 a4 9c d4 1b e0  e0 90 cc 85 d8 d2 e2 80  |................|

From this, two assumptions can be made. One, the first 32 bytes of the OOB
might be a constant. Two, the second 32 bytes might be ECCs. Let's verify
whether the first assumption holds by checking the OOB of the second
"page", as shown below.

cawan% hexdump -v -C -n $((2112*2)) ./MT29F2G08ABAEAWP@TSOP48.BIN | tail -n 5
00001040  ff ff 00 00 ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00001050  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00001060  8f ce f4 8b 1c 26 38 00  bd 61 a0 c7 48 c4 d3 60  |.....&8..a..H..`|
00001070  d2 1b 46 ab 53 8f 41 f0  8d 18 2b 3b 8d 54 21 50  |..F.S.A...+;.T!P|

Yes, it seems unchanged. How about the third "page" then?

cawan% hexdump -v -C -n $((2112*3)) ./MT29F2G08ABAEAWP@TSOP48.BIN | tail -n 5
00001880  ff ff 00 00 ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00001890  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
000018a0  01 8b bb 0a bb 54 88 50  7e 0e b9 9a c2 7b bd 40  |.....T.P~....{.@|
000018b0  dd 63 cb 9a e3 5a bc 70  65 ca 16 7a 50 dc 60 e0  |.c...Z.pe..zP.`.|

Still unchanged. How about the first "page" of the next "block" then?

cawan% hexdump -C -v -n $((2112*64+2112)) ./MT29F2G08ABAEAWP@TSOP48.BIN | \
tail -n 5
00021800  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00021810  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00021820  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00021830  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|

Well, this is a blank "page" that should be ignored. Drawing a conclusion
from a few samples is really not a good idea. Let's check it properly.

############################### check_const.py ###############################


input_file = open("MT29F2G08ABAEAWP@TSOP48.BIN","rb")

suspect_const = \
b'\xff\xff\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff' + \
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'

blank = \
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff' + \
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'

page_count = 0
diff_count = 0

while 1:
      data = input_file.read(2112)
      if len(data) == 0:
            break
      oob_first_32_bytes = data[2048:2048+32]
      page_count += 1
      if len(data) == 2112 and oob_first_32_bytes != blank:
            if oob_first_32_bytes != suspect_const:
                  diff_count += 1
           
print("diff_count: %d  page_count: %d\n" % (diff_count, page_count))


##################################### end ####################################

The output is,

cawan% python3.8 check_const.py
diff_count: 0  page_count: 131072
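
As a side note, the same check can be done without guessing the constant
first. The sketch below is an assumption-laden variant, not the script used
above: it compares every OOB byte position across all non-blank "pages" and
reports the positions that never change. The names constant_oob_offsets and
read_pages are hypothetical, and a "page" is treated as blank only when its
whole OOB is 0xFF. The two sample OOBs are the ones shown above for the
first and second "page".

```python
def read_pages(path, raw_size=2112):
    """Yield complete raw "pages" from a physical dump file."""
    with open(path, "rb") as f:
        while True:
            raw = f.read(raw_size)
            if len(raw) < raw_size:
                break
            yield raw


def constant_oob_offsets(pages, data_size=2048, oob_size=64):
    """Return {oob_offset: value} for OOB bytes identical in all non-blank pages."""
    blank = b'\xff' * oob_size
    reference = None
    varying = set()
    for raw in pages:
        oob = raw[data_size:data_size + oob_size]
        if len(oob) != oob_size or oob == blank:
            continue                      # skip short reads and erased pages
        if reference is None:
            reference = oob               # first non-blank OOB is the baseline
            continue
        varying.update(i for i in range(oob_size) if oob[i] != reference[i])
    if reference is None:
        return {}
    return {i: reference[i] for i in range(oob_size) if i not in varying}


# Small self-check using the OOBs of the first and second "page" shown above.
const_part = bytes.fromhex("ffff0000" + "ff" * 28)
oob_page0 = const_part + bytes.fromhex(
    "f689f779e560c9e0d6e3edcb9cb0f9f01fdad4a49cd41be0e090cc85d8d2e280")
oob_page1 = const_part + bytes.fromhex(
    "8fcef48b1c263800bd61a0c748c4d360d21b46ab538f41f08d182b3b8d542150")
sample_pages = [bytes(2048) + oob_page0, bytes(2048) + oob_page1]
result = constant_oob_offsets(sample_pages)
print(sorted(result))
```

On the full dump one would call
constant_oob_offsets(read_pages("MT29F2G08ABAEAWP@TSOP48.BIN")) instead of
feeding it the two sample pages.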

So, it is convincing enough to say that the first 32 bytes of the OOB are
constant for every non-blank "page". Next, let's verify the second
assumption, that the second 32 bytes of the OOB are ECCs. The suspected ECC
portion of the OOB for the first 4 "pages" is shown below.

00000820  f6 89 f7 79 e5 60 c9 e0  d6 e3 ed cb 9c b0 f9 f0  |...y.`..........|
00000830  1f da d4 a4 9c d4 1b e0  e0 90 cc 85 d8 d2 e2 80  |................|

00001060  8f ce f4 8b 1c 26 38 00  bd 61 a0 c7 48 c4 d3 60  |.....&8..a..H..`|
00001070  d2 1b 46 ab 53 8f 41 f0  8d 18 2b 3b 8d 54 21 50  |..F.S.A...+;.T!P|

000018a0  01 8b bb 0a bb 54 88 50  7e 0e b9 9a c2 7b bd 40  |.....T.P~....{.@|
000018b0  dd 63 cb 9a e3 5a bc 70  65 ca 16 7a 50 dc 60 e0  |.c...Z.pe..zP.`.|

000020e0  43 a9 36 70 be b0 5e 90  1c 4f c1 ad 19 54 4d 20  |C.6p..^..O...TM |
000020f0  b8 6a 20 ba 32 c2 74 80  76 73 45 10 64 3e 38 c0  |.j .2.t.vsE.d>8.|

The output looks positive, and it provides extra information about how the
suspected ECC portion of the OOB is used by the system implementation.
For each "page", the 32 bytes of suspected ECC can apparently be divided
into four ECCs of 8 bytes each. The reason is that the last 4 bits of each
8-byte suspected ECC are always zero, as shown below.

f6 89 f7 79 e5 60 c9 e0  
d6 e3 ed cb 9c b0 f9 f0
1f da d4 a4 9c d4 1b e0
e0 90 cc 85 d8 d2 e2 80

8f ce f4 8b 1c 26 38 00
bd 61 a0 c7 48 c4 d3 60
d2 1b 46 ab 53 8f 41 f0
8d 18 2b 3b 8d 54 21 50

01 8b bb 0a bb 54 88 50
7e 0e b9 9a c2 7b bd 40
dd 63 cb 9a e3 5a bc 70
65 ca 16 7a 50 dc 60 e0

43 a9 36 70 be b0 5e 90
1c 4f c1 ad 19 54 4d 20
b8 6a 20 ba 32 c2 74 80
76 73 45 10 64 3e 38 c0
                      ^
                      0

Since a "page" comprises four ECCs, it is reasonable to deduce that the
2048-byte data portion of a "page" is divided into four "sub-pages" of
512 bytes each. Each "sub-page" is protected by its respective ECC, in
sequence, as shown below.


f6 89 f7 79 e5 60 c9 e0 <- ECC of the 1st "sub-page" in 1st "page"
d6 e3 ed cb 9c b0 f9 f0 <- ECC of the 2nd "sub-page" in 1st "page"
1f da d4 a4 9c d4 1b e0 <- ECC of the 3rd "sub-page" in 1st "page"
e0 90 cc 85 d8 d2 e2 80 <- ECC of the 4th "sub-page" in 1st "page"

8f ce f4 8b 1c 26 38 00 <- ECC of the 1st "sub-page" in 2nd "page"
bd 61 a0 c7 48 c4 d3 60 <- ECC of the 2nd "sub-page" in 2nd "page"
d2 1b 46 ab 53 8f 41 f0 <- ECC of the 3rd "sub-page" in 2nd "page"
8d 18 2b 3b 8d 54 21 50 <- ECC of the 4th "sub-page" in 2nd "page"

01 8b bb 0a bb 54 88 50 <- ECC of the 1st "sub-page" in 3rd "page"
7e 0e b9 9a c2 7b bd 40 <- ECC of the 2nd "sub-page" in 3rd "page"
dd 63 cb 9a e3 5a bc 70 <- ECC of the 3rd "sub-page" in 3rd "page"
65 ca 16 7a 50 dc 60 e0 <- ECC of the 4th "sub-page" in 3rd "page"

43 a9 36 70 be b0 5e 90 <- ECC of the 1st "sub-page" in 4th "page"
1c 4f c1 ad 19 54 4d 20 <- ECC of the 2nd "sub-page" in 4th "page"
b8 6a 20 ba 32 c2 74 80 <- ECC of the 3rd "sub-page" in 4th "page"
76 73 45 10 64 3e 38 c0 <- ECC of the 4th "sub-page" in 4th "page"
                      ^
                      0
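
The layout deduced above can be written down as a small helper that pairs
each "sub-page" with its suspected ECC. This is a sketch of the working
hypothesis only (four 512-byte "sub-pages", four 8-byte ECCs stored in
sequence in the second half of the OOB); the name subpages_with_ecc is
hypothetical, and the demo "page" below is synthetic except for the ECC
bytes, which are those of the first "page" shown above.

```python
SUBPAGE_SIZE = 512          # assumed data bytes covered by one ECC
ECC_SIZE = 8                # assumed bytes stored per ECC
ECC_AREA_START = 2048 + 32  # ECCs are assumed to start in the 2nd half of OOB


def subpages_with_ecc(raw_page):
    """Return [(subpage_data, ecc_bytes), ...] for one 2112-byte raw "page"."""
    if len(raw_page) != 2112:
        raise ValueError("expected a 2112-byte raw page")
    pairs = []
    for i in range(4):
        data = raw_page[i * SUBPAGE_SIZE:(i + 1) * SUBPAGE_SIZE]
        start = ECC_AREA_START + i * ECC_SIZE
        pairs.append((data, raw_page[start:start + ECC_SIZE]))
    return pairs


# Synthetic data (sub-page i is filled with byte i) plus the real OOB of the
# first "page" from the dump shown above.
fake_data = b''.join(bytes([i]) * SUBPAGE_SIZE for i in range(4))
oob = bytes.fromhex(
    "ffff0000" + "ff" * 28 +
    "f689f779e560c9e0d6e3edcb9cb0f9f01fdad4a49cd41be0e090cc85d8d2e280")
pairs = subpages_with_ecc(fake_data + oob)
for index, (data, ecc) in enumerate(pairs):
    print("sub-page %d: first byte 0x%02x, ECC %s" % (index, data[0], ecc.hex()))
```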

Since the last 4 bits of each ECC are zero, the length of the ECC is
probably 8*8-4=60 bits. As a side note, the ECC length is normally
expressed in bits. Let's confirm that all the ECCs are 60 bits long by
checking that the last 4 bits of every one of them are zero.

########################### check_ecc_last_4bit.py ###########################


# Each raw "page" is 2048 bytes of data followed by 64 bytes of OOB.
input_file = open("MT29F2G08ABAEAWP@TSOP48.BIN","rb")

# First 32 bytes of the OOB of a blank (erased) "page".
blank = b'\xff' * 32

page_count = 0
diff_count = 0

while 1:
      data = input_file.read(2112)
      if len(data) == 0:
            break
      oob_1st_32_bytes = data[2048:2048+32]
      oob_2nd_32_bytes = data[2048+32:2048+64]
      page_count += 1
      if len(data) == 2112 and oob_1st_32_bytes != blank:
            for i in range(4):
                  # The last 4 bits of the i-th 8-byte ECC are the low
                  # nibble of its last byte.
                  if oob_2nd_32_bytes[i*8+7] & 0x0f != 0:
                        diff_count += 1

input_file.close()

print("diff_count: %d  page_count: %d\n" % (diff_count, page_count))


##################################### end ####################################

The output is,

cawan% python3.8 check_ecc_last_4bit.py
diff_count: 0  page_count: 131072

With such a convincing result, it is reasonable to say that the ECC length
is 60 bits.
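
The mask test above only looks at the four bits that were already under
suspicion. A per-bit count makes the same observation without that prior
guess. The sketch below is illustrative only: the helper name
ecc_bit_counts is hypothetical, bit 0 is taken to be the most significant
bit of the first ECC byte, and the input is just the sixteen ECCs listed
above rather than the whole dump.

```python
def ecc_bit_counts(eccs, ecc_size=8):
    """Count, for every bit position, in how many ECCs that bit is set.

    Bit 0 is the most significant bit of byte 0; bit 63 is the least
    significant bit of byte 7.
    """
    counts = [0] * (ecc_size * 8)
    for ecc in eccs:
        for byte_idx, byte in enumerate(ecc[:ecc_size]):
            for bit in range(8):
                if byte & (0x80 >> bit):
                    counts[byte_idx * 8 + bit] += 1
    return counts


# The sixteen suspected ECCs of the first four "pages" listed above.
sample_eccs = [bytes.fromhex(h) for h in (
    "f689f779e560c9e0", "d6e3edcb9cb0f9f0", "1fdad4a49cd41be0", "e090cc85d8d2e280",
    "8fcef48b1c263800", "bd61a0c748c4d360", "d21b46ab538f41f0", "8d182b3b8d542150",
    "018bbb0abb548850", "7e0eb99ac27bbd40", "dd63cb9ae35abc70", "65ca167a50dc60e0",
    "43a93670beb05e90", "1c4fc1ad19544d20", "b86a20ba32c27480", "76734510643e38c0",
)]
counts = ecc_bit_counts(sample_eccs)
never_set = [i for i, c in enumerate(counts) if c == 0]
print("bit positions never set in the sample:", never_set)
```

Run over every ECC of the dump, a bit position that stays at zero is a
strong hint that it is padding rather than part of the code.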

Now, let's get a brief hacker overview of ECC algorithms. In general,
three types of implementation are commonly used: Hamming, Reed-Solomon
(RS), and Binary BCH. However, because the Hamming code can correct only a
single bit error, and the RS code requires more redundancy for a given
number of correctable bit errors, Binary BCH is the most widely used
modern ECC implementation. Thus, Binary BCH is assumed to be the ECC
implementation here. In addition, some special characteristics of Binary
BCH can help to further identify the ECC implementation. The first
characteristic is that for data that is all zeros, regardless of its size,
the respective Binary BCH ECC is also all zeros. Let's show it by example
using bchlib. To be clear, all the parameters are just for demo at this
stage; the actual parameters will be derived from the analysis part by
part. Let's go ahead with the first characteristic.

############################## test_bchlib_01.py #############################


import bchlib

BCH_POLYNOMIAL = 8219
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)

data = bytearray(b'\x00'*512)
ecc = bch.encode(data)

for i in ecc:
      print("%X" % i, end='')
print("")


##################################### end ####################################

bchlib is used for Binary BCH encoding and decoding. Two parameters have
to be specified to make it work, BCH_POLYNOMIAL and BCH_BITS.
BCH_POLYNOMIAL is the primitive polynomial to be used, and BCH_BITS is the
maximum number of bit errors in the data that can be corrected by the ECC.
The details of these two parameters will be discussed in the coming
section on the Binary BCH implementation, as they are crucial to uncover
the secret association between ECC and data. Now, let's take a first
glance at bchlib and study the first characteristic of Binary BCH. The
output of test_bchlib_01.py is shown below.

cawan% python3.8 test_bchlib_01.py
0000000
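
The length of that output follows from the two demo parameters. The sketch
below assumes the usual relation for Binary BCH, namely that the ECC takes
m*t bits, where m is the degree of the primitive polynomial and t is
BCH_BITS (strictly speaking m*t is an upper bound on the generator
polynomial degree). The helper name bch_ecc_size is hypothetical. The last
two lines show why seven zero bytes appear as only seven digits: "%X" does
not zero-pad, whereas "%02X" would.

```python
import math


def bch_ecc_size(prim_poly, t):
    """Return (m, ecc_bits, ecc_bytes) assuming the ECC is m*t bits long."""
    m = prim_poly.bit_length() - 1     # degree of the primitive polynomial
    ecc_bits = m * t                   # assumed ECC length in bits
    ecc_bytes = math.ceil(ecc_bits / 8)
    return m, ecc_bits, ecc_bytes


m, ecc_bits, ecc_bytes = bch_ecc_size(8219, 4)
print("m=%d  ecc_bits=%d  ecc_bytes=%d" % (m, ecc_bits, ecc_bytes))

# An all-zeros ECC of that size, printed with and without zero padding.
zero_ecc = bytes(ecc_bytes)
unpadded = "".join("%X" % b for b in zero_ecc)
padded = "".join("%02X" % b for b in zero_ecc)
print(unpadded)
print(padded)
```

Under the same assumption, 8219 (0x201B) has degree 13, so BCH_BITS = 4
gives 52 bits, which are stored in 7 bytes.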

The BCH encoded output of 512 bytes of zero is indeed 7 bytes of zero
(the "%X" format prints each zero byte as a single digit). How about 512
bytes of 0xFF then? Let's check.

############################## test_bchlib_02.py #############################


import bchlib

BCH_POLYNOMIAL = 8219
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)

data = bytearray(b'\xFF'*512)
ecc = bch.encode(data)

for i in ecc:
      print("%X" % i, end='')
print("")


##################################### end ####################################

The output is,

cawan% python3.8 test_bchlib_02.py
D7EC33C6695380

The output is not all 0xFF, and that makes sense. Otherwise, if 512 bytes
of 0xFF were BCH encoded as 7 bytes of 0xFF, the result would be hard to
differentiate from a blank "page". Now, let's proceed to the second
characteristic, which concerns zero padding. The question is what happens
if 32 bytes of zeros are appended to the 512 bytes of 0xFF? Let's check
it.

############################## test_bchlib_03.py #############################


import bchlib

BCH_POLYNOMIAL = 8219
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)

data = bytearray(b'\xFF'*512 + b'\x00'*32)
ecc = bch.encode(data)

for i in ecc:
      print("%X" % i, end='')
print("")


##################################### end ####################################

The output is,

cawan% python3.8 test_bchlib_03.py
BCE3B0AE479EB0

Well, it seems the zero-padded data has a different BCH encoded output
than the unpadded data, provided the data is not all zeros. However, this
is not how an inherent BCH encoder behaves. An inherent BCH encoder
generates exactly the same output for both zero-padded data and unpadded
data. Since such a characteristic causes discrepancies, it should be
avoided. A common approach to overcome this inherent characteristic is to
reverse the bit order of the entire data right before BCH encoding it. So,
it is reasonable to assume that bchlib follows this approach, but how to
verify it? Well, under this assumption, for data made of 512 bytes of 0xFF
followed by 32 bytes of zeros, the data actually BCH encoded by bchlib is
in fact 32 bytes of zeros prepended to the 512 bytes of 0xFF. So, if this
is the case, the BCH encoded output of the zero-prepended data should be
the same as that of the data with nothing prepended. Let's verify it.

############################## test_bchlib_04.py #############################


import bchlib

BCH_POLYNOMIAL = 8219
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)

data1 = bytearray(b'\x00'*32 + b'\xFF'*512)
ecc1 = bch.encode(data1)

data2 = bytearray(b'\xFF'*512)
ecc2 = bch.encode(data2)

print("Zeros Prepended:")
for i in ecc1:
      print("%X" % i, end='')
print("")

print("Nothing Prepended:")
for i in ecc2:
      print("%X" % i, end='')
print("")


##################################### end ####################################

As expected, both BCH encoded outputs are exactly the same, as shown
below,

cawan% python3.8 test_bchlib_04.py
Zeros Prepended:
D7EC33C6695380
Nothing Prepended:
D7EC33C6695380
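
One way to picture this result is polynomial division over GF(2): in a
systematic cyclic code the check bits are the remainder of the message
polynomial, shifted by the ECC length, divided by a generator polynomial,
and leading zero coefficients do not change a polynomial. The sketch below
is a toy illustration under stated assumptions, not bchlib's
implementation. It treats the first input bit as the highest-order
coefficient and uses the primitive polynomial 8219 itself as the generator
(which would correspond to a single-error-correcting code), whereas bchlib
builds a larger generator for BCH_BITS = 4, so the numbers differ from the
outputs above. The name gf2_parity is hypothetical.

```python
def gf2_parity(data, gen):
    """Remainder of data(x) * x^deg(gen) divided by gen(x) over GF(2).

    The first bit of the first byte is the highest-order coefficient.
    """
    deg = gen.bit_length() - 1
    rem = 0
    # Horner-style reduction: rem = (rem * x + next_bit) mod gen.
    for byte in data:
        for shift in range(7, -1, -1):
            rem = (rem << 1) | ((byte >> shift) & 1)
            if (rem >> deg) & 1:
                rem ^= gen
    # Multiply by x^deg, reducing after every single shift.
    for _ in range(deg):
        rem <<= 1
        if (rem >> deg) & 1:
            rem ^= gen
    return rem


TOY_GEN = 8219                      # assumption: toy generator, degree 13
payload = b'\xff' * 512

plain = gf2_parity(payload, TOY_GEN)
prepended = gf2_parity(b'\x00' * 32 + payload, TOY_GEN)
appended = gf2_parity(payload + b'\x00' * 32, TOY_GEN)
all_zero = gf2_parity(b'\x00' * 512, TOY_GEN)

print("plain     : %04X" % plain)
print("prepended : %04X" % prepended)
print("appended  : %04X" % appended)
print("all zeros : %04X" % all_zero)
```

In this toy model the prepended case matches the plain case, the appended
case does not, and all-zeros data gives an all-zeros remainder, which is
the same pattern seen with bchlib in the tests above.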

One important point should be noted here. If the input data is bit order
reversed, the BCH encoded output is in bit order reversed form as well.
Thanks to bchlib for implementing this by default. Now, another question
arises: is it possible to keep the original bit order of the input data
that is going to be BCH encoded? Yes, it is possible by reversing the bit
order of the input data before passing it to the bchlib encoder, and of
course the BCH encoded output has to be bit order reversed accordingly.
Let's show it by example.

############################## test_bchlib_05.py #############################


import bchlib

BCH_POLYNOMIAL = 8219
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)

data = bytearray(b'\xFF'*511 + b'\xAA')

data_reverse_bit = b''

for i in range(0, len(data)):
      data_reverse_bit += bytes([int("{:08b}".format(data[i])[::-1],2)])

data_reverse_bit = data_reverse_bit[::-1]

ecc = bch.encode(data_reverse_bit)

ecc_reverse_bit = b''

for i in range(0, len(ecc)):
      ecc_reverse_bit += bytes([int("{:08b}".format(ecc[i])[::-1],2)])

ecc_reverse_bit = ecc_reverse_bit[::-1]

for i in ecc_reverse_bit:
      print("%X" % i, end='')
print("")


##################################### end ####################################

In test_bchlib_05.py, the last byte of the 512-byte data input is
purposely changed from 0xFF to 0xAA to avoid symmetry in the data
(0b11111111 after bit order reversing is still 0b11111111). Now, let's
see the output.

cawan% python3.8 test_bchlib_05.py
72FFA2590ECDB

So, if everything is correct, appending 32 bytes of zeros to this 512-byte
data input and BCH encoding it should also give 72FFA2590ECDB. (Only 13
digits are printed for 7 bytes because "%X" does not zero-pad a byte below
0x10.) Let's verify it.

############################## test_bchlib_06.py #############################


import bchlib

BCH_POLYNOMIAL = 8219
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)

data = bytearray(b'\xFF'*511 + b'\xAA' + b'\x00'*32)

data_reverse_bit = b''

for i in range(0, len(data)):
      data_reverse_bit += bytes([int("{:08b}".format(data[i])[::-1],2)])

data_reverse_bit = data_reverse_bit[::-1]

ecc = bch.encode(data_reverse_bit)

ecc_reverse_bit = b''

for i in range(0, len(ecc)):
      ecc_reverse_bit += bytes([int("{:08b}".format(ecc[i])[::-1],2)])

ecc_reverse_bit = ecc_reverse_bit[::-1]

for i in ecc_reverse_bit:
      print("%X" % i, end='')
print("")


##################################### end ####################################

Perfect, the output is exactly as expected, as shown below.

cawan% python3.8 test_bchlib_06.py
72FFA2590ECDB
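
The reversal step repeated in test_bchlib_05.py and test_bchlib_06.py
could be factored into one helper. This is a minimal sketch with a
hypothetical name (reverse_bit_order); it performs the same two steps as
the loops above (reverse the bits inside each byte, then reverse the byte
order) but uses a 256-entry lookup table instead of string formatting.

```python
# Lookup table: entry b holds byte b with its 8 bits in reverse order.
_BIT_REVERSED = bytes(int("{:08b}".format(b)[::-1], 2) for b in range(256))


def reverse_bit_order(data):
    """Reverse the bit order of a whole byte string.

    Each byte is bit-reversed through the table, then the byte order is
    reversed, so the last bit of the input becomes the first bit of the
    output.
    """
    return bytes(data).translate(_BIT_REVERSED)[::-1]


sample = b'\xff' * 511 + b'\xaa'
reversed_sample = reverse_bit_order(sample)
print("first byte after reversal: 0x%02X" % reversed_sample[0])
print("round trip ok:", reverse_bit_order(reversed_sample) == sample)
```

Applying the helper twice returns the original bytes, so the same function
serves for both the data going into the encoder and the ECC coming out.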

That's enough for the "first glance" of bchlib by studying some
characteristics of Binary BCH. To summarize the lessons learned from the
"first glance" from a hacker perspective, two points should be clear.
First, a data input of all zeros generates an all-zeros output. Second, a
data input padded with any number of zeros generates the same output as
the data input with no zeros appended. Back to the NAND dump, these two
points inspire an idea. If a 60-bit BCH encoded ECC that is all zeros
exists somewhere, check whether the 512 bytes of data in the respective
"sub-page" are all zeros too. If yes, the data being BCH encoded either
has no padding or has all-zeros padding. If not, the padding being added
is not all zeros. Sounds confusing? Let's grab a "sub-page" in the NAND
dump whose BCH encoded ECC is all zeros. It should be clearer to explain
it by example.

########################### check_all_zeros_ecc.py ###########################


input_file = open("MT29F2G08ABAEAWP@TSOP48.BIN","rb")

oob_const = b'\xff\xff\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff' + \
                                    b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'

zeros_ecc = b'\x00\x00\x00\x00\x00\x00\x00\x00'

page_cnt = 0
positive_cnt = 0

while 1:
      data = input_file.read(2112)
      if len(data) == 0:
            break
      oob_1st_32_bytes = data[2048:2048+32]
      oob_2nd_32_bytes = data[2048+32:2048+32+32]    
      if len(data) == 2112 and oob_1st_32_bytes == oob_const:
            for i in range(0, 4):
                  ecc = oob_2nd_32_bytes[i*8:i*8+8]
                  if ecc == zeros_ecc:
                        positive_cnt += 1
                        print("Page Num: %d, Address: 0x%X" % (page_cnt, page_cnt*2112))
                        break
            if positive_cnt == 1:
                  break
      page_cnt += 1
           
print("Completed")


##################################### end ####################################

Let's see whether any "page" meets the condition. If yes, show the "page"
number and the address of the first one found. The output is shown below.

cawan% python3.8 check_all_zeros_ecc.py                                                                     
Page Num: 256, Address: 0x84000
Completed

Nice, the first found item is at address 0x84000. Let's display the full
"page" in hex view.

cawan% hexdump -C -v -n $((0x84000+2112)) MT29F2G08ABAEAWP@TSOP48.BIN \
| tail -n $((0x840/16+1))
00084000  76 3d f5 33 62 61 75 64  72 61 74 65 3d 31 31 35  |v=.3baudrate=115|
00084010  32 30 30 00 62 6f 6f 74  61 72 67 73 3d 6d 65 6d  |200.bootargs=mem|
00084020  3d 36 34 4d 20 63 6f 6e  73 6f 6c 65 3d 74 74 79  |=64M console=tty|
00084030  53 30 2c 31 31 35 32 30  30 20 75 62 69 2e 6d 74  |S0,115200 ubi.mt|
00084040  64 3d 32 20 72 6f 6f 74  3d 75 62 69 30 3a 75 62  |d=2 root=ubi0:ub|
00084050  69 66 73 20 72 77 20 72  6f 6f 74 66 73 74 79 70  |ifs rw rootfstyp|
00084060  65 3d 75 62 69 66 73 20  69 6e 69 74 3d 2f 6c 69  |e=ubifs init=/li|
00084070  6e 75 78 72 63 00 62 6f  6f 74 63 6d 64 3d 6e 62  |nuxrc.bootcmd=nb|
00084080  6f 6f 74 2e 65 20 30 78  37 46 43 30 20 30 20 30  |oot.e 0x7FC0 0 0|
00084090  78 32 30 30 30 30 30 3b  20 62 6f 6f 74 6d 20 30  |x200000; bootm 0|
000840a0  78 37 46 43 30 00 62 6f  6f 74 64 65 6c 61 79 3d  |x7FC0.bootdelay=|
000840b0  31 00 65 74 68 61 63 74  3d 65 6d 61 63 00 65 74  |1.ethact=emac.et|
000840c0  68 61 64 64 72 3d 30 30  3a 30 30 3a 30 30 3a 31  |haddr=00:00:00:1|
000840d0  31 3a 36 36 3a 38 38 00  69 70 61 64 64 72 3d 31  |1:66:88.ipaddr=1|
000840e0  39 32 2e 31 36 38 2e 38  2e 32 30 33 00 6d 74 64  |92.168.8.203.mtd|
000840f0  70 61 72 74 73 3d 6d 74  64 70 61 72 74 73 3d 6e  |parts=mtdparts=n|
00084100  61 6e 64 30 3a 32 6d 28  75 2d 62 6f 6f 74 29 2c  |and0:2m(u-boot),|
00084110  34 6d 28 6b 65 72 6e 65  6c 29 2c 31 36 6d 28 75  |4m(kernel),16m(u|
00084120  62 69 66 73 29 2c 33 32  6d 28 61 70 70 6c 69 63  |bifs),32m(applic|
00084130  61 74 69 6f 6e 29 2c 33  32 6d 28 62 61 63 6b 75  |ation),32m(backu|
00084140  70 29 2c 2d 28 64 61 74  61 29 00 6e 65 74 6d 61  |p),-(data).netma|
00084150  73 6b 3d 32 35 35 2e 32  35 35 2e 30 2e 30 00 72  |sk=255.255.0.0.r|
00084160  6f 6f 74 76 65 72 3d 4c  59 30 43 2d 30 36 30 31  |ootver=LY0C-0601|
00084170  2d 52 54 30 30 2d 48 30  53 30 2d 32 31 30 31 32  |-RT00-H0S0-21012|
00084180  37 2d 30 30 00 73 65 72  76 65 72 69 70 3d 31 39  |7-00.serverip=19|
00084190  32 2e 31 36 38 2e 38 2e  34 00 73 74 64 65 72 72  |2.168.8.4.stderr|
000841a0  3d 73 65 72 69 61 6c 00  73 74 64 69 6e 3d 73 65  |=serial.stdin=se|
000841b0  72 69 61 6c 00 73 74 64  6f 75 74 3d 73 65 72 69  |rial.stdout=seri|
000841c0  61 6c 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |al..............|
000841d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000841e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000841f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084200  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084210  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084220  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084230  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084240  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084250  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084260  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084270  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084280  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084290  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000842a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000842b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000842c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000842d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000842e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000842f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084300  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084310  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084320  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084330  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084340  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084350  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084360  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084370  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084380  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084390  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000843a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000843b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000843c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000843d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000843e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000843f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084400  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084410  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084420  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084430  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084440  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084450  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084460  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084470  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084480  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084490  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000844a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000844b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000844c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000844d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000844e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000844f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084500  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084510  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084520  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084530  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084540  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084550  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084560  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084570  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084580  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084590  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000845a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000845b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000845c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000845d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000845e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000845f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084600  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084610  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084620  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084630  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084640  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084650  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084660  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084670  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084680  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084690  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000846a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000846b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000846c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000846d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000846e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000846f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084700  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084710  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084720  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084730  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084740  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084750  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084760  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084770  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084780  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084790  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000847a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000847b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000847c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000847d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000847e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000847f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084800  ff ff 00 00 ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00084810  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00084820  3b 8d c6 e5 19 b2 24 50  00 00 00 00 00 00 00 00  |;.....$P........|
00084830  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00084840

So, what can be deduced from this "page"? Well, it is almost certain that
the 2048-byte data portion of a "page" is divided into four
parts of 512 bytes each, which I named "sub-page" at the start
of this article. In this "page", the first "sub-page" runs from
0x84000 to 0x841ff and contains non-zero data, with BCH encoded ECC
3b8dc6e519b22450. The following three "sub-page" contain all
zeros data, each with an all zeros BCH encoded ECC. In other
words, the 512 bytes of zeros in each of these three "sub-page" are
either BCH encoded directly, or padded with a certain number
of zeros ONLY, in order to generate an all zeros ECC. Hence, once the other
BCH encoding parameters are unveiled in the following discussion,
it becomes straightforward to recover the secret
association between ECC and data. So, the second, third, and fourth
"sub-page" in a "page" are clear now, and it is usually about the same
for all the other "page". However, the padding scheme of the first
"sub-page" is still uncertain, unless a "page" with four all zeros
ECCs can be found. Let's try it.

####################### check_all_zeros_in_all_ecc.py ########################


input_file = open("MT29F2G08ABAEAWP@TSOP48.BIN","rb")

oob_const = \
b'\xff\xff\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff' + \
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'

zeros_ecc = \
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' + \
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

page_cnt = 0

while 1:
      data = input_file.read(2112)
      if len(data) == 0:
            break
      oob_1st_32_bytes = data[2048:2048+32]
      oob_2nd_32_bytes = data[2048+32:2048+32+32]    
      if len(data) == 2112 and oob_1st_32_bytes == oob_const:
            if oob_2nd_32_bytes[0:32] == zeros_ecc[0:32]:
                  print("Page Num: %d, Address: 0x%X" % (page_cnt, page_cnt*2112))
                  break
      page_cnt += 1
           
print("Completed")
    

##################################### end ####################################

Let's look for any such "page". However, the output is unexpected,
as shown below.

cawan% python3.8 check_all_zeros_in_all_ecc.py
Completed
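
For reference, the "page" layout that both search scripts rely on can be
sketched as a small helper. This is only a sketch under the layout deduced
so far: 2048 bytes of data, 32 bytes of constant OOB, then four 8-byte ECC
slots. The name split_page and the synthetic "page" (modelled on the one at
0x84000) are assumptions made for illustration.

```python
PAGE_DATA = 2048               # data bytes per "page"
OOB_SIZE = 64                  # spare bytes per "page"
SUBPAGE = 512                  # data bytes per "sub-page"
ECC_OFFSET = PAGE_DATA + 32    # ECC slots start in the second half of the OOB
ECC_SIZE = 8                   # bytes per ECC slot


def split_page(page):
    """Return (first 32 OOB bytes, [(sub-page data, ecc), ...]) for one "page"."""
    if len(page) != PAGE_DATA + OOB_SIZE:
        raise ValueError("expected a %d-byte page" % (PAGE_DATA + OOB_SIZE))
    marker = page[PAGE_DATA:ECC_OFFSET]
    pairs = []
    for i in range(4):
        data = page[i * SUBPAGE:(i + 1) * SUBPAGE]
        ecc = page[ECC_OFFSET + i * ECC_SIZE:ECC_OFFSET + (i + 1) * ECC_SIZE]
        pairs.append((data, ecc))
    return marker, pairs


# Synthetic "page": one non-zero "sub-page" followed by three all zeros ones.
page = (b"\x41" * 512 + b"\x00" * 1536
        + b"\xff\xff\x00\x00" + b"\xff" * 28
        + bytes.fromhex("3b8dc6e519b22450") + b"\x00" * 24)

marker, pairs = split_page(page)
for index, (data, ecc) in enumerate(pairs):
    all_zero = data.count(0) == len(data)
    print(index, ecc.hex(), "data all zeros" if all_zero else "data non-zero")
```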

Anyhow, let's leave the unsolved part for now, we will get back to it
later. Now, let's have a brief hacker overview of Binary BCH
implementation, yes, solely from a hacker's perspective, not academic.
In general, the BCH codec needs a primitive polynomial in order to derive
a generator polynomial to be used for code generation. The Galois Field
order determines which primitive polynomials can be used
by the BCH codec. A polynomial can be represented by an integer, or in
binary bit form. The set bits of the integer mark
the powers of x that have a non-zero coefficient in the selected primitive
polynomial. Sounds confusing? Let's have an example.

                0x201B
                   |
                   V
          0b0010000000011011
                   |
                   V
0b 0  0  1  0  0  0 0 0 0 0 0 1 1 0 1 1
   ^  ^  ^  ^  ^  ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
   |  |  |  |  |  | | | | | | | | | | |
  15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0  

The hex number 0x201B can be written in binary bit form as
0b0010000000011011. Each set bit in this number marks a power of x
with coefficient 1 in the primitive
polynomial. For the case of 0x201B, bit-0, bit-1, bit-3, bit-4, and bit-13
are the set bits. So, the primitive polynomial is

x^13 + x^4 + x^3 + x^1 + 1

Yes, each set bit position reflects a power of x, and
the greatest set bit position is the degree of the primitive
polynomial. Again, for the case of 0x201B, the degree is 13. Most
of the time, the degree is written as m to represent the Galois
Field order, so for the case of 0x201B, it can be expressed as m=13.
In order to protect data of a certain size in bits,
that size should be less than 2^m. For example, to protect data with
the size of 512 bytes, the data length in bits is 512*8=4096.
This number is normally known as k, so it is more appropriate to
write it as k=4096. So, 2^m should be greater than 4096,
then m should be greater than log(4096)/log(2)=12, and m should be at
least 13. Again, for the case of 0x201B, since its m is 13, it is
suitable for protecting data of 512 bytes in size. What is
the hex number 0x201B in decimal? It is 8219, sounds familiar? Yes,
it was used in the "first glance" bchlib section to define the
variable BCH_POLYNOMIAL.
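
The integer-to-polynomial reading above can be sketched in a few lines.
This is only an illustration. The helper names poly_terms, poly_str, and
min_field_order are made up here, and min_field_order simply applies the
rule of thumb from the text (the smallest m with 2^m greater than k).

```python
def poly_terms(value):
    """Exponents of the set bits, highest first."""
    return [bit for bit in range(value.bit_length() - 1, -1, -1)
            if (value >> bit) & 1]


def poly_str(value):
    """Render the integer as a polynomial in x, in the style used above."""
    return " + ".join("1" if e == 0 else "x^%d" % e for e in poly_terms(value))


def min_field_order(k_bits):
    """Smallest m such that 2^m is greater than k_bits."""
    m = 1
    while (1 << m) <= k_bits:
        m += 1
    return m


print(poly_str(0x201B))          # x^13 + x^4 + x^3 + x^1 + 1
print(poly_terms(0x201B)[0])     # degree, 13
print(min_field_order(512 * 8))  # 13
print(0x201B)                    # 8219
```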

When talking about data protection, one must talk about the protection
strength. The protection strength is the number of bit errors
the data can tolerate while still being recoverable
to the correct state. The strength is normally known
as t. So, when someone mentions t=4, it means the ECC can correct up to
4 bit errors. Alright, m, k, and t are clear now. Let's proceed
to the length of the ECC, which is more commonly named
the size of parity bits. For BCH, the size of parity bits is equal to
m*t. Thus, given m=13, k=4096, and t=4, since 2^m=2^13=8192 is
greater than k=4096, it is appropriate, with no discrepancy at all, to
generate a BCH encoded ECC with m*t=13*4=52 parity bits. Remember
the ECC size found from the NAND dump analysis in the previous part?
Yes, it is 60 bits (8 bytes minus the last 4 bits of zeros). Well, the
boring stuff is getting interesting now. Let's see what can be deduced from
this little clue. The data size to be protected is 512 bytes, which is
4096 bits. The m should be at least 13, so 2^m=2^13=8192, which is
sufficient to protect the 4096 bits of data. As the number of parity bits
is 60, the respective factors are 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30,
and 60. Given m*t=60 and m>=13, the possible combinations of (m, t)
are (15, 4), (20, 3), (30, 2), and (60, 1). As t=4 is a common choice for
the majority of BCH implementations of ECC, the combination of m=15 and
t=4 is the most probable. The other combinations, (20, 3), (30, 2), and
(60, 1), are not only unrealistic, but also terribly overkill. At this
stage, assuming m=15 and t=4, which primitive polynomial should be
selected? Let's refer to the primitive polynomial list in [4]. For
degree 15, the candidates are shown below.

x^15 + x^1 + 1
x^15 + x^4 + 1
x^15 + x^7 + 1
x^15 + x^7 + x^6 + x^3 + x^2 + x^1 + 1
x^15 + x^10 + x^5 + x^1 + 1
x^15 + x^10 + x^5 + x^4 + 1
x^15 + x^10 + x^5 + x^4 + x^2 + x^1 + 1
x^15 + x^10 + x^9 + x^7 + x^5 + x^3 + 1
x^15 + x^10 + x^9 + x^8 + x^5 + x^3 + 1
x^15 + x^11 + x^7 + x^6 + x^2 + x^1 + 1
x^15 + x^12 + x^3 + x^1 + 1
x^15 + x^12 + x^5 + x^4 + x^3 + x^2 + 1
x^15 + x^12 + x^11 + x^8 + x^7 + x^6 + x^4 + x^2 + 1
x^15 + x^14 + x^13 + x^12 + x^11 + x^10 + x^9 + x^8 + x^7 + x^6 + \
x^5 + x^4 + x^3 + x^2+1

Well, let's start with the first candidate, which is

x^15 + x^1 + 1

The polynomial can be represented in binary bit form as mentioned earlier,
which is,

0b1000000000000011

In hex, it is 0x8003, in decimal it is 32771. So, getting back to bchlib,
BCH_POLYNOMIAL and BCH_BITS should be set to 32771 and 4,
respectively.
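
The parameter hunt above can be double-checked with a short sketch. It
enumerates every (m, t) with m*t=60 and 2^m greater than k (every divisor
pair, so (30, 2) is listed too), converts the chosen exponents into
the integer form, and tests primitivity by checking that x has
multiplicative order 2^m-1 modulo the polynomial. All function names are
made up for this sketch, and the primitivity test is a plain textbook
check, not something taken from bchlib.

```python
def candidate_mt(parity_bits, k_bits):
    """List (m, t) with m * t == parity_bits and 2^m > k_bits."""
    pairs = []
    for t in range(1, parity_bits + 1):
        m, rest = divmod(parity_bits, t)
        if rest == 0 and (1 << m) > k_bits:
            pairs.append((m, t))
    return sorted(pairs)


def terms_to_int(exponents):
    """Set one bit per exponent, e.g. [15, 1, 0] for x^15 + x^1 + 1."""
    value = 0
    for e in exponents:
        value |= 1 << e
    return value


def gf2_mulmod(a, b, poly):
    """Multiply two GF(2) polynomials modulo poly (a must already be reduced)."""
    m = poly.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return result


def x_pow(exp, poly):
    """Compute x^exp modulo poly by square and multiply."""
    result, base = 1, 2          # the integer 2 is the polynomial "x"
    while exp:
        if exp & 1:
            result = gf2_mulmod(result, base, poly)
        base = gf2_mulmod(base, base, poly)
        exp >>= 1
    return result


def prime_factors(n):
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def is_primitive(poly):
    """True if x has order 2^m - 1 modulo poly, m being the degree of poly."""
    order = (1 << (poly.bit_length() - 1)) - 1
    if x_pow(order, poly) != 1:
        return False
    return all(x_pow(order // q, poly) != 1 for q in prime_factors(order))


print(candidate_mt(60, 4096))    # [(15, 4), (20, 3), (30, 2), (60, 1)]
poly = terms_to_int([15, 1, 0])
print(hex(poly), poly)           # 0x8003 32771
print(is_primitive(poly))
```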

Now, by assuming nobody will be naive enough to do BCH encoding without
performing bit order reversing of the entire data input first, let's try
the BCH encoding without any padding for the first "page".

###################### bch_encoding_without_padding.py #######################


import bchlib
import binascii

BCH_POLYNOMIAL = 32771
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)

input_file = open("./MT29F2G08ABAEAWP@TSOP48.BIN", "rb")

page = input_file.read(2112)
ECC = page[2048+32:2048+32+32]

for i in range(0, 4):
      ecc_generated = bch.encode(page[i*512:i*512+512])
      print("\nSub-page: %d" % i)
      print("ECC Ori:", end=' ')
      print(ECC[i*8:i*8+8].hex().upper())
      print("ECC Generated:", end=' ')
      print(ecc_generated.hex().upper())
      if ECC[i*8:i*8+8] == ecc_generated:
            print("Match !")
      else:
            print("Wrong !")
print("\nCompleted")

 
##################################### end ####################################

The output is shown below.

cawan% python3.8 bch_encoding_without_padding.py

Sub-page: 0
ECC Ori: F689F779E560C9E0
ECC Generated: 8DE136AAF3E03F90
Wrong !

Sub-page: 1
ECC Ori: D6E3EDCB9CB0F9F0
ECC Generated: 6C6CF320EFAD8660
Wrong !

Sub-page: 2
ECC Ori: 1FDAD4A49CD41BE0
ECC Generated: 1058EAC213313D70
Wrong !

Sub-page: 3
ECC Ori: E090CC85D8D2E280
ECC Generated: B36A94B537E14BA0
Wrong !

Completed

None of the four "sub-page" generates the correct ECC. So, the "sub-page"
is probably padded with a certain number of zeros before being BCH encoded.
Let's try BCH encoding with the "sub-page" padded with 1 to 32 bytes
of zeros.

#################### bch_encoding_with_zeros_padding.py ######################


import bchlib
import binascii

BCH_POLYNOMIAL = 32771
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)

input_file = open("./MT29F2G08ABAEAWP@TSOP48.BIN", "rb")

page = input_file.read(2112)
ECC = page[2048+32:2048+32+32]
found_flag = 0

for i in range(0, 4):
      print("\nSub-page: %d" % i)
      print("ECC Ori:", end=' ')
      print(ECC[i*8:i*8+8].hex().upper())
      for j in range(1, 33):
            padding = b'\x00'*j
            ecc_generated = bch.encode(page[i*512:i*512+512]+padding)
            if ECC[i*8:i*8+8] == ecc_generated:
                  print("ECC Generated:", end=' ')
                  print(ecc_generated.hex().upper())
                  print("Match !", end=' ')
                  print("Zeros padded number: %d" % j)
                  found_flag = 1
                  break
      if found_flag == 0:
            print("Wrong !")
      found_flag = 0
print("\nCompleted")


#################################### end ####################################

Let's go and run the check. Hola, the output is interesting, as shown
below.

cawan% python3.8 bch_encoding_with_zeros_padding.py

Sub-page: 0
ECC Ori: F689F779E560C9E0
Wrong !

Sub-page: 1
ECC Ori: D6E3EDCB9CB0F9F0
ECC Generated: D6E3EDCB9CB0F9F0
Match ! Zeros padded number: 24

Sub-page: 2
ECC Ori: 1FDAD4A49CD41BE0
ECC Generated: 1FDAD4A49CD41BE0
Match ! Zeros padded number: 24

Sub-page: 3
ECC Ori: E090CC85D8D2E280
ECC Generated: E090CC85D8D2E280
Match ! Zeros padded number: 24

Completed
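
The padding search can also be written as a reusable helper that takes the
encoder as a parameter. This is only a sketch: find_zero_padding is a
made-up name, and the demo uses a truncated SHA-256 as a stand-in encoder
(it is NOT a BCH code) so that the sketch runs without bchlib or the dump.
With bchlib at hand, bch.encode as used in the script above could be
passed in instead.

```python
import hashlib


def find_zero_padding(encode, data, stored_ecc, max_pad=32):
    """Return the number of zero bytes that makes encode() reproduce
    stored_ecc, or None if no length from 0 to max_pad works."""
    for pad_len in range(max_pad + 1):
        if encode(data + b"\x00" * pad_len) == stored_ecc:
            return pad_len
    return None


def stand_in_encode(buf):
    # NOT a BCH code: a truncated SHA-256 used only to exercise the search.
    return hashlib.sha256(buf).digest()[:8]


subpage = bytes(range(256)) * 2                    # synthetic 512 bytes
stored = stand_in_encode(subpage + b"\x00" * 24)   # pretend ECC from the OOB
print(find_zero_padding(stand_in_encode, subpage, stored))
```

Returning None instead of printing "Wrong !" keeps the helper usable in a
loop over many "page" candidates.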

So, of the four "sub-page" in a "page", the second, third, and fourth
"sub-page" are each padded with 24 bytes of zeros
before being BCH encoded to generate the correct ECC.
However, the first "sub-page" is still cryptic and needs a bit of tweaking.
Since the rest of the "sub-page" are padded with 24 bytes of zeros, it is
very likely that the first "sub-page" is padded with 24 bytes of non-zero
data. It should be related to some kind of "metadata" which
describes the "page" itself. Remember the first 32 bytes of OOB?
Let's check it again.

cawan% hexdump -C -v -n $((2112-32)) MT29F2G08ABAEAWP@TSOP48.BIN | tail -n 3
00000800  ff ff 00 00 ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000810  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000820

The two bytes of zeros at 0x802 and 0x803 are a little bit strange. So,
is it possible that the first few bytes of the 24 bytes of zeros padding
are replaced by some bytes from here? Let's try to replace the 24 bytes
of zeros padding byte by byte, until the entire 24 bytes of padding become

ffff0000ffffffffffffffffffffffffffffffffffffffff

Let's try it.

####################### bch_encoding_of_1st_subpage.py #######################


import bchlib
import binascii

BCH_POLYNOMIAL = 32771
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)

input_file = open("./MT29F2G08ABAEAWP@TSOP48.BIN", "rb")

page = input_file.read(2112)
subpage = page[0:512]
ECC = page[2048+32:2048+32+8]

# Candidate padding: the first 24 bytes of the OOB area.
paddingx = \
b'\xFF\xFF\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF' + \
b'\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF'

padding0 = \
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' + \
b'\x00\x00\x00\x00\x00\x00\x00\x00'

data_input = subpage + padding0
data_input = bytearray(data_input)
         
for i in range(0, 24):
      data_input[512+i] = paddingx[i]
      ecc_generated = bch.encode(data_input)
      if ecc_generated == ECC:
            print("Match !")
            print("Padding:", end=' ')
            print(data_input[512:].hex().upper())
            break
print("\nCompleted")


#################################### end ####################################

Let's run it. Bingo, the padding pattern is found, as shown below.

cawan% python3.8 bch_encoding_of_1st_subpage.py
Match !
Padding: FFFF00000000000000000000000000000000000000000000

Completed
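
The byte by byte replacement can be generalized into a small helper. This
is a sketch only: find_prefix_padding is a made-up name, and the demo again
uses a truncated SHA-256 as a stand-in encoder (NOT a BCH code) so that it
runs without bchlib. Unlike the script above, the helper also tries the
all zeros padding first.

```python
import hashlib


def find_prefix_padding(encode, data, stored_ecc, pattern):
    """Try paddings made of the first n bytes of pattern followed by zeros,
    for n = 0 .. len(pattern), and return the first one that matches."""
    for keep in range(len(pattern) + 1):
        padding = pattern[:keep] + b"\x00" * (len(pattern) - keep)
        if encode(data + padding) == stored_ecc:
            return padding
    return None


def stand_in_encode(buf):
    # NOT a BCH code: a truncated SHA-256 used only to exercise the search.
    return hashlib.sha256(buf).digest()[:8]


oob_head = b"\xFF\xFF\x00\x00" + b"\xFF" * 20      # first 24 bytes of the OOB
subpage = bytes(range(256)) * 2                    # synthetic 512 bytes
stored = stand_in_encode(subpage + b"\xFF\xFF" + b"\x00" * 22)

found = find_prefix_padding(stand_in_encode, subpage, stored, oob_head)
print(found.hex().upper())
```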


3 - Bit Errors Fixing with ECC

Perfect. Now, the secret association between ECC and data is fully
unveiled. In conclusion, for the four "sub-page" in a "page", the
first "sub-page" has to be padded with 24 bytes of padding, comprising
2 bytes of 0xFF followed by 22 bytes of zeros, before being BCH encoded
to generate the correct ECC. For the second, third, and fourth
"sub-page", only 24 bytes of all zeros padding are needed to generate
the correct ECC. So, by doing the BCH decoding in a similar
manner for every "page" of the entire NAND dump, the bit errors
get fixed. After that, all the 64 bytes of OOB in each "page"
should be removed to generate a new NAND dump with contiguous data,
"page" by "page", without any bit errors, which I name
cawan_output.bin, as shown below.

####################### NAND_dump_fix_bit_erros_ecc.py #######################


import bchlib

BCH_POLYNOMIAL = 32771
BCH_BITS = 4

input_file = open("./MT29F2G08ABAEAWP@TSOP48.BIN", "rb")
output_file = open("./cawan_output.bin", "wb")

# Padding of the first "sub-page": 2 bytes of 0xFF, then 22 bytes of zeros.
pad_sub0 = \
b'\xFF\xFF\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' + \
b'\x00\x00\x00\x00\x00\x00\x00\x00'

pad_subx =  \
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' + \
b'\x00\x00\x00\x00\x00\x00\x00\x00'


bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)

count = 0
error_cnt = 0

while 1:
      page = input_file.read(2112)
      if len(page) != 2112:
            break
      for i in range(0, 4):
            data, ecc = page[512*i:512*i+512], page[2048+32+i*8:2048+32+i*8+8]     
            if i == 0:
                  data_padded = data + pad_sub0
            else:
                  data_padded = data + pad_subx
            data_padded = bytearray(data_padded)
            bitflips = bch.decode_inplace(data_padded, ecc)
            if bitflips == 0:
                  output_file.write(data_padded[:512])
            elif bitflips > 0:
                  error_cnt += 1
                  output_file.write(data_padded[:512])
            elif bitflips == -1:
                  output_file.write(data_padded[:512])
      count += 1
print("Sub-page with error count: %d\n" % error_cnt)
print("Completed.")


#################################### end ####################################

Well, 20 "sub-page" with bit errors have been fixed with
the ECC, as shown below.

cawan% python3.8 NAND_dump_fix_bit_erros_ecc.py
Sub-page with error count: 20

Completed.
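
One thing the script does silently is writing out a "sub-page" even when
the decoder returns -1. A hedged variation is sketched below: it keeps the
same padding rules, but records which "sub-page" could not be corrected.
The names repair_dump and the stand-in decoder are made up for this
sketch. The decode_inplace parameter is assumed to behave like the call
used in the script above (it fixes the buffer in place and returns the
number of bit flips, or -1), and other bchlib releases may expose a
different decoding interface.

```python
PAD_SUB0 = b"\xFF\xFF" + b"\x00" * 22   # padding of the first "sub-page"
PAD_SUBX = b"\x00" * 24                 # padding of the other three


def repair_dump(pages, decode_inplace):
    """Return (data without OOB, statistics) for an iterable of 2112-byte pages."""
    stats = {"clean": 0, "corrected": 0, "uncorrectable": []}
    out = bytearray()
    for page_no, page in enumerate(pages):
        for i in range(4):
            pad = PAD_SUB0 if i == 0 else PAD_SUBX
            buf = bytearray(page[512 * i:512 * i + 512] + pad)
            ecc = page[2048 + 32 + 8 * i:2048 + 32 + 8 * i + 8]
            flips = decode_inplace(buf, ecc)
            if flips == 0:
                stats["clean"] += 1
            elif flips > 0:
                stats["corrected"] += 1
            else:
                stats["uncorrectable"].append((page_no, i))
            out += buf[:512]            # keep the data, drop the padding
    return bytes(out), stats


def stand_in_decode(buf, ecc):
    # Stand-in for a real decoder: pretends every "sub-page" is clean.
    return 0


page = bytes(2048) + b"\xFF\xFF\x00\x00" + b"\xFF" * 28 + bytes(32)
data, stats = repair_dump([page, page], stand_in_decode)
print(len(data), stats)
```

The list of uncorrectable positions makes it possible to judge later
whether a broken file sits on top of a damaged "sub-page".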

Armed with knowledge, any suitable common tool can be weaponized for
hacking purposes. Don't be silly and stubbornly believe that a
proprietary, special, commercial, or even automated tool can work as
expected without any knowledge of the field. So, the
firmware is ready now, let's proceed to the firmware analysis.


4 - UBI Image Analysis

As a common approach, let's begin with binwalk and expect a gold strike,
or money growing on trees, or both. Let's see the binwalk output as shown
below.

cawan% binwalk cawan_output.bin

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
963584        0xEB400         CRC32 polynomial table, little endian
966688        0xEC020         CRC32 polynomial table, little endian
970868        0xED074         LZO compressed data
2097152       0x200000        uImage header, header size: ...
2097216       0x200040        Linux kernel ARM boot executable zImage ...
2115956       0x204974        gzip compressed data, maximum compression, ...
6291456       0x600000        UBI erase count header, version: 1, ...

It looks interesting. As stated in the title of this article,
only the UBI image is going to be analyzed. The full description of
the UBI header detected at address 0x600000 is shown below.

UBI erase count header,
version: 1,
EC: 0x1,
VID header offset: 0x800,
data offset: 0x1000
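
Note that binwalk reports offsets in cawan_output.bin, where the OOB is
already stripped. To look at the same spot in the original physical dump,
the offset has to be translated. A minimal sketch, assuming the layout of
2048 data bytes followed by 64 OOB bytes per "page", with made-up function
names:

```python
def logical_to_physical(offset, data_size=2048, oob_size=64):
    """Map an offset in the OOB-stripped dump to the raw dump."""
    page, within = divmod(offset, data_size)
    return page * (data_size + oob_size) + within


def physical_to_logical(offset, data_size=2048, oob_size=64):
    """Map an offset in the raw dump to the OOB-stripped dump."""
    page, within = divmod(offset, data_size + oob_size)
    if within >= data_size:
        raise ValueError("offset points into the OOB area")
    return page * data_size + within


print(hex(logical_to_physical(0x600000)))   # 0x630000
print(hex(physical_to_logical(0x84000)))    # 0x80000
```

So the UBI header found at 0x600000 should sit at 0x630000 in
MT29F2G08ABAEAWP@TSOP48.BIN.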

The header really makes sense, with the UBI magic at 0x600000, version 1,
and an erase count of 1, which means it is a new NAND flash, or at least
one that has just been reformatted. After that, the volume ID header is
0x800, or 2048 in decimal, away from 0x600000, which is a common approach
for NAND flash. One important thing to emphasize here: the newly generated
NAND dump is a logical NAND dump, with the OOB removed and
each "page" 2048 bytes in size. So, it is really a common approach to
locate the volume ID header one "page" away from the UBI header. Then,
the actual data is 0x1000, or 4096 in decimal, away from 0x600000,
in other words another "page" away from the volume ID header.
This is also a common approach for NAND flash. So, is there such a thing
as a free lunch? Let's try to extract it with binwalk by passing in the
well known parameters, -Me. The lengthy output seems convincing. Let's get
into the directory hosting the extracted files, as shown below.

cawan% cd _cawan_output.bin.extracted
cawan% ls
204974  _204974.extracted  600000.ubi  ED074.lzo  ubifs-root

As ubifs-root directory is generated, let's get into the directory.

cawan% cd ubifs-root
cawan% ls
1941946494  3823591600

Two more directories are found. Let's check each directory with the tree
command.

cawan% tree -L 2 1941946494
1941946494
 ubifs
     bin
     dev
     etc
     home
     lib
     linuxrc -> bin/busybox
     mnt
     proc
     root
     sbin
     sys
     tmp
     usr
     var
     work

15 directories, 1 file

cawan% tree -L 3 3823591600
3823591600
 app

1 directory, 0 files

Well, it seems the file system is extracted in the directory
1941946494. However, 3823591600 only holds an empty directory.
Let's go further.

cawan% cd 1941946494
cawan% cd ubifs
cawan% ls
bin  dev  etc  home  lib  linuxrc  mnt  proc  root  sbin  sys  tmp  usr \
var  work
cawan% cd etc
cawan% ls
fstab    HOSTNAME    inittab   pointercal  profile~  ts.conf
group    inetd.conf  networks  ppp         services  vsftpd.conf
gshadow  init.d      passwd    profile     shadow
cawan% cat fstab
cawan% ls -la fstab
-rw-rw-r-- 1 user user 186 Mar 30  2015 fstab
cawan% cat fstab | xxd
00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000b0: 0000 0000 0000 0000 0000                 ..........

Well, something must be wrong with the file system extraction: fstab is
186 bytes long but contains nothing except zero bytes. It seems the free
lunch is not really free. Should we dig further to find the reason?
No, that is really not the right track for a hardcore hacker. In any
analysis, each step of the entire process should be strictly under
control, traceable, and explainable, and that applies to firmware
analysis too. Let's start from the beginning with dd again and carve the
UBI image out manually.

cawan% dd if=./cawan_output.bin of=./ubi.bin bs=1 skip=$((0x600000))
262144000+0 records in
262144000+0 records out
262144000 bytes (262 MB, 250 MiB) copied, 281.069 s, 933 kB/s
cawan% file ubi.bin
ubi.bin: UBI image, version 1

It really takes a while to generate ubi.bin, because bs=1 makes dd copy
one byte at a time. Now, let's verify the UBI header, volume ID header,
and the start of data in hex view.

cawan% hexdump -C -n $((2048*3)) ./600000.ubi
00000000  55 42 49 23 01 00 00 00  00 00 00 00 00 00 00 01  |UBI#............|
00000010  00 00 08 00 00 00 10 00  73 bf c0 7e 00 00 00 00  |........s..~....|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 01 9f 6b b3  |..............k.|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000800  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00001800
 
Let's interpret the UBI header with its data structure as shown below.

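struct ubi_ec_hdr {
      __be32  magic;           /* "UBI#" (55 42 49 23) */
      __u8    version;
      __u8    padding1[3];
      __be64  ec;              /* erase counter */
      __be32  vid_hdr_offset;  /* offset of the volume ID header */
      __be32  data_offset;     /* offset of the data area */
      __be32  image_seq;       /* image sequence number */
      __u8    padding2[32];
      __be32  hdr_crc;         /* CRC of the preceding 60 bytes */
};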

The header magic is "UBI#", 4 bytes in size, followed by the version
number, which is 1 and takes 1 byte. After 3 bytes of padding comes the
so-called erase counter, abbreviated as ec, which indicates how many
times the block has been erased.

A little bit of background knowledge here, which might not be hacker
friendly. NAND flash storage has a limited lifespan. Every erase
operation on the same place in the flash reduces that lifespan, and once
the limit is reached, the place becomes useless. UBI manages the NAND
flash storage in units of "block", each of which comprises a number of
"pages". In the case of the MT29F2G08ABAEAWP, a "block" comprises 64
"pages" of 2048 bytes each. So, it is crucial to monitor the used count
of every "block" in order to avoid data loss. When the used count of a
"block" reaches a certain trigger level, all the data in that "block"
has to be relocated to another "block" which is in good condition.
Since relocating a physical "block" changes the physical order of the
blocks, some kind of abstraction is needed to manage the physical
"blocks" in a logical way. By keeping the order of the logical "blocks"
fixed at a high level, each logical "block" can be remapped to an
appropriate physical "block" as needed. Such an abstraction is formally
known as wear-leveling. The so-called used count is identical to the
erase count in UBI, or the wear count in wear-leveling. UBI is
responsible for providing this wear-leveling mechanism by managing the
logical "blocks" in the most appropriate way.

Let's get back to the 8-byte ec field of the UBI header. An ec of 1
means the block has been formatted once. After the ec come 4 bytes
holding the offset of the volume ID header from the beginning of the UBI
header. It is 0x800, which is exactly 1 "page". The volume ID header
offset is followed by the 4-byte data offset. It is 0x1000, which is
another "page" beyond the volume ID header. Next to the data offset are
another 4 bytes holding the image sequence number, 0x73BFC07E here,
which identifies which UBIFS the respective UBI block belongs to for
file system construction. So, the UBIFS is indeed the actual file system
that a hacker should focus on. After that, there are 32 bytes of
padding, and at last comes the UBI header CRC checksum in 4 bytes.
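
The field-by-field reading above can be reproduced with a short sketch.
It assumes only the ubi_ec_hdr layout listed above (big-endian, 64 bytes)
and feeds in the 64 header bytes copied from the hexdump. The names
EC_HDR and parse_ec_hdr are made up for this illustration.

```python
import struct

# Layout of struct ubi_ec_hdr; every multi-byte field is big-endian.
EC_HDR = struct.Struct(">4sB3sQIII32sI")

def parse_ec_hdr(raw):
    """Decode the first 64 bytes of an erase block into a dict."""
    (magic, version, _pad1, ec, vid_hdr_offset,
     data_offset, image_seq, _pad2, hdr_crc) = EC_HDR.unpack(raw[:EC_HDR.size])
    if magic != b"UBI#":
        raise ValueError("not a UBI header")
    return {
        "version": version,
        "ec": ec,
        "vid_hdr_offset": vid_hdr_offset,
        "data_offset": data_offset,
        "image_seq": image_seq,
        "hdr_crc": hdr_crc,
    }

# The 64 header bytes copied from the hexdump above.
sample = bytes.fromhex(
    "55424923" "01000000" "0000000000000001"
    "00000800" "00001000" "73bfc07e"
) + bytes(32) + bytes.fromhex("019f6bb3")

hdr = parse_ec_hdr(sample)
for name, value in hdr.items():
    print("%-15s 0x%X" % (name, value))
```

The CRC field is only decoded here, not verified.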

Now, let's check how many UBIFS exist in the UBI image.

############################ check_ubifs_count.py ############################

# Print every image_seq value found in the UBI headers of the image.

PEB_SIZE = 2048 * 64        # 64 pages of 2048 bytes per erase block
UBI_HDR = b'UBI#'           # 55 42 49 23

prev_seq = b''

with open("./600000.ubi", "rb") as input_file:
    while True:
        block = input_file.read(PEB_SIZE)
        if len(block) != PEB_SIZE:
            break
        if block[0:4] == UBI_HDR:
            img_seq = block[24:28]      # big-endian image_seq field
            if img_seq != prev_seq:
                # report the value only when it differs from the last one
                print("0x%s -> %d" % (img_seq.hex().upper(),
                                      int.from_bytes(img_seq, "big")))
            prev_seq = img_seq
print("\nCompleted.")

#################################### end ####################################

The output is shown below.

cawan% python3.8 check_ubifs_count.py
0x73BFC07E -> 1941946494
0xE3E760B0 -> 3823591600
0x9F61AB77 -> 2673978231
0x49F558F2 -> 1240815858

Completed.

Sound familiar? Yes, definitely. 1941946494 and 3823591600 were used by
binwalk to name the folders hosting the extracted files. How about the
other two? Something definitely went wrong while binwalk was extracting
the UBI image. Before proceeding further, let's try to estimate the size
of the data in use in the UBI image. One thing to clarify first: whenever
a UBI erase block is in use, it carries a valid volume ID header, whose
magic is "UBI!". Please note that the term "UBI erase block" is in fact
the formal term for a logical UBI block.

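############################ check_data_inuse.py #############################

# Estimate the data in use by counting the erase blocks that carry both
# a UBI header and a volume ID header.

PEB_SIZE = 2048 * 64        # 64 pages of 2048 bytes per erase block
VID_OFFSET = 2048           # volume ID header sits one page into a block
UBI_HDR = b'UBI#'           # 55 42 49 23
VID_HDR = b'UBI!'           # 55 42 49 21

data_inuse = 0

with open("./600000.ubi", "rb") as input_file:
    while True:
        block = input_file.read(PEB_SIZE)
        if len(block) != PEB_SIZE:
            break
        if (block[0:4] == UBI_HDR and
                block[VID_OFFSET:VID_OFFSET + 4] == VID_HDR):
            data_inuse += PEB_SIZE
print("Data size in use: %d" % data_inuse)
print("\nCompleted.")

#################################### end ####################################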

The output is shown below.

cawan% python3.8 check_data_inuse.py
Data size in use: 40239104

Completed.

Nice, it is about 40 MB in size, including some extra space which is
hard to estimate precisely. Now, it is time to talk about how to
extract the UBIFS from the UBI image. As it is a matter of re-arranging
the UBI erase blocks according to the image_seq number, there is no harm
in trying a well-known toolkit, UBI Reader. Let's see the result.

cawan% ubireader_extract_images ubi.bin
cawan% ls
cawan_output.bin  ubi.bin  ubifs-root
cawan% cd ubifs-root
cawan% ls
ubi.bin
cawan% cd ubi.bin
cawan% ls
img-1240815858_vol-data.ubifs   img-2673978231_vol-backup.ubifs
img-1941946494_vol-ubifs.ubifs  img-3823591600_vol-app.ubifs
cawan% ls -la
total 145212
drwxrwxr-x 2 user user      4096 May 29 16:46 .
drwxrwxr-x 3 user user      4096 May 29 16:46 ..
-rw-rw-r-- 1 user user 100438016 May 29 16:46 img-1240815858_vol-data.ubifs
-rw-rw-r-- 1 user user  11935744 May 29 16:46 img-1941946494_vol-ubifs.ubifs
-rw-rw-r-- 1 user user  27299840 May 29 16:46 img-2673978231_vol-backup.ubifs
-rw-rw-r-- 1 user user   9015296 May 29 16:46 img-3823591600_vol-app.ubifs

Cool. No error prompt at all, and 4 UBIFS images are extracted. Remember
that the estimated size of data in use is about 40 MB? It is reasonable
to assume that something is wrong with img-1240815858_vol-data.ubifs,
which alone is about 100 MB. The remaining 3 UBIFS images should be in
good condition, because their total size is about 48 MB, which is in the
same range as the estimate.
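
The re-arranging of UBI erase blocks mentioned earlier can be sketched
roughly as follows. This is an illustration under stated assumptions, not
the actual code of UBI Reader: it assumes the ubi_vid_hdr layout from
ubi-media.h (vol_id at byte 8, lnum at byte 12, sqnum at byte 40 of the
volume ID header), a 128 KiB erase block, and a volume ID header at
offset 2048. The name map_lebs is hypothetical.

```python
import struct
from collections import defaultdict

PEB_SIZE = 2048 * 64          # physical erase block size of this chip
VID_OFFSET = 2048             # vid_hdr_offset taken from the UBI header
EC_HDR = struct.Struct(">4sB3sQIII32sI")           # struct ubi_ec_hdr
VID_HDR = struct.Struct(">4sBBBBII4sIIII4sQ12sI")  # struct ubi_vid_hdr

def map_lebs(image):
    """Return {(image_seq, vol_id): {lnum: physical block index}}."""
    best = defaultdict(dict)  # (image_seq, vol_id) -> lnum -> (sqnum, peb)
    for peb in range(len(image) // PEB_SIZE):
        base = peb * PEB_SIZE
        ec = EC_HDR.unpack_from(image, base)
        vid = VID_HDR.unpack_from(image, base + VID_OFFSET)
        if ec[0] != b"UBI#" or vid[0] != b"UBI!":
            continue          # erased or unused block
        image_seq, vol_id, lnum, sqnum = ec[6], vid[5], vid[6], vid[13]
        slot = best[(image_seq, vol_id)]
        # If a logical block shows up twice, keep the copy with the
        # higher sequence number, which is the more recent one.
        if lnum not in slot or sqnum > slot[lnum][0]:
            slot[lnum] = (sqnum, peb)
    return {key: {lnum: peb for lnum, (_, peb) in sorted(slot.items())}
            for key, slot in best.items()}

# Possible usage on the carved image:
#   with open("ubi.bin", "rb") as f:
#       mapping = map_lebs(f.read())
```

Walking each inner dictionary in lnum order shows where every logical
block of a volume physically sits in the dump.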

Let's try to use the UBI Reader toolkit again to extract files from UBIFS.
Let's start from img-1941946494_vol-ubifs.ubifs as shown below.

cawan% ubireader_extract_files img-1941946494_vol-ubifs.ubifs
Extracting files to: ubifs-root
decompress Warn: LZO Error: EResult.LookbehindOverrun
_process_reg_file Warn: inode num:693 path:<...> :can't concat NoneType to bytearray
decompress Warn: LZO Error: EResult.InputOverrun
_process_reg_file Warn: inode num:592 path:<...> :can't concat NoneType to bytearray
decompress Warn: LZO Error: EResult.LookbehindOverrun
_process_reg_file Warn: inode num:587 path:<...> :can't concat NoneType to bytearray
decompress Warn: LZO Error: EResult.InputOverrun
...
...
...
cawan% ls
img-1240815858_vol-data.ubifs   img-2673978231_vol-backup.ubifs  ubifs-root
img-1941946494_vol-ubifs.ubifs  img-3823591600_vol-app.ubifs
cawan% cd ubifs-root
cawan% ls
bin  dev  etc  home  lib  linuxrc  mnt  proc  root  sbin  sys  tmp  usr  \
var  work

After a huge number of warnings, it seems a file system has been
generated. Is it the same thing as what binwalk generated earlier?
Let's check.

cawan% cd etc
cawan% cat fstab
cawan% ls -la fstab
-rw-rw-r-- 1 user user 186 Mar 30  2015 fstab
cawan% cat fstab | xxd
00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000b0: 0000 0000 0000 0000 0000                 ..........

Damn, it is indeed the same thing. It seems ubireader_extract_files is
unable to fully interpret the UBIFS and generate correct files. How
about the other two UBIFS images? Let's check.

cawan% ubireader_extract_files img-2673978231_vol-backup.ubifs
Extracting files to: ubifs-root
index Fatal: LEB: 110 at 13998336, Node size smaller than expected.
cawan% ubireader_extract_files img-3823591600_vol-app.ubifs
Extracting files to: ubifs-root
index Fatal: LEB: 58 at 7461120, Node size smaller than expected.

Sorry, fatal errors this time, and nothing is generated. Since this is a
NAND dump from a real device which is fully functional, and all the bit
errors have been fixed, the UBIFS should work properly. So, let's take
another route: emulate the NAND chip with nandsim and let it work
together with MTD. In most of the hacking literature, the conventional
approach with nandsim is to dd the entire UBI image into the emulated
MTD device, then modprobe the ubi driver with some parameters and leave
the ubi driver on its own to deal with the UBI image blob.

A few words of comment about this. As mentioned earlier, the UBI erase
block exists for the wear-leveling implementation in the UBI layer.
Since UBI erase blocks are logical, they are normally not in sequence
physically, which is the case in this NAND dump. So, instead of relying
on the UBI driver to do the extra block remapping work, which has a high
chance of causing errors under emulation, it is better to pre-process
the UBI image offline with ubireader_extract_images first. The output of
ubireader_extract_images is already in UBIFS form, which is an actual
file system just like squashfs, jffs2, yaffs2, or cramfs. In other
words, by dealing with UBIFS directly, the chance of getting errors is
minimized.

Anyway, there is no harm in going with the conventional approach first.
Let's try to grab the low-hanging fruit. In order to emulate a NAND
chip, one needs to know the ID codes of the chip. According to the
datasheet of the MT29F2G08ABAEAWP, the first 4 ID bytes are 0x2c, 0xda,
0x90, and 0x95. With this info, we are ready for nandsim.

cawan% sudo modprobe nandsim first_id_byte=0x2c second_id_byte=0xda \
                   third_id_byte=0x90 fourth_id_byte=0x95
cawan% cat /proc/mtd
dev:    size   erasesize  name
mtd0: 10000000 00020000 "NAND simulator partition 0"
cawan% sudo mtdinfo -a
Count of MTD devices:           1
Present MTD devices:            mtd0
Sysfs interface supported:      yes

mtd0
Name:                           NAND simulator partition 0
Type:                           nand
Eraseblock size:                131072 bytes, 128.0 KiB
Amount of eraseblocks:          2048 (268435456 bytes, 256.0 MiB)
Minimum input/output unit size: 2048 bytes
Sub-page size:                  512 bytes
OOB size:                       64 bytes
Character device major/minor:   90:0
Bad blocks are allowed:         true
Device is writable:             true
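
The page, OOB, and erase block sizes reported by mtdinfo can be
cross-checked against the fourth ID byte. The sketch below assumes the
conventional extended-ID bit layout (bits 1:0 page size, bit 2 spare
size, bits 5:4 block size, bit 6 bus width), as commonly decoded by the
Linux raw NAND core. Whether nandsim follows exactly this path is an
assumption here, and decode_ext_id is a made-up name.

```python
def decode_ext_id(fourth_id_byte):
    """Decode page, OOB, and erase block sizes from the 4th ID byte."""
    ext = fourth_id_byte
    page_size = 1024 << (ext & 0x03)           # bits 1:0
    ext >>= 2
    # bit 2: 8 or 16 spare bytes per 512 bytes of page
    oob_size = (8 << (ext & 0x01)) * (page_size >> 9)
    ext >>= 2
    block_size = (64 * 1024) << (ext & 0x03)   # bits 5:4
    ext >>= 2
    bus_width = 16 if ext & 0x01 else 8        # bit 6
    return page_size, oob_size, block_size, bus_width

page, oob, block, bus = decode_ext_id(0x95)
print("page size : %d bytes" % page)
print("OOB size  : %d bytes" % oob)
print("block size: %d bytes (%d pages)" % (block, block // page))
print("bus width : x%d" % bus)
```

For 0x95 this gives a 2048-byte page, 64 bytes of OOB, and a 128 KiB
erase block, in line with the mtdinfo output.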

Since this is assumed to be low-hanging fruit for now, just ignore the
parameters shown above. Now, let's dd the UBI image into /dev/mtd0.

cawan% sudo dd if=ubi.bin of=/dev/mtd0 bs=2048
128000+0 records in
128000+0 records out
262144000 bytes (262 MB, 250 MiB) copied, 2.7339 s, 95.9 MB/s

Done. Now, modprobe the ubi driver.

cawan% sudo modprobe ubi mtd=0,2048
modprobe: ERROR: could not insert 'ubi': Invalid argument

Sorry, the low-hanging fruit is in fact not so low for this NAND dump.
Let's proceed in the proper way as proposed earlier, starting again from
the beginning: rmmod nandsim first and then modprobe it again.

cawan% sudo rmmod nandsim
cawan% sudo modprobe nandsim first_id_byte=0x2c second_id_byte=0xda \
       third_id_byte=0x90 fourth_id_byte=0x95
      
Well, nothing special here. The output of mtdinfo -a is nothing special
either, because it just reflects the parameters of the MT29F2G08ABAEAWP.
The only thing to make sure of is that /dev/mtd0 has been created. After
that, use ubiformat with the correct parameters to make the emulated
NAND flash UBI compatible, matching the UBI layout used in the NAND
dump, as shown below.

cawan% sudo ubiformat -s 2048 -O 2048 /dev/mtd0
ubiformat: mtd0 (nand), size 268435456 bytes (256.0 MiB), \
2048 eraseblocks of 131072 bytes (128.0 KiB), min. I/O size 2048 bytes
libscan: scanning eraseblock 2047 -- 100 % complete 
ubiformat: 2048 eraseblocks are supposedly empty
ubiformat: formatting eraseblock 2047 -- 100 % complete 

Let's explain the two compulsory input parameters of ubiformat. The -s
option is the sub-page size, which is the minimum I/O unit used for UBI
headers. Setting it to 2048 prevents UBI from dividing the 2048-byte
page into smaller sub-pages, such as the 512-byte sub-page reported by
mtdinfo above. Next, the -O option is the volume ID header offset.
Setting it to 2048 means the volume ID header starts 1 page, or 2048
bytes, away from the start of the UBI erase block. Please note that
without specifying these two parameters with the correct figures, or
with everything left at the defaults, the following steps will end up
with errors. Let's proceed further to modprobe the UBI driver.

cawan% sudo modprobe ubi
cawan%

No error prompt, so just assume it succeeded. Now, use ubiattach to
create a UBI device file associated with /dev/mtd0, as shown below.

cawan% sudo ubiattach -p /dev/mtd0 -O 2048
UBI device number 0, \
total 2048 LEBs (260046848 bytes, 248.0 MiB), \
available 2002 LEBs (254205952 bytes, 242.4 MiB), \
LEB size 126976 bytes (124.0 KiB)

Again, the input parameter -O 2048 is crucial: it specifies that the
volume ID header is 2048 bytes away from the start of the UBI erase
block, the same as for ubiformat. It is extremely important to make sure
the Logical Eraseblock (LEB) size is 126976 bytes. Why? Because an erase
block is 2048*64=131072 bytes, and after deducting 2 pages of 2048 bytes
each (one for the UBI header and one for the volume ID header), the LEB
size becomes 131072-2048-2048=126976. So, they match each other.
Otherwise, the following step will end up with errors too. A new UBI
device file is created as /dev/ubi0, whose details can be checked with
ubinfo, as shown below.

cawan% sudo ubinfo /dev/ubi0 -a
ubi0
Volumes count:                           0
Logical eraseblock size:                 126976 bytes, 124.0 KiB
Total amount of logical eraseblocks:     2048 (260046848 bytes, 248.0 MiB)
Amount of available logical eraseblocks: 2002 (254205952 bytes, 242.4 MiB)
Maximum count of volumes                 128
Count of bad physical eraseblocks:       0
Count of reserved physical eraseblocks:  40
Current maximum erase counter value:     0
Minimum input/output unit size:          2048 bytes
Character device major/minor:            237:0
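
The LEB arithmetic and the byte totals reported by ubiattach and ubinfo
can be re-derived with a few lines. This is only a cross-check of the
figures printed above. The constant names are chosen for this
illustration.

```python
PAGE_SIZE = 2048          # bytes per page
PAGES_PER_BLOCK = 64      # pages per erase block
PEB_COUNT = 2048          # erase blocks on the emulated chip
AVAILABLE_LEBS = 2002     # as reported by ubinfo

peb_size = PAGE_SIZE * PAGES_PER_BLOCK
# One page is taken by the UBI header and one by the volume ID header.
leb_size = peb_size - 2 * PAGE_SIZE

print("PEB size      :", peb_size)
print("LEB size      :", leb_size)
print("chip size     :", PEB_COUNT * peb_size)
print("total LEB     :", PEB_COUNT * leb_size)
print("available LEB :", AVAILABLE_LEBS * leb_size)
```

The results agree with the 126976, 260046848, and 254205952 bytes shown
by the tools.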

Now, a UBI environment with exactly the same specification as the UBI
image in the NAND dump is ready. Let's create a volume with sufficient
storage to host the UBIFS images created by ubireader_extract_images,
as shown below.

cawan% sudo ubimkvol -N volume1 -s 50MiB /dev/ubi0
Volume ID 0, \
size 413 LEBs (52441088 bytes, 50.0 MiB), \
LEB size 126976 bytes (124.0 KiB), dynamic, name "volume1", alignment 1

Well, a new volume named "volume1" with 50 MiB in size has been created
successfully by ubimkvol, together with a new device file, /dev/ubi0_0.
Now, it is time to let volume1 host a UBIFS image by using ubiupdatevol.
Let's start with img-1941946494_vol-ubifs.ubifs, as shown below.

cawan% ls -la
total 145216
drwxrwxr-x 3 user user      4096 May 30 01:40 .
drwxrwxr-x 3 user user      4096 May 29 16:46 ..
-rw-rw-r-- 1 user user 100438016 May 29 16:46 img-1240815858_vol-data.ubifs
-rw-rw-r-- 1 user user  11935744 May 29 16:46 img-1941946494_vol-ubifs.ubifs
-rw-rw-r-- 1 user user  27299840 May 29 16:46 img-2673978231_vol-backup.ubifs
-rw-rw-r-- 1 user user   9015296 May 29 16:46 img-3823591600_vol-app.ubifs
drwxrwxr-x 2 user user      4096 May 30 01:40 ubifs-root
cawan% sudo ubiupdatevol /dev/ubi0_0 img-1941946494_vol-ubifs.ubifs
cawan%
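
As a quick sanity check on the volume size, the number of LEBs each
extracted image needs can be compared with the 413 LEBs of volume1. The
file sizes are copied from the ls -la listing above. The rounding rule
for the volume (round the requested 50 MiB up to whole LEBs) is an
assumption that happens to match the ubimkvol output.

```python
LEB_SIZE = 126976

# File sizes in bytes, copied from the directory listing above.
images = {
    "img-1240815858_vol-data.ubifs": 100438016,
    "img-1941946494_vol-ubifs.ubifs": 11935744,
    "img-2673978231_vol-backup.ubifs": 27299840,
    "img-3823591600_vol-app.ubifs": 9015296,
}

def lebs_needed(size):
    """Round a byte count up to a whole number of LEBs."""
    return -(-size // LEB_SIZE)

volume_lebs = lebs_needed(50 * 1024 * 1024)   # requested with -s 50MiB
needed = {name: lebs_needed(size) for name, size in images.items()}

print("volume1 holds %d LEBs" % volume_lebs)
for name, count in needed.items():
    verdict = "fits" if count <= volume_lebs else "does not fit"
    print("%-32s %4d LEBs, %s" % (name, count, verdict))
```

Each of the three smaller images fits into volume1, while the data image
would need a larger volume.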

5 - Firmware Extraction

Everything has worked perfectly without a single error so far. Let's see
whether the low-hanging fruit, which was not so low, is within reach now.

cawan% mkdir /tmp/nand
cawan% sudo mount -t ubifs /dev/ubi0_0 /tmp/nand
cawan% cd /tmp/nand
cawan% ls
bin  dev  etc  home  lib  linuxrc  mnt  proc  root  sbin  sys  tmp  usr \
var  work

Hopefully this is not the same thing as what ubireader_extract_files
generated in the previous section. Let's verify it.

cawan% cd etc
cawan% cat fstab
proc  /proc      proc    defaults     0      0
none  /var/shm   shm     defaults     0      0
sysfs /sys       sysfs   defaults     0      0
none  /tmp   tmpfs     defaults     0      0
 
What an amazing moment. Let's try the other two UBIFS images.

cawan% sudo umount /tmp/nand
cawan% sudo ubiupdatevol /dev/ubi0_0 img-2673978231_vol-backup.ubifs
cawan% sudo mount -t ubifs /dev/ubi0_0 /tmp/nand                   
cawan% ls /tmp/nand
14x8.hzk   dat.ini         flat_backup  libplat.so   ParaAutoNet.db  ParamUniq.db
acmet      driver_gwzd.ko  gsmMuxd      lyzd         ParamMeter.db   ppp
check.ini  factory         icons.bmp    manuf.xin    ParamOther.db   seting.ini
chs.bin    filecheck       libacmet.so  metproto.so  ParamTerm.db    startup.sh
cawan% sudo umount /tmp/nand
cawan% sudo ubiupdatevol /dev/ubi0_0 img-3823591600_vol-app.ubifs  
cawan% sudo mount -t ubifs /dev/ubi0_0 /tmp/nand
cawan% ls /tmp/nand
14x8.hzk   driver_gwzd.ko  libacmet.so  manuf.xin    startup.sh
check.ini  filecheck       libplat.so   metproto.so  tmt_info.log
chs.bin    gsmMuxd         lyzd         ppp          updateinfo.xin
dat.ini    icons.bmp       lyzd.xzip    seting.ini
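
To make sure that none of the files in a mounted or extracted tree is
zero-filled like the fstab seen earlier, a small helper along the
following lines could be run against each tree. This is a minimal
sketch. The function name find_zero_filled is hypothetical, and reading
whole files into memory is assumed to be acceptable for firmware-sized
files.

```python
import os

def find_zero_filled(root):
    """Return paths of non-empty regular files holding only NUL bytes."""
    suspects = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks (e.g. linuxrc) and anything not a plain file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                data = handle.read()
            if data and data.count(0) == len(data):
                suspects.append(path)
    return sorted(suspects)

# Possible usage on the mount point used above:
#   for path in find_zero_filled("/tmp/nand"):
#       print("zero-filled:", path)
```

An empty result does not prove every file is intact, but a non-empty one
points straight at a broken extraction.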

6 - Conclusion

So, in conclusion, the entire file system hosted in three different UBIFS
images has been fully extracted successfully.

Happy hacking, and keep hacking.


References:

[1] MT29F2G08ABAEAWP Data Sheet,
    https://datasheet.lcsc.com/lcsc/1811032117_Micron-Tech-MT29F2G08ABAEAWP-E_C110895.pdf

[2] DumpFlash Tool, https://github.com/ohjeongwook/dumpflash

[3] python-bchlib, https://github.com/jkent/python-bchlib

[4] Primitive Polynomial List,
    https://www.partow.net/programming/polynomials/index.html

[5] UBI Header Structure,
    https://kernel.googlesource.com/pub/scm/linux/kernel/git/rw/mtd-utils/+/refs/heads/master/include/mtd/ubi-media.h