Hi guys, thanks for your posts.
Basically I am using a project called Jasmine, which is part of something called OpenSSD. Luckily, the makers of the controller released some source code for the chip, tutorial firmware, and an installer program which can upload firmware to the SSD. It was made for a different set of NANDs than mine, but similar enough that I believe I have adjusted the parameters to suit, although I am still missing things like how to read the bad block info from each spare area. This is a PITA but doesn't seem like a show-stopper.
You can look at the source of 'installer.c' from OpenSSD Jasmine on the web. It sets up the controller via a function called reset_fctrl, which is one place I have spent a lot of time. Because the documentation is incomplete I don't quite know how to set the Barefoot up for my NANDs, but I have got far enough to get a dump (trial and error with the NAND config).
In this function you set info like NAND page size, MLC/SLC, planes, dies, block size, speed etc. This is passed over SATA to the controller, which is running its built-in ROM firmware. The controller then accepts commands to read blocks from the NAND and pass them back over SATA. The way it reads the NANDs depends on how you set it up in reset_fctrl. So when you get it right, the interleaving and bank distribution is done by the controller, leaving you with large chunks of data. I have it now returning 16 KB pages, which seems right, since the NANDs are 4 KB pages, 2 die, 2 plane. The SSD reports as having 32 banks, and looking through the code I am pretty convinced this is set up right now.
So then you have the 'bank_read' function. Here you give it a bank/row address and a number of bytes to read. After a read you can query the ECC status of the read. I can read 16 KB with successful ECC just about all the time, apart from some blocks which are probably bad blocks (the bad block info at each bank's block 0 is partially corrupt, and I can't scan the end-of-data-block bad block markers because I don't have the documentation to adjust the code to suit my NANDs - so far).
So hence the join-by-byte and that stuff (I'm not 100% up with the terminology) seems not to be needed. The XORing, to my surprise, also seems not to be needed. I have code to set the XOR registers in the Jasmine source, but it's not implemented AFAIK. I also have the keys from another source. However, with HTML pages from the disk readable as things are, I feel it's not required. It's possible that the ROM has implemented it, but I don't think so. This drive has its original, very old firmware on it, and I suspect they never implemented the XORing in this firmware release (I got the drive in 2009 IIRC).
When you use bank_read, you set certain parameters for each read. You can ask for ECC or not, set the bank, the column, the number of bytes to read, 2-plane interleave etc. There is reasonable documentation for this operation.
So anyway I use bank_read and read 16 KB from each bank in sequence. This gives me what I believe is a virtual image. I see whole 16 KB pages which I can tell were written at once. Whether the original OS was writing 16 KB, or this is two 8 KB pages written sequentially, I don't know and probably don't care, as that's for the file system to sort out.
Anyway, that nearly brings me up to date, bar one thing. Now I have some logical page mapping info. The last 16 KB page in each 128-page block in each bank holds data which I noticed a pattern in, and it looked a lot like LPAs. It looked like 32-bit words, and the first 6 words looked different to the next 128. So I tried using those 128 words as LPAs: word 1 of the list as the LPA of the first page in that block in that bank, word 2 as the LPA of the second page, etc. It assembled to an image which has pages of HTML and other text longer than 16 KB, and there are many of them. With an HTML page you can of course see whether the pieces belong together, and they do. Once I had seen several long pages reassembled like this, I believed that these 32-bit words were indeed the LPAs.
However, the problem is as I said before: there are duplicate LPAs. Yesterday I did more coding to try to reduce the duplicates. I looked at the first 6 words of the mapping blocks (the ones I skipped, before the LPA words). They look like this:
Code:
7f0018 7f ffffffff ffffffff ffffffff 7fffffff
7f001f 7f ffffffff ffffffff ffffffff 7fffffff
Looking at the second word (7f above), it looked interesting, so I examined what values were put in that place and how common they were. On the left here is the value from word 2; on the right is how many times I found it scanning several GB of the disk:
Code:
0 31
79 121
70 2
63 1
75 1
77 9
7f 27427
72 1
73 1
5c 1
1dcf32af 31
76 4
7c 512
5d 1
6c 1
7a 144
ffffffff 103
78 42
6e 1
7d 1037
7e 2234
7b 292
55 1
74 2
7f was very common, so I wondered if it marked data blocks. Some of those others might be erased blocks, bad blocks, log blocks, other deliberate blocks, and whatever else I don't know. But at the moment I am going ahead assuming that 7f marks a good block, and will try looking at what's in the other-valued blocks another time. Take the low-hanging fruit first.
Still I have a proportion of duplicate LPNs. Not many, but the question is how to resolve them. I have been looking for a sequence number, for instance. I thought maybe the word which would be used for LPN 128 (which is otherwise useless, since page 128 is the map page in each block) might be a sequence number, but there are duplicates of this number on the disk (i.e. it's not unique, so I imagine it can't be a sequence number).
I think I will continue in another post.