All times are UTC - 5 hours [ DST ]


 Post subject: Replacing a 1994 HDD with a CF on an m68k-based machine
PostPosted: February 13th, 2022, 15:52 

Joined: February 13th, 2022, 15:26
Posts: 5
Location: Switzerland
Hey everyone, first post here! I just found out about your forum, this place looks awesome 8).

I have acquired an old Tektronix device. I'd like to replace its faithful 30-year-old Seagate drive for obvious reasons :wink:. The device is from around 1993, is m68k-based and runs pSOS as its operating system. The original hard drive - which still works without issue, by the way - is a 2.5" 128MB Seagate ST9140AG (980 cyl, 15 heads, 17 sect). I bought a Transcend Industrial CF300 128MB CompactFlash card and a CF-to-IDE44 adapter as a replacement (to be used in True ATA mode). The CF/adapter setup works fine on an oldish 2010 motherboard running Linux. However, it is not correctly recognized by the Tektronix device.

The Tektronix device sees it, but it says that it is a 22MB Hard Disk (and not 128MB as expected). The Tektro also has a Format utility which simply fails:

Quote:
"The hard disk format routine could not successfully format the hard disk. The boot sector could not be updated. Any data remaining on the hard disk may be corrupt. Run the hard disk diagnostics for more information".


FWIW, the Tektro's HD diagnostics utility passes, although I'm not sure that we should take this generic error message literally.

I'm therefore learning about the world of old hard drives (pre and early ATA-1). At first, I suspected a variable overflow in the OS driver, but I don't see what could cause 128MB to become 22MB. Right now I'm learning about CHS/LBA translation. But as I understand it, the size of the disk is reported in response to the ATA IDENTIFY command, so I'm not sure how it can go wrong.

Before I set up a logic analyzer on the beast to sniff ATA commands (I'm a bit lazy but I will eventually get to it :D), I figured someone here may have an interesting take on this problem :). Any luck?


 Post subject: Re: Replacing a 1994 HDD with a CF on an m68k-based machine
PostPosted: February 14th, 2022, 13:42 

Joined: September 8th, 2009, 18:21
Posts: 15461
Location: Australia
Can you show us a dump of the Identify Device data block? Words 1 - 6 and 53 - 61 report the default and current CHS translations and CHS/LBA capacities.

https://docs.rs-online.com/1adf/0900766b816b6fd0.pdf (TS128MCF300 datasheet)
https://web.archive.org/web/20120321035657if_/http://www.t13.org/Documents/UploadedDocuments/project/d0791r4c-ATA-1.pdf (ATA standard 1994)

In these old systems the drive powers up with a default CHS translation. The BIOS then issues an Initialize Drive Parameters command (91h) which selects the desired CHS translation mode. This is so the drive can be addressed with one of the fixed CHS configurations present in the BIOS. This new configuration is volatile, so the drive will revert to its default translation mode after power cycling. You should be able to determine the scope's preprogrammed translation mode by examining the partition table or boot sector of the Seagate HDD.
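
For reference, the 91h command carries the requested geometry in two task-file registers: Sector Count holds the sectors per track, and the low nibble of Device/Head holds the highest head number (heads minus one). Below is a minimal Python sketch of the values a host would write for a given translation. The function name and the dictionary layout are made up for illustration, and nothing here is taken from the Tektronix firmware.

```python
def init_drive_params_taskfile(heads, sectors_per_track, drive=0):
    """Register values for INITIALIZE DRIVE PARAMETERS (91h).

    Hypothetical helper: it only computes the values, it does not touch hardware.
    """
    if not 1 <= heads <= 16:
        raise ValueError("heads must be 1..16")
    if not 1 <= sectors_per_track <= 255:
        raise ValueError("sectors per track must be 1..255")
    return {
        # Sector Count register: logical sectors per track
        "sector_count": sectors_per_track,
        # Device/Head register: 0xA0 base, drive select in bit 4,
        # highest head number (heads - 1) in the low nibble
        "device_head": 0xA0 | (drive << 4) | (heads - 1),
        "command": 0x91,
    }


# Geometry of the ST9140AG as quoted in the first post (15 heads, 17 sectors)
seagate = init_drive_params_taskfile(heads=15, sectors_per_track=17)
print({name: hex(value) for name, value in seagate.items()})
```

Whether the scope's boot ROM issues this command at all is something only a bus capture would show.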

_________________
A backup a day keeps DR away.


 Post subject: Re: Replacing a 1994 HDD with a CF on an m68k-based machine
PostPosted: February 14th, 2022, 19:01 

Joined: February 13th, 2022, 15:26
Posts: 5
Location: Switzerland
Quote:
Can you show us a dump of the Identify Device data block?

There you go, dumped using hdparm on my Linux box:

st9140ag
Code:
00000000: 045a 03d4 0000 000f 8d90 0248 0011 0000  .Z.........H....
00000010: 0000 0000 2020 2020 2020 2020 2020 3030  ....          00
00000020: 4b51 3939 3130 3537 0003 00f0 0010 3131  KQ991057......11
00000030: 2e31 312e 3031 5354 3931 3430 4147 2020  .11.01ST9140AG 
00000040: 2020 2020 2020 2020 2020 2020 2020 2020                 
00000050: 2020 2020 2020 2020 2020 2020 2020 0000                ..
00000060: 0000 0000 0000 0000 0000 0001 03d4 000f  ................
00000070: 0011 d02c 0003 0000 0000 0000 0000 0000  ...,............
00000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000100: 7f20 003f 0000 0000 0000 0000 0000 0000  . .?............
00000110: ff18 0024 0021 001e 001c 0018 0018 0000  ...$.!..........
00000120: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000130: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000140: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000150: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000160: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000170: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000180: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000190: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................

Word 1: 0x03d4: 980 cylinders
Word 3: 0x000f: 15 heads
Word 4: 0x8d90: 36240 unformatted bytes per track
Word 5: 0x0248: 584 unformatted bytes per sector
Word 6: 0x0011: 17 sectors per track

ts128mcf300
Code:
00000000: 044a 00f6 0000 0010 0000 0000 003f 0003  .J...........?..
00000010: c8a0 0000 4734 3230 3435 3031 3036 3733  ....G42045010673
00000020: 3337 3744 3030 3237 0002 0002 0004 3230  377D0027......20
00000030: 3137 3034 3230 5453 3132 384d 4346 3330  170420TS128MCF30
00000040: 3020 2020 2020 2020 2020 2020 2020 2020  0               
00000050: 2020 2020 2020 2020 2020 2020 2020 8001                ..
00000060: 0000 0f00 0000 0200 0000 0007 00f6 0010  ................
00000070: 003f c8a0 0003 0100 c8a0 0003 0000 0007  .?..............
00000080: 0003 0078 0078 0078 0078 0100 0000 0000  ...x.x.x.x......
00000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000a0: 0000 0000 742b 7505 4000 0429 3401 4000  ....t+u.@..)4.@.
000000b0: 203f 0003 0000 0000 fffe 604f 0000 0000   ?........`O....
000000c0: 0000 0000 0000 0000 c8a0 0003 0000 0000  ................
000000d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000e0: 0000 0000 0000 0000 0000 0000 0000 0010  ................
000000f0: 0010 0000 0000 0000 0000 0000 0000 0000  ................
00000100: 0001 5452 414e 5343 454e 4400 0000 0000  ..TRANSCEND.....
00000110: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000120: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000130: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000140: 81f4 0000 0000 0012 0000 0000 0000 6100  ..............a.
00000150: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000160: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000170: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000180: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000190: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001d0: 0000 0000 0008 0008 0000 0000 0000 0000  ................
000001e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................

Note that this information can also be found in the CF300 datasheet under "ID Table Information of True IDE Mode", page 76.

Word 1: 0x00f6: 246 cylinders
Word 3: 0x0010: 16 heads
Word 4: 0x0000: 0 unformatted bytes per track
Word 5: 0x0000: 0 unformatted bytes per sector
Word 6: 0x003f: 63 sectors per track

We can see that the CF300 follows a more recent ATA spec that considers words #4 and #5 obsolete, whereas ATA-1 used them to convey some information. I'm not sure what "unformatted bytes per track/sector" means, but perhaps my Tektronix uses these?
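
As a sanity check on the two dumps, the capacity implied by the default geometry (words 1, 3 and 6) can be compared with the current capacity in sectors (words 57-58). Here is a small Python sketch, with the words copied by hand from the dumps above. The helper names are only for illustration.

```python
def identify_words(hex_words):
    """Turn a whitespace-separated run of 16-bit hex words into a list of ints."""
    return [int(word, 16) for word in hex_words.split()]


def chs_sectors(cyls, heads, spt):
    """Total sectors addressable with a cylinders/heads/sectors-per-track geometry."""
    return cyls * heads * spt


def capacities(words_0_7, words_56_63):
    """Return (sectors from words 1/3/6, sectors from words 57-58)."""
    default = chs_sectors(words_0_7[1], words_0_7[3], words_0_7[6])
    # Words 57-58 hold a 32-bit sector count, low word first
    current = words_56_63[1] | (words_56_63[2] << 16)
    return default, current


# Rows at offsets 0x00 and 0x70 of each dump (words 0-7 and 56-63)
seagate = capacities(
    identify_words("045a 03d4 0000 000f 8d90 0248 0011 0000"),
    identify_words("0011 d02c 0003 0000 0000 0000 0000 0000"),
)
transcend = capacities(
    identify_words("044a 00f6 0000 0010 0000 0000 003f 0003"),
    identify_words("003f c8a0 0003 0100 c8a0 0003 0000 0007"),
)

for name, (default, current) in (("ST9140AG", seagate), ("TS128MCF300", transcend)):
    print(name, default, current, default * 512 / 1e6, "MB")
```

Both drives come out at roughly 127 MB either way (249900 and 247968 sectors), so nothing in these words obviously yields 22 MB.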

Quote:
You should be able to determine the scope's preprogrammed translation mode by examining the partition table or boot sector of the Seagate HDD.

Here are the first two sectors of the Seagate HDD (sorry, the forum does not use scrollbars so this will be quite long). Remember that this is not an x86 architecture, but m68k with a specific RTOS named pSOS. There's no BIOS here. For example, I can't find the 0x55 0xaa signature at the end of the alleged MBR. Perhaps I can find something inside the Tektronix or pSOS documentation.

Code:
00000000: eb46 9054 454b dc1f 4d84 8d00 0240 0100  .F.TEK..M....@..
00000010: 02f0 05d4 03f0 1000 1100 0f00 0000 0000  ................
00000020: 0000 0000 426f 6f74 2064 6973 6b20 666f  ....Boot disk fo
00000030: 7220 5465 7874 726f 6e69 7820 7878 7878  r Textronix xxxx
00000040: 206f 6e6c 792e 0000 fab8 c007 8ed0 bc00   only...........
00000050: 048e d8be 2400 bb07 00b9 2300 acb4 0ecd  ....$.....#.....
00000060: 10e2 f9fb ebfe 0000 0000 0000 0000 0000  ................
00000070: 0000 1000 5359 5354 454d 2020 5445 4b53  ....SYSTEM  TEKS
00000080: 5953 5445 4d20 2042 414b 0000 0000 0000  YSTEM  BAK......
00000090: 0000 0000 0000 0000 206f 0004 2068 0814  ........ o.. h..
000000a0: 4ed0 4e71 4e71 4e71 206f 0004 21c8 0198  N.NqNqNq o..!...
000000b0: 48e7 f040 4280 3140 001c 3140 0018 21e8  H..@B.1@..1@..!.
000000c0: 0004 028c 243c 001a 4eea 1239 0085 001d  ....$<..N..9....
000000d0: 0281 0000 00c0 0c01 0040 6700 0012 5382  .........@g...S.
000000e0: 66e8 13f8 00ff 0085 000f 6000 0114 13e8  f.........`.....
000000f0: 0010 0085 0005 13e8 0016 0085 0007 13e8  ................
00000100: 0015 0085 0009 13e8 0014 0085 000b 1028  ...............(
00000110: 0012 0280 0000 000f 0040 00a0 13c0 0085  .........@......
00000120: 000d 1028 0013 0c00 002c 6700 0034 1028  ...(.....,g..4.(
00000130: 0013 0c00 00ec 6700 0034 0c00 0034 6700  ......g..4...4g.
00000140: 0038 1028 0013 0c00 0050 6700 0038 13fc  .8.(.....Pg..8..
00000150: 0000 0085 0005 13c0 0085 000f 6000 00a2  ............`...
00000160: 13fc 0020 0085 000f 6000 0060 13fc 00ec  ... ....`..`....
00000170: 0085 000f 6000 0054 13fc 0030 0085 000f  ....`..T...0....
00000180: 6000 000e 13fc 0050 0085 000f 6000 0002  `......P....`...
00000190: 1239 0085 000f 4eb9 0001 5c00 2268 0008  .9....N...\."h..
000001a0: 1628 0010 0c03 0002 6c00 000c 4eb9 0001  .(......l...N...
000001b0: 5c38 6000 004c 4eb9 0001 5ca8 5303 6f00  \8`..LN...\.S.o.
000001c0: 0040 4eb9 0001 5c00 60da 2268 0008 1628  .@N...\.`."h...(
000001d0: 0010 0c83 0000 0002 6c00 0012 4eb9 0001  ........l...N...
000001e0: 5c00 4eb9 0001 5d16 6000 0016 4eb9 0001  \.N...].`...N...
000001f0: 5c00 4eb9 0001 5cde 5303 6f00 0004 60d2  \.N...\.S.o...`.
00000200: f0ff ffff 0300 0400 0500 0600 ffff ffff  ................
00000210: ffff ffff ffff ffff ffff 0e00 ffff ffff  ................
00000220: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000230: ffff ffff ffff 1c00 ffff ffff ffff ffff  ................
00000240: 2100 ffff ffff 2400 ffff 2600 ffff ffff  !.....$...&.....
00000250: 2900 2a00 2b00 ffff ffff ffff ffff ffff  ).*.+...........
00000260: ffff ffff ffff ffff ffff 3600 3700 ffff  ..........6.7...
00000270: ffff 3a00 3b00 3c00 ffff 3e00 3f00 ffff  ..:.;.<...>.?...
00000280: 4100 4200 4300 ffff ffff 4600 ffff 4800  A.B.C.....F...H.
00000290: 4900 4a00 ffff 4c00 4d00 ffff ffff 5000  I.J...L.M.....P.
000002a0: 5100 ffff 5300 ffff 5500 ffff ffff 5800  Q...S...U.....X.
000002b0: 5900 5a00 5b00 ffff 5d00 5e00 5f00 ffff  Y.Z.[...].^._...
000002c0: ffff 6200 6300 ffff 6500 ffff 0000 0000  ..b.c...e.......
000002d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000002e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000002f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000300: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000310: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000320: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000330: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000340: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000350: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000360: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000370: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000380: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000390: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000003a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000003b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000003c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000003d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000003e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000003f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................


Thanks for your precious help.


 Post subject: Re: Replacing a 1994 HDD with a CF on an m68k-based machine
PostPosted: February 15th, 2022, 14:25 

Joined: September 8th, 2009, 18:21
Posts: 15461
Location: Australia
Offset 0x200 appears to be the first sector of a 16-bit FAT. You can see the cluster chains, eg cluster #0002 is followed by cluster #0003, then #0004, #0005, #0006 and #FFFF.
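
The chain can be followed mechanically by reading the table as little-endian 16-bit entries. A short Python sketch using only the first 16 bytes at offset 0x200 from the dump (the function names are illustrative):

```python
import struct

# First row of the dump at offset 0x200
FAT_START = bytes.fromhex("f0ff ffff 0300 0400 0500 0600 ffff ffff")


def fat16_entries(raw):
    """Decode a byte string as little-endian 16-bit FAT entries."""
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def follow_chain(fat, first_cluster):
    """Follow a cluster chain until an end-of-chain marker or the end of the table."""
    chain = []
    cluster = first_cluster
    while 2 <= cluster < 0xFFF8 and cluster < len(fat) and cluster not in chain:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


fat = fat16_entries(FAT_START)
print([hex(entry) for entry in fat])
print(follow_chain(fat, 2))
```

Entry 0 reads 0xFFF0, whose low byte matches the 0xF0 media descriptor in the boot sector, and the chain starting at cluster 2 runs through 3, 4, 5 and 6 before hitting 0xFFFF.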

Offset 0x00 is a FAT-like boot sector. If we disassemble the code, we find that it's Intel x86. There is a BIOS interrupt call (INT 10h, AH = 0Eh), which is the video BIOS teletype output service.

Code:
-u 100
0AD0:0100 EB46              JMP     0148
0AD0:0102 90                NOP

-u 148
0AD0:0148 FA                CLI
0AD0:0149 B8C007            MOV     AX,07C0
0AD0:014C 8ED0              MOV     SS,AX
0AD0:014E BC0004            MOV     SP,0400
0AD0:0151 8ED8              MOV     DS,AX
0AD0:0153 BE2400            MOV     SI,0024
0AD0:0156 BB0700            MOV     BX,0007
0AD0:0159 B92300            MOV     CX,0023
0AD0:015C AC                LODSB
0AD0:015D B40E              MOV     AH,0E
0AD0:015F CD10              INT     10
0AD0:0161 E2F9              LOOPW   015C
0AD0:0163 FB                STI
0AD0:0164 EBFE              JMP     0164
-
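
The LODSB/INT 10 loop can be mimicked offline: SI = 0024 and CX = 0023 mean that 0x23 bytes starting at sector offset 0x24 are sent to the screen one at a time. Here is a Python sketch using the bytes from the dump at offsets 0x20 to 0x47. The first 0x20 bytes are zero padding standing in for the real sector contents, which are not needed here.

```python
# Placeholder for offsets 0x00-0x1F, followed by the dump rows at 0x20, 0x30
# and the first half of the row at 0x40
sector = bytes(0x20) + bytes.fromhex(
    "0000 0000 426f 6f74 2064 6973 6b20 666f"
    "7220 5465 7874 726f 6e69 7820 7878 7878"
    "206f 6e6c 792e 0000"
)

SI = 0x24  # MOV SI,0024: start of the string, relative to the sector start
CX = 0x23  # MOV CX,0023: number of bytes passed to INT 10h

raw = sector[SI:SI + CX]
message = raw.rstrip(b"\x00").decode("ascii")
print(repr(message), len(raw))
```

The 35 bytes are the 34-character message plus one trailing NUL.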

There appears to be a BIOS Parameter Block at offsets 0x0B - 0x1B.

https://thestarman.pcministry.com/asm/mbr/DOS50FDB.htm

Code:
Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000                                   00 02 40 01 00
00000010  02 F0 05 D4 03 F0 10 00 11 00 0F 00

    512 Bytes per Sector
    64 Sectors per Cluster ??
    1 Reserved sector (starting at Sector 0)
    2 FATs
    0x5F0 possible Root Directory entries ??
    980 (= 0x3D4) Cylinders
    0xF0 Media Descriptor Byte
    16 (= 0x0010) Sectors per FAT ?
    17 (= 0x0011) Sectors per Track
    15 (= 0x000F) Heads
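
The same decode can be reproduced with Python's struct module. This is only a sketch: the field names follow the usual DOS BPB layout, except that the word at 0x13 (normally the 16-bit total sector count) is labelled by its offset, since here it holds 980, which matches the cylinder count.

```python
import struct

# First 32 bytes of the boot sector, copied from the dump
BOOT_SECTOR_START = bytes.fromhex(
    "eb46 9054 454b dc1f 4d84 8d00 0240 0100"
    "02f0 05d4 03f0 1000 1100 0f00 0000 0000"
)

FIELDS = (
    "bytes_per_sector",     # 0x0B, word
    "sectors_per_cluster",  # 0x0D, byte
    "reserved_sectors",     # 0x0E, word
    "fats",                 # 0x10, byte
    "root_entries",         # 0x11, word
    "word_at_0x13",         # 0x13, word (980 here, i.e. the cylinder count)
    "media_descriptor",     # 0x15, byte
    "sectors_per_fat",      # 0x16, word
    "sectors_per_track",    # 0x18, word
    "heads",                # 0x1A, word
)


def parse_bpb(sector):
    """Unpack the 17 little-endian bytes at 0x0B-0x1B into a dict."""
    values = struct.unpack_from("<HBHBHHBHHH", sector, 0x0B)
    return dict(zip(FIELDS, values))


bpb = parse_bpb(BOOT_SECTOR_START)
for name, value in bpb.items():
    print(f"{name:20s} {value:5d}  (0x{value:X})")
```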

Therefore it would appear that Tek uses the default CHS translation parameters when formatting the drive:

https://stason.org/TULARC/pc/hard-drives-hdd/seagate/ST9140AG-128MB-2-5-SSL-IDE-AT.html

Your CF card has the same CHS defaults.

In short, I have no idea where the 22MB capacity is coming from. :-?

Have you tried cloning the HDD to the CF card?

 Post subject: Re: Replacing a 1994 HDD with a CF on an m68k-based machine
PostPosted: February 15th, 2022, 15:43 

Joined: September 8th, 2009, 18:21
Posts: 15461
Location: Australia
fzabkar wrote:
Your CF card has the same CHS defaults.

Oops, I don't know where I was looking. Obviously the CF card's CHS defaults are different (246/16/63)

 Post subject: Re: Replacing a 1994 HDD with a CF on an m68k-based machine
PostPosted: February 15th, 2022, 16:35 

Joined: September 8th, 2009, 18:21
Posts: 15461
Location: Australia
I suspect that the scope displays the following message during booting:

"Boot disk for Textronix xxxx only."

If the cloned CF card causes the device to display the same message, but with errors, then this would suggest that the IDE interface is dropping bits. It would also explain why the capacity is being reported incorrectly.

 Post subject: Re: Replacing a 1994 HDD with a CF on an m68k-based machine
PostPosted: February 16th, 2022, 13:44 

Joined: February 13th, 2022, 15:26
Posts: 5
Location: Switzerland
Indeed, you are right. This is an x86 boot sector. The PRISM OS (the pSOS-based OS that is in use on this machine - a Tektronix GPX3001) uses an MS-DOS-compatible format for its HDD and floppies. To quote the documentation:

Quote:
PRISM files are stored on disk in MS-DOS format. This allows you to share files (and floppy disks) between your mainframe and personal computers running the MS-DOS operating system. While you can manipulate PRISM files with your personal computer, it is important to realize that MS-DOS is not the operating system used by PRISM. The following paragraphs discuss the differences between the PRISM file system conventions and those of MS-DOS file system.

NOTE
When you insert a new unformatted floppy in the disk drive, you must format it before you can perform any other disk operation on it.

Your mainframe cannot read from or write to a floppy unless it has been initialized with the MS-DOS format. This format lets you store 720 kilobytes of information on each floppy disk. One of the major differences between the PRISM file system and MS-DOS implementations is that directories cannot be nested. In other words, PRISM does not allow you to put a directory inside a directory. If you use your personal computer to nest directories on a disk, you cannot access the subdirectories with PRISM.

NOTE
When looking at disk contents with PRISM utilities, you cannot see the files in the ROOT directory. However, you can see them if you list disk contents on an MS-DOS compatible computer.

Another difference between PRISM files and other MS-DOS files is that PRISM has its own file types. When a listing of files is viewed in a PRISM menu, the file type is spelled out alongside the file name (see the Disk Services menu shown in Figure 6-1). When viewed on a personal computer, the file type is identified by a three-character extension appended onto the file name.


The x86 instructions that you found in the boot sector are there in case someone attempts to boot one of these disks on something other than the Tektronix. In that case, the message "Boot disk for Textronix xxxx only." will be shown, telling the user that they are attempting to boot from the wrong disk.

In fact, the Tektro never boots from the hard drive. The M68010 boots off a bootrom (in reality two physical ROMs: one for even addresses, the other for odd addresses). The bootrom has routines for accessing the hard drive and loading the OS. I did have some fun recreating the bootrom image by combining both ROM dumps together and running it under Qemu. I hacked together a quick and dirty display peripheral to intercept calls to the framebuffer. That way, I could emulate stuff and see the boot screen (for anyone interested, my partial and not very useful work is here: https://github.com/zkrx/gpx3001/commits/gpx3001). Anyway, I digress :). FTR, you can find its documentation on archive.org: https://archive.org/details/tektronix_gpx3001

This BIOS Parameter Block is interesting stuff that I did not know about. I will attempt to duplicate the Seagate partition onto the CF300 and see what happens. I'll keep you posted. Thanks again!


 Post subject: Re: Replacing a 1994 HDD with a CF on an m68k-based machine
PostPosted: February 16th, 2022, 14:13 

Joined: September 8th, 2009, 18:21
Posts: 15461
Location: Australia
eminem wrote:
In fact, the Tektro never boots from the hard drive. The M68010 boots off a bootrom (in reality two physical ROMs: one for even addresses, the other for odd addresses). The bootrom has routines for accessing the hard drive and loading the OS.

That makes sense. The boot code displays the text message and then goes into an infinite loop.

Code:
0AD0:0164 EBFE              JMP     0164

Is it possible to boot from the CF card just to see whether the text is displayed correctly? I suspect that cloning the HDD won't work because of the different CHS translations, but sector #1 (CHS = 0/0/1) should still be in the same place.

Does Prism OS have a debugger like DEBUG.EXE in MS-DOS? You could use this to dump the contents of logical sector 0 to the screen.

Edit: After you clone the HDD to your CF card, would it make sense to edit the CHS values in the BPB to match the CF card's translation mode?

 Post subject: Re: Replacing a 1994 HDD with a CF on an m68k-based machine
PostPosted: August 8th, 2022, 8:59 

Joined: February 13th, 2022, 15:26
Posts: 5
Location: Switzerland
Hi fzabkar,

I had time to spend on this project over the last month. I spent a lot of time doing various experiments and analyzing the bootloader of that thing. I did a lot of work so I'll try to keep things brief and simple.

As you suggested in your last post, it is necessary to edit the CHS values in the BPB to match the newly-installed drive. Once this is done, the error message is no longer displayed. We get a neat "Executing Code Loaded From Disk" and I even get a nice Tektronix copyright string. However, it takes a couple of minutes to get there and the boot process stalls before reaching the OS (or rather while initializing the OS). Which gets me to the following point.
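
For anyone wanting to repeat this, here is a minimal Python sketch of such a patch. It assumes that the fields to change are the three identified earlier in the thread (cylinders at 0x13, sectors per track at 0x18, heads at 0x1A). The exact bytes that were edited are not listed in this post, so treat the offsets as an assumption and work on a copy of the image.

```python
import struct


def patch_bpb_geometry(sector, cyls, heads, spt):
    """Return a copy of a boot sector with the CHS fields rewritten.

    Assumed layout (from the BPB decode earlier in the thread):
    0x13 = cylinders, 0x18 = sectors per track, 0x1A = heads, all little-endian words.
    """
    patched = bytearray(sector)
    struct.pack_into("<H", patched, 0x13, cyls)
    struct.pack_into("<HH", patched, 0x18, spt, heads)
    return bytes(patched)


# First 32 bytes of the Seagate boot sector, copied from the dump
original = bytes.fromhex(
    "eb46 9054 454b dc1f 4d84 8d00 0240 0100"
    "02f0 05d4 03f0 1000 1100 0f00 0000 0000"
)

# Default geometry reported by the CF300: 246 cylinders, 16 heads, 63 sectors
patched = patch_bpb_geometry(original, cyls=246, heads=16, spt=63)
print(original[0x10:0x1C].hex(" "))
print(patched[0x10:0x1C].hex(" "))
```

All other bytes, including the FAT-related fields, are left untouched by this sketch.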

I took a lot of captures of the ATA bus with a 16-channel logic analyzer (DSLogic). At first, it was difficult to make sense of them (the spec isn't crystal-clear about edge polarity, to say the least, and the timings are rather short for a 20MHz analysis). But now that all is clear, I found one thing that stands out.

First, a note on how I'll decode ATA addresses in this text. I find it easier to use one hex digit for CS0- and CS1-, and another for DA2, DA1, DA0. Here is how I set the decoder in DSLogic to illustrate:

Attachment: addr_decoding.png


Also note that I only capture the low byte of the data bus (DD0 to DD7) and ignore the high byte (DD8 to DD15), as my analyzer does not have enough inputs. This is not really a limitation, as most register transfers are 8 bits wide (the exception being the Data register, which is 16 bits wide).

To get data from the hard drive, the bootloader initiates a PIO read of one or more sectors by setting up the Sector Count (0x12), Sector Number (0x13), Cylinder Low (0x14), Cylinder High (0x15) and Device Drive/Head (0x16) registers and then writing 0x20 "Read sector(s) (w/retry)" to the ATA command register. It then waits for the INTRQ pin to be asserted by the drive while polling the Alternate Status (0x26) register.
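
The setup sequence can be written down as a list of (address, value) pairs, using the two-digit address notation introduced above. This Python sketch is only a model of the register writes, not code from the bootloader. The register names and the example CHS values are chosen for illustration.

```python
# Addresses in the notation used in this post: first digit encodes CS0-/CS1-,
# second digit encodes DA2..DA0
REG = {
    "data": 0x10,
    "error": 0x11,
    "sector_count": 0x12,
    "sector_number": 0x13,
    "cyl_low": 0x14,
    "cyl_high": 0x15,
    "device_head": 0x16,
    "status_command": 0x17,  # Status on read, Command on write
    "alt_status": 0x26,
}

READ_SECTORS = 0x20  # "Read sector(s) (w/retry)"


def read_sectors_writes(cyl, head, sector, count, drive=0):
    """Register writes that start a CHS PIO read of `count` sectors."""
    return [
        (REG["sector_count"], count & 0xFF),  # 256 sectors is encoded as 0
        (REG["sector_number"], sector),
        (REG["cyl_low"], cyl & 0xFF),
        (REG["cyl_high"], cyl >> 8),
        (REG["device_head"], 0xA0 | (drive << 4) | head),
        (REG["status_command"], READ_SECTORS),
    ]


# Example: read one 17-sector track at cylinder 1, head 6
writes = read_sectors_writes(cyl=1, head=6, sector=1, count=17)
for address, value in writes:
    print(f"write 0x{address:02x} <- 0x{value:02x}")
```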

Attachment: seagate_read_start.png


Once the interrupt is asserted by the drive (meaning that a 512-byte sector is ready to be fetched), the bootloader reads the Alternate Status (0x26) register once more, reads the regular Status (0x17) register to clear the interrupt, reads the Error (0x11) register to make sure that everything is fine, then reads the Sector Count (0x12) register to find out how many sectors remain to be fetched (important, more on that below), and finally reads the Alternate Status (0x26) register one last time. It then fetches one full 512-byte sector by reading the 16-bit Data (0x10) register 256 times. Once the entire sector has been read, the bootloader waits for the drive to raise INTRQ again and starts over: read Alternate Status, Status, Error, Sector Count (which is now decremented by one), Alternate Status, then fetch the data. This continues until it fetches the last sector, for which Sector Count reads 0 (more on that below). Once the last sector has been read, it reads the Status (0x17) register one last time.

Attachment: seagate_read_mid.png
Attachment: seagate_read_end.png


When I install a newer drive (I tried both a 128MB Transcend CF300 CF card and a 40GB Toshiba MK4026GAX 2.5" HDD, and both behave the same way), all sectors are correctly retrieved. However, there is a huge 7.5-second delay between read attempts (the following capture is hugely zoomed out, each spike represents a lot of register reads/writes).

Attachment: transcend_7.5s.png


Upon closer inspection, we notice that the last read of the Status (0x17) register (after all data has been read) is missing. It is only performed 7.5 seconds later.

Attachment: transcend_read_end.png


Going deeper, we can observe that the Seagate returns an incorrect Sector Count (0x12). It is decremented by one compared to the newer Transcend and Toshiba drives. For example, when reading one sector, the Toshiba or Transcend drive will return a Sector Count of 0x01 while the Seagate will return 0x00. When fetching 16 sectors, the Toshiba and Transcend report a Sector Count of 0x10 while the Seagate reports 0x0f.

Attachment: transcend_read_mid.png


I suspect that the Seagate drive doesn't follow the ATA specification correctly and reports the Sector Count register off by one. As the bootloader was written with the Seagate behavior in mind, with newer drives it thinks that a sector remains to be read and waits for an interrupt, only to time out after 7.5 seconds. It then slowly loads the OS, which likes it even less and crashes.
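
To make the hypothesis concrete, here is a toy Python model of the read loop as reconstructed from the captures. It is a sketch of the suspected logic, not disassembled bootloader code. The only difference between the two drive types is whether the Sector Count read back excludes the sector about to be transferred.

```python
def bootloader_read(requested, count_excludes_current):
    """Model of the read loop; returns (sectors_read, hit_timeout).

    count_excludes_current=True  -> Seagate-style Sector Count readback
    count_excludes_current=False -> Transcend/Toshiba-style readback
    """
    remaining = requested  # sectors the drive still has to deliver
    sectors_read = 0
    while True:
        if remaining == 0:
            # The drive has nothing left to send, so no INTRQ arrives and the
            # bootloader sits in its wait loop until the timeout expires.
            return sectors_read, True
        # INTRQ fired: the bootloader reads Sector Count, then the data.
        reported = remaining - 1 if count_excludes_current else remaining
        sectors_read += 1
        remaining -= 1
        if reported == 0:
            # The bootloader treats this as the last sector and finishes cleanly.
            return sectors_read, False


for name, excludes in (("Seagate-style", True), ("Transcend/Toshiba-style", False)):
    for count in (1, 16):
        print(name, count, bootloader_read(count, excludes))
```

In this model both drive types deliver every requested sector, but only the Seagate-style readback lets the loop finish without waiting for an interrupt that never comes, which matches the captures.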

Looking into the ATA-1 spec, it says:
Quote:
If this register is zero at command completion, the command was successful.
If not successfully completed, the register contains the number of sectors
which need to be transferred in order to complete the request.


The Transcend datasheet says:
Quote:
Sector Count Register (Address - 1F2h[172h]; Offset 2)
This register contains the numbers of sectors of data requested to be transferred on a read or write
operation between the host and the CompactFlash Storage Card. If the value in this register is zero, a count
of 256 sectors is specified. If the command was successful, this register is zero at command completion. If
not successfully completed, the register contains the number of sectors that need to be transferred in order
to complete the request.


Now my question: is anyone aware of a "quirk" on Seagate devices from 1994 that would explain this? Or could this be explained through the ATA standard in some way? I had no luck with Google. I limited search results to the 90s but nothing mentions this behavior. Maybe an old-timer remembers something like this?


 Post subject: Re: Replacing a 1994 HDD with a CF on an m68k-based machine
PostPosted: August 8th, 2022, 22:18 

Joined: September 8th, 2009, 18:21
Posts: 15461
Location: Australia
If I understand correctly, Seagate has interpreted the sector count register to reflect the "number of sectors remaining to be processed, excluding the current one" whereas the other drives interpret it as "number of sectors remaining to be processed, including the current one". If so, then how do the drives handle a block of 256 sectors? Will your device read just 1 sector from the CF card when asked to read 256?

Perhaps there is an explanation or workaround in the Linux source code?

 Post subject: Re: Replacing a 1994 HDD with a CF on an m68k-based machine
PostPosted: August 9th, 2022, 15:18 

Joined: February 13th, 2022, 15:26
Posts: 5
Location: Switzerland
> If I understand correctly, Seagate has interpreted the sector count register to reflect the "number of sectors remaining to be processed, excluding the current one" ...

You understood correctly, but this only happens when reading Sector Count. Writing to the Sector Count register behaves similarly with all drives.

> If so, then how do the drives handle a block of 256 sectors?

I cannot really answer this question, but it doesn't really matter in the present case. The bootloader reads at most <sector_per_track> sectors at a time. Since the drive has fewer than 256 sectors per track, the firmware will never issue a 256-sector read. It will issue multiple read commands instead.

To illustrate, consider the following outputs from my qemu emulator. It includes an emulated ATA hard drive with a cylinder/head/sector geometry of my choosing. That way, I can dump all register accesses made by the Tektronix bootloader. For the Seagate, each read covers at most 17 (0x11) sectors. For the Transcend, each read covers at most 63 (0x3f) sectors.

Running with -device ide-hd,drive=drive0,cyls=980,heads=15,secs=17 (seagate):
Quote:
...
ide_ioport_write: [0x0002] [Sector Count] 0x0011
ide_ioport_write: [0x0003] [Sector Number] 0x0001
ide_ioport_write: [0x0004] [Cylinder Low] 0x0001
ide_ioport_write: [0x0005] [Cylinder High] 0x0000
ide_ioport_write: [0x0006] [Device/Head] 0x00a6
ide_ioport_write: [0x0007] [Command] 0x0020
...
ide_ioport_write: [0x0002] [Sector Count] 0x0011
ide_ioport_write: [0x0003] [Sector Number] 0x0001
ide_ioport_write: [0x0004] [Cylinder Low] 0x0001
ide_ioport_write: [0x0005] [Cylinder High] 0x0000
ide_ioport_write: [0x0006] [Device/Head] 0x00a7
ide_ioport_write: [0x0007] [Command] 0x0020
...
ide_ioport_write: [0x0002] [Sector Count] 0x0011
ide_ioport_write: [0x0003] [Sector Number] 0x0001
ide_ioport_write: [0x0004] [Cylinder Low] 0x0001
ide_ioport_write: [0x0005] [Cylinder High] 0x0000
ide_ioport_write: [0x0006] [Device/Head] 0x00a8
ide_ioport_write: [0x0007] [Command] 0x0020
...


Running with -device ide-hd,drive=drive0,cyls=246,heads=16,secs=63 (transcend):
Quote:
...
ide_ioport_write: [0x0002] [Sector Count] 0x003f
ide_ioport_write: [0x0003] [Sector Number] 0x0001
ide_ioport_write: [0x0004] [Cylinder Low] 0x0000
ide_ioport_write: [0x0005] [Cylinder High] 0x0000
ide_ioport_write: [0x0006] [Device/Head] 0x00a3
ide_ioport_write: [0x0007] [Command] 0x0020
...
ide_ioport_write: [0x0002] [Sector Count] 0x003f
ide_ioport_write: [0x0003] [Sector Number] 0x0001
ide_ioport_write: [0x0004] [Cylinder Low] 0x0000
ide_ioport_write: [0x0005] [Cylinder High] 0x0000
ide_ioport_write: [0x0006] [Device/Head] 0x00a4
ide_ioport_write: [0x0007] [Command] 0x0020
...
ide_ioport_write: [0x0002] [Sector Count] 0x003f
ide_ioport_write: [0x0003] [Sector Number] 0x0001
ide_ioport_write: [0x0004] [Cylinder Low] 0x0000
ide_ioport_write: [0x0005] [Cylinder High] 0x0000
ide_ioport_write: [0x0006] [Device/Head] 0x00a5
ide_ioport_write: [0x0007] [Command] 0x0020
...
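
The register values in both logs are what you get from splitting a run of sectors into one READ SECTORS command per track. Here is a Python sketch of that split. The starting LBAs (357 and 189) are simply back-computed from the logged cylinder/head values, and the function names are illustrative.

```python
def lba_to_chs(lba, heads, spt):
    """Convert a logical sector number to (cylinder, head, sector); sectors start at 1."""
    cyl, rest = divmod(lba, heads * spt)
    head, sector = divmod(rest, spt)
    return cyl, head, sector + 1


def track_reads(first_lba, tracks, heads, spt):
    """One READ SECTORS register set per track; assumes first_lba is track-aligned."""
    commands = []
    for track in range(tracks):
        cyl, head, sector = lba_to_chs(first_lba + track * spt, heads, spt)
        commands.append({
            "sector_count": spt,
            "sector_number": sector,
            "cyl_low": cyl & 0xFF,
            "cyl_high": cyl >> 8,
            "device_head": 0xA0 | head,
            "command": 0x20,
        })
    return commands


seagate = track_reads(first_lba=357, tracks=3, heads=15, spt=17)
transcend = track_reads(first_lba=189, tracks=3, heads=16, spt=63)

for name, commands in (("seagate", seagate), ("transcend", transcend)):
    for cmd in commands:
        print(name, {reg: hex(value) for reg, value in cmd.items()})
```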


> Perhaps there is an explanation or workaround in the Linux source code?

Possibly. The Seagate device reads fine under Linux. Here is what dmesg spits out (max PIO0, CHS addressing):
Quote:
kernel: ata1.00: ATA-0: ST9140AG, 11.11.01, max PIO0
kernel: ata1.00: 249900 sectors, multi 0, CHS 980/15/17
kernel: scsi 0:0:0:0: Direct-Access ATA ST9140AG 1.01 PQ: 0 ANSI: 5
kernel: sd 0:0:0:0: [sda] 249900 512-byte logical blocks: (128 MB/122 MiB)
kernel: sd 0:0:0:0: [sda] Write Protect is off
kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
kernel: sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
kernel: ata5: SATA link down (SStatus 0 SControl 300)
kernel: ata6: SATA link down (SStatus 0 SControl 300)
kernel: random: fast init done
kernel: sd 0:0:0:0: [sda] Attached SCSI disk
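
The sector count and sizes in that log follow directly from the CHS geometry. A minimal check in Python, assuming 512-byte sectors as reported by the kernel (the function name is just for illustration):

```python
SECTOR_SIZE = 512  # bytes, as reported in the dmesg output


def chs_capacity(cyls, heads, sects):
    """Return (total sectors, total bytes) for a CHS geometry."""
    sectors = cyls * heads * sects
    return sectors, sectors * SECTOR_SIZE


for name, geometry in {"seagate": (980, 15, 17),
                       "transcend": (246, 16, 63)}.items():
    sectors, size = chs_capacity(*geometry)
    print(f"{name}: {sectors} sectors, {size} bytes, "
          f"{size / 1e6:.1f} MB, {size / 2**20:.1f} MiB")
```

For 980/15/17 this gives 249900 sectors, matching the kernel line above.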


I spent a bit of time looking at the Linux code (mainly https://elixir.bootlin.com/linux/latest ... sff.c#L980 ) and I don't think it ever reads the Sector Count register back while reading in PIO0. If you think about it, that would be kinda pointless: the host already knows how many sectors it requested. It only checks the Status register (BSY, DRDY, DRQ, ERR) to make sure that everything is running as expected. Interrupts don't even seem to be used with PIO:

Quote:
ATA PIO
ATA_PROT_PIO is in this category. libata currently implements PIO
with polling. ATA_NIEN bit is set to turn off interrupt and
pio_task on ata_wq performs polling and IO.

https://docs.kernel.org/driver-api/liba ... -processed
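
To make the idea concrete, here is a toy Python model of a polled PIO read. This is not libata code, only a sketch of the general pattern: `ToyDrive` is a made-up stand-in for a drive, and the only real-world facts used are the Status bit positions (BSY=0x80, DRDY=0x40, DRQ=0x08, ERR=0x01) and the 256 words per 512-byte sector. The point is that the host loop is driven by its own sector counter and by Status, and never reads Sector Count back.

```python
BSY, DRDY, DRQ, ERR = 0x80, 0x40, 0x08, 0x01
WORDS_PER_SECTOR = 256  # 512 bytes, transferred as 16-bit words


class ToyDrive:
    """Hypothetical drive model that serves `count` zero-filled sectors."""

    def __init__(self, count):
        self.sectors_left = count
        self.words_left = WORDS_PER_SECTOR if count else 0

    def status(self):
        # DRQ is raised while there is data waiting in the sector buffer.
        return DRDY | (DRQ if self.words_left else 0)

    def read_data(self):
        self.words_left -= 1
        if self.words_left == 0:
            self.sectors_left -= 1
            if self.sectors_left:
                self.words_left = WORDS_PER_SECTOR
        return 0x0000


def pio_read(drive, count):
    """Read `count` sectors by polling Status only."""
    data = []
    for _ in range(count):
        while drive.status() & BSY:  # wait for the drive to be ready
            pass
        st = drive.status()
        if st & ERR or not st & DRQ:
            raise IOError(f"unexpected status 0x{st:02x}")
        data.append([drive.read_data() for _ in range(WORDS_PER_SECTOR)])
    return data


sectors = pio_read(ToyDrive(17), 17)
print(len(sectors), "sectors read")
```

If the drive stops asserting DRQ before the host has received everything it asked for, the loop fails on the Status check, which is the kind of mismatch the host can detect without ever looking at Sector Count.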

The documentation does not talk about Sector Count in its exception chapter (below, HSM means Host State Machine):
Quote:
HSM violation
~~~~~~~~~~~~~

This error is indicated when STATUS value doesn't match HSM requirement
during issuing or execution any ATA/ATAPI command.

- ATA_STATUS doesn't contain !BSY && DRDY && !DRQ while trying to
issue a command.

- !BSY && !DRQ during PIO data transfer.

- DRQ on command completion.

- !BSY && ERR after CDB transfer starts but before the last byte of CDB
is transferred. ATA/ATAPI standard states that "The device shall not
terminate the PACKET command with an error before the last byte of
the command packet has been written" in the error outputs description
of PACKET command and the state diagram doesn't include such
transitions.

In these cases, HSM is violated and not much information regarding the
error can be acquired from STATUS or ERROR register. IOW, this error can
be anything - driver bug, faulty device, controller and/or cable.

As HSM is violated, reset is necessary to restore known state.
Reconfiguring transport for lower speed might be helpful too as
transmission errors sometimes cause this kind of errors.

https://docs.kernel.org/driver-api/liba ... -violation
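
The first three conditions from that quote only involve Status bits, so they can be written down as simple predicates. A sketch in Python (the function names are mine, not libata's):

```python
BSY, DRDY, DRQ, ERR = 0x80, 0x40, 0x08, 0x01


def ready_for_command(status):
    """!BSY && DRDY && !DRQ: the state required before issuing a command."""
    return (status & (BSY | DRDY | DRQ)) == DRDY


def stalled_during_pio(status):
    """!BSY && !DRQ while the host still expects data: HSM violation."""
    return not status & (BSY | DRQ)


def drq_on_completion(status):
    """DRQ still set when the command should be complete: HSM violation."""
    return not status & BSY and bool(status & DRQ)


for st in (0x50, 0x58, 0xD0):
    print(f"0x{st:02x}: ready={ready_for_command(st)} "
          f"stalled={stalled_during_pio(st)} "
          f"drq_left={drq_on_completion(st)}")
```

None of these checks needs the Sector Count register, which is consistent with the documentation not mentioning it.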

Still, I stumbled upon this comment which I think shows that I am not crazy (see the WARNING):
Quote:
/* For 10-byte and 16-byte SCSI R/W commands, transfer
* length 0 means transfer 0 block of data.
* However, for ATA R/W commands, sector count 0 means
* 256 or 65536 sectors, not 0 sectors as in SCSI.
*
* WARNING: one or two older ATA drives treat 0 as 0...
*/

https://elixir.bootlin.com/linux/latest ... si.c#L1600
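
Spelled out in Python, the rule from that comment looks like this. The `zero_means_zero` flag is a hypothetical knob I added to model the "older drives" mentioned in the WARNING; it is not something that exists in the kernel.

```python
def sectors_requested(sector_count_reg, lba48=False, zero_means_zero=False):
    """Number of sectors an ATA R/W command asks for, given Sector Count.

    A register value of 0 normally means the maximum: 256 sectors for
    28-bit/CHS commands, 65536 for 48-bit ones.
    """
    limit = 65536 if lba48 else 256
    if not 0 <= sector_count_reg < limit:
        raise ValueError("value does not fit in the Sector Count register")
    if sector_count_reg == 0:
        return 0 if zero_means_zero else limit
    return sector_count_reg


print(sectors_requested(0x11))                     # Seagate trace: 17
print(sectors_requested(0x3F))                     # Transcend trace: 63
print(sectors_requested(0))                        # spec behaviour: 256
print(sectors_requested(0, zero_means_zero=True))  # quirky drive: 0
```

So drives disagreeing on the meaning of a Sector Count value is a known thing, at least for the value 0.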

Right now I see 3 solutions:
- Reverse engineer and patch the OS (and potentially the bootloader) to correct that off-by-one. The OS binary can be patched directly on the HDD, so no soldering work is required (assuming it doesn't jump to code in ROM to handle INTRQ, and I don't believe it does).
- Use an FPGA to intercept the ATA bus and rewrite the Sector Count values returned by the HDD to correct that off-by-one. I don't want to go down that rabbit hole.
- Give up as I'm not even sure this will fix the issue (but I have faith).

