The Floppy-Disk Archiving Machine, Mark III
"I'm not building a Mark III."
Famous last words.
I made the mistake of asking my parents if they had any 3.5" floppy disks at their place.
They did.
And a couple hundred of them were even mine.
Faced with the prospect of processing another 500-odd disks, I realized the Mark III was worth doing. So I made a few enhancements for the Floppy Machine Mark III:
- Changed the gearing of the track motor assembly to increase torque and added plates to keep its structure from spreading apart. The latter had been causing the push rod mechanism to bind up and block the motor, even at 100% power.
- Removed the 1x4 technic bricks from the end of the tractor tread, lengthened the tread by several links, and added to the top of the structure under those links. This reduced how often something got caught on the structure and caused a problem.
- Extended the lower half of the drive's shell by replacing the 1x6 technic bricks with 1x10 technic bricks and adding a 1x4 plate on the underside, flush with the end. This made the machine more resilient to the drive getting dropped too quickly.
- Added 1x2 bricks to lock the axles into place for the drive shell's pivot point, since they seemed to be working their way out very slowly.
- Added 1x16 technic bricks to the bottom of all the legs, and panels to accommodate that, increasing the machine's height by 5" and making it easier to pull disks out of the GOOD and BAD bins.
- Added doors at the bottom of the trays in the front to keep disks from bouncing out.
- Added a back wall at the bottom of the trays in the back to keep disks from bouncing out.
- Moved the ultrasonic sensor lower in an attempt to reduce false empty-magazine detections. This issue was sporadic enough that the effectiveness of the change is hard to determine. I only had one false empty-magazine event after this change.
- Added a touch sensor to detect when the push rod has been fully retracted, in order to protect the motor. Before this, the machine found the position of the push rod by driving it to the extreme right until the motor blocked. This seems to have had a negative effect on the motor in question: turning the rotor of that poor, abused motor in one direction has a very rough feel. The new touch sensor also used the last sensor port on the NXT. (One ultrasonic sensor and three touch sensors.)
- Replaced the cable to the push rod motor with a longer one from HiTechnic.
- Significantly modified the controlling software to calibrate locations of the motors in ways that did not require driving a motor to a blocked state.
- Enhanced the controlling software to allow choosing what events warranted marking a disk as bad and which didn't.
- Enhanced the data recovery software to allow bailing on the first error detected. This helps when you want to do an initial pass through the disks to get all the good disks archived first. Then you can run the disks through a second time, spending more time recovering the data off the disks.
- Enhanced the controlling software to detect common physical complications and take action to correct them, such as making additional attempts to eject a disk.
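The touch-sensor calibration in the list above can be sketched roughly as follows. This is only an illustration of the idea, not the machine's actual control code: `SimulatedPushRod`, `touch_pressed`, `step_retract`, and `max_steps` are hypothetical names standing in for the real motor and sensor calls, and the rod is simulated so the example runs without an NXT.

```python
class SimulatedPushRod:
    """Stand-in for the real motor and touch sensor (hypothetical interface)."""

    def __init__(self, position, home=0):
        self.position = position  # arbitrary units; the sensor trips at `home`
        self.home = home

    def touch_pressed(self):
        return self.position <= self.home

    def step_retract(self):
        self.position -= 1


def retract_until_touch(rod, max_steps=500):
    """Retract in small steps until the touch sensor trips.

    Returns the number of steps taken. If the sensor never trips within
    max_steps, raise instead of continuing to drive a possibly blocked motor.
    """
    steps = 0
    while not rod.touch_pressed():
        if steps >= max_steps:
            raise RuntimeError("touch sensor never tripped; stopping to protect the motor")
        rod.step_retract()
        steps += 1
    return steps


rod = SimulatedPushRod(position=7)
print(retract_until_touch(rod))  # number of steps needed to reach the sensor
```

The step limit is the part that matters for motor health: the loop ends either on the sensor or on a bounded number of steps, never on a stall.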
With those changes, the Mark III wound up much more rainbow-warrior than the Mark II:
And naturally, I updated the model with the changes:
The general theme for the Mark II was to rebuild the machine with a cleaner construction, reasonable colors, and reduced part count. The general theme for the Mark III was to improve the reliability of the machine so it could process more disks with less baby-sitting.
All told, I had 1196 floppy disks. If you stack them carefully, they'll fit in a pair of bankers boxes.
And with that, I'm done. No Mark IV. For real, this time. I hope.
Previously: the Mark II
The Floppy-Disk Archiving Machine, Mark II
Four and a half years ago, I built a machine to archive 3.5" floppy disks. By the time I finished archiving the 443 floppies, I realized that it fell short of what I wanted. There were a couple of problems:
- many 3.5" floppy disk labels wrap around to the back of the disk
- disks were dumped into a single bin
- the machine was sensitive to any shifts to the platform, which consisted of two cardboard boxes
- the structure of the frame was cobbled together and did not use parts efficiently
- lighting was ad-hoc and significantly affected by the room's ambient light
- the index of the disks was cumbersome
I recently had an opportunity to dust off the old machine (quite literally), and do a complete rebuild of it. That allowed me to address the above issues. Thus, I present:
The Floppy-Disk Archiving Machine, Mark II
The Mark II addresses the shortcomings of the first machine.
Under the photography stage, an angled mirror gives the camera (an Android Dev Phone 1) a view of the label on the back of the disk. That image needs perspective correction, and has to be mirrored and cropped to extract a useful image of the rear label. OpenCV serves this purpose well enough, and is straightforward to use with the Python bindings.
The addition of lights and tracing-paper diffusers improved the quality of the photos and reduced the glare. It also made the machine usable whether the room lights were on or off.
The baffle under the disk drive allows the machine to divert the ejected disks into either of two bins. I labeled those bins "BAD" and "GOOD". I wrote the control software (also Python) to accept a number of options to allow sorting the disks by different criteria. For instance, sometimes OpenCV's object matching selects a portion of a disk or its label instead of the photography stage's arrows. When that happens, the extraction of the label will fail. That can happen for either the front or back disk labels. The machine can treat such a disk as 'BAD'. When a disk is processed and bad bytes are found, the machine can treat the disk as bad. The data extraction tool supports different levels of effort for extracting data from around bad bytes on a disk.
This allows for a multiple-pass approach to processing a large number of disks.
In the first pass, if there is a problem with either picture, or if there are bad bytes detected, sort the disk as bad. That first pass can configure the data extraction to not try very hard to get the data, and thus not spend much time per disk. At the end of the first pass, all the 'GOOD' disks have been successfully read with no bad bytes, and labels successfully extracted. The 'BAD' disks, however, may have failed for a mix of different reasons.
The second pass can then expend more effort extracting data from disks with read errors. Disks which encounter problems with the label pictures would still be sorted as 'BAD', but disks with bad bytes would be sorted as 'GOOD' since we've extracted all the data we can from them, and we have good pictures of them.
That leaves us with disks that have failed label extraction at least once, and probably twice. At this point, it makes sense to run the disks through the machine and treat them as 'GOOD' unconditionally. Then the label extraction tool can be manually tweaked to extract the labels from this small stack of disks.
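The three passes boil down to a small policy table. The sketch below shows one way to express it; `PassPolicy`, its field names, and `choose_bin` are hypothetical names made up for this example and are not the control software's real options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassPolicy:
    """Which problems send a disk to the BAD bin (hypothetical option names)."""
    bad_on_label_failure: bool
    bad_on_read_errors: bool


# Pass 1: anything wrong at all goes to BAD; data extraction is set to low effort.
FIRST_PASS = PassPolicy(bad_on_label_failure=True, bad_on_read_errors=True)
# Pass 2: high-effort extraction; only label problems still count as BAD.
SECOND_PASS = PassPolicy(bad_on_label_failure=True, bad_on_read_errors=False)
# Pass 3: everything goes to GOOD; labels get fixed up by hand afterwards.
FINAL_PASS = PassPolicy(bad_on_label_failure=False, bad_on_read_errors=False)


def choose_bin(policy, label_failed, read_errors):
    """Return 'BAD' or 'GOOD' for one processed disk under the given policy."""
    if label_failed and policy.bad_on_label_failure:
        return "BAD"
    if read_errors and policy.bad_on_read_errors:
        return "BAD"
    return "GOOD"


# A disk with read errors but good label photos, across the three passes:
for policy in (FIRST_PASS, SECOND_PASS, FINAL_PASS):
    print(choose_bin(policy, label_failed=False, read_errors=True))
```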
Once the disks have been successfully photographed and all available data extracted, an HTML-based index can be created. That process creates one page containing thumbnails of the front of the disks.
Each thumbnail links to a page for a disk giving ready access to:
- a full-resolution picture of the extracted front label
- a full-resolution picture of the extracted back label
- a zip file containing the files from the disk
- a browsable file tree of the files from the disk
- an image of the data on the disk
- a log of the data extracted from the disk
- the un-processed picture of the front of the disk
- the un-processed picture of the back of the disk
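Generating the thumbnail page itself takes very little code. The following is a rough sketch using only the standard library; the dictionary keys (`name`, `thumb`, `page`) and the file layout are assumptions for the example, not the real index generator's format.

```python
from html import escape


def render_index(disks):
    """Build the thumbnail index page as a string.

    disks: list of dicts with hypothetical keys 'name', 'thumb', and 'page'.
    """
    items = []
    for disk in disks:
        items.append(
            '<a href="{page}"><img src="{thumb}" alt="{name}"></a>'.format(
                page=escape(disk["page"], quote=True),
                thumb=escape(disk["thumb"], quote=True),
                name=escape(disk["name"], quote=True),
            )
        )
    return "<!DOCTYPE html>\n<html><body>\n" + "\n".join(items) + "\n</body></html>\n"


page = render_index([
    {"name": 'Disk "A" & B', "thumb": "0001/thumb.jpg", "page": "0001/index.html"},
    {"name": "Backups", "thumb": "0002/thumb.jpg", "page": "0002/index.html"},
])
print(page)
```

Escaping the values matters here because the alt text would most naturally come from whatever is written on the labels or stored on the disks.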
The data image of the disk can be mounted for access to the original filesystem, or forensic analysis tools can be used on it to extract deleted files or do deeper analysis of data affected by read errors. The log of the data extracted includes information describing which bytes were read successfully, which had errors, and which were not directly attempted. That last case may occur due to time limits placed on the data extraction process.
Since a single bad byte may take ~4 seconds to return from the read operation, and there may be 1,474,560 bytes on a disk, if every byte were bad you could spend nearly 10 weeks on a single disk and recover nothing. The data recovery software (also written in Python) therefore prioritizes the sections of the disk that are most likely to contain the most good data. This means that in practice everything that can be read off the disk will be read off in less than 20 minutes. For a thorough run, I will generally configure the data extraction software to give up if it has not successfully read any data in the past 30 minutes (it's only machine time, after all). At that point, the odds of any more bytes being readable are quite low.
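To make the numbers concrete, and to illustrate one possible way of prioritizing, here is a small sketch. The worst-case arithmetic uses the figures from the text. The `recover` function is an assumed strategy for illustration, not the actual recovery software: it tries the largest unread range first and splits any range that fails, and it gives up after a run of consecutive failures rather than after 30 minutes of wall-clock time so that the example stays deterministic. `read_range` is a hypothetical callable standing in for the real drive read.

```python
import heapq

DISK_BYTES = 1474560          # from the text
SECONDS_PER_BAD_BYTE = 4      # approximate, from the text

# Roughly 9.75 weeks if every single byte is bad.
worst_case_weeks = DISK_BYTES * SECONDS_PER_BAD_BYTE / (7 * 24 * 3600)


def recover(read_range, size, max_failures_without_progress=1000):
    """Read [0, size) largest-range-first, splitting ranges that fail.

    read_range(start, end) returns True if the whole range read cleanly.
    Returns (good, bad, untried) as sorted lists of (start, end) ranges.
    """
    heap = [(-size, 0, size)]  # max-heap on range length via negated length
    good, bad = [], []
    failures = 0
    while heap and failures < max_failures_without_progress:
        _, start, end = heapq.heappop(heap)
        if read_range(start, end):
            good.append((start, end))
            failures = 0
        elif end - start == 1:
            bad.append((start, end))  # a single unreadable byte
            failures += 1
        else:
            failures += 1
            mid = (start + end) // 2
            heapq.heappush(heap, (-(mid - start), start, mid))
            heapq.heappush(heap, (-(end - mid), mid, end))
    untried = sorted((s, e) for _, s, e in heap)
    return sorted(good), sorted(bad), untried


# Simulated 16-byte "disk" with one bad byte at offset 5.
bad_bytes = {5}
good, bad, untried = recover(
    lambda s, e: not any(s <= b < e for b in bad_bytes), size=16
)
print(good, bad, untried)
```

The three returned lists line up with the three categories in the extraction log: bytes read successfully, bytes with errors, and bytes never directly attempted.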
So what does the machine look like in action?
(Also posted to YouTube.)
Part of the reason I didn't disassemble the machine while it collected dust for 4.5 years was that I knew I would not be able to reproduce it should I have need of it again in the future. Doing a full rebuild of the machine allowed me to simplify the build dramatically. That made it feasible to create an LDraw model of it using LeoCAD.
Rebuilding the frame with an eye to modeling it in the computer yielded a significantly simpler support mechanism, and one that proved to be more rigid as well. To address the variations of different platforms and tables, I screwed a pair of 1x2 boards together with some 5" sections of 1x4 using a pocket hole jig. The nice thing about the 5" gap between the 1x2 boards is that the Lego bricks are 5/16" wide, so 16 studs fit neatly within that gap. The vertical legs actually extend slightly below the top of the 1x2's, and the bottom horizontal frame rests on top of the boards. This keeps the machine from sliding around on the wooden frame, and makes for a consistent, sturdy platform which improves the machine's reliability.
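The stud arithmetic behind that 5" gap is easy to check. The snippet below uses the 5/16" figure from the text, and, as an added assumption not taken from the post, compares it with the nominal 8 mm stud pitch usually quoted for Lego bricks.

```python
from fractions import Fraction

STUD_WIDTH_INCHES = Fraction(5, 16)  # per-stud width quoted in the text
GAP_INCHES = 5

studs_in_gap = GAP_INCHES / STUD_WIDTH_INCHES  # exact rational arithmetic

# Assumption: nominal stud pitch of 8 mm, converted at 25.4 mm per inch.
NOMINAL_PITCH_MM = 8
sixteen_studs_inches = 16 * NOMINAL_PITCH_MM / 25.4

print(studs_in_gap)                    # 16
print(round(sixteen_studs_inches, 2))  # 5.04
```

Under that assumption, 16 studs come out a few hundredths of an inch over 5", which is close enough for a wooden frame.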
The increase in stability and decrease in parts required also allowed me to increase the height of the machine itself to accommodate the inclusion of the disk baffle and egress bins.
What about a Mark III?
Uhm, no.
I have processed all 590 disks in my possession (where did the additional 150 come from?), and will be having these disks shredded. That said, the Mark II is not a flawless machine. Were I to build a third machine, increasing the height a bit further to make the disk bins more easily accessible would be a worthwhile improvement. Likewise, the disk magazine feeding the machine is a little awkward to load with the cables crossing over it, and could use some improvement so that the weight of a tall stack of disks does not impede the proper function of the push rod.
So, no, I'm not building a Mark III. Unless you or someone you know happens to have a thousand 3.5" floppy disks you need archived, and is willing to pay me well to do it. But who still has important 3.5" floppy disks lying around these days? I sure don't. (Well, not anymore, anyway.)
Previously: the Mark I
Update: the Mark III