To celebrate the 100th anniversary of magnetic recording, IBM announced in 1998 the world’s highest-capacity hard drive for desktop PCs. The drive came with a breakthrough technology called giant magnetoresistive (GMR) heads, which enabled the further miniaturization of disk drives. The 2007 Nobel Prize in Physics was awarded to Albert Fert and Peter Grünberg for their discovery of the GMR effect in 1988.
It is a little-known fact that the GMR effect can be successfully used to extract digital data from a hard drive as part of a cybersecurity forensic analysis. The data written on a modern high-density hard disk drive can also be recovered via magnetic force microscopy of the disks’ surfaces. To this end, a variety of image processing techniques are used to turn the raw images into a readily usable form, and a simulated read channel is then designed to produce an estimate of the raw data corresponding to the magnetization pattern written on the disk.
Hard disk drives (HDDs) have had a remarkable history of growth and development, from the IBM 350 disk storage unit in 1956, which had a capacity of 3.75 MB and weighed over a ton, to the latest 4 TB, 3.5-inch form factor drive as of 2011. Clearly, the technology underlying hard drives has changed dramatically in this time frame, and is expected to continue on this path. Despite all the change, the basic concept of storing data as a magnetization pattern on a physical medium, to be retrieved later by a device that responds as it flies over the pattern, is still the principal idea used today.
The main components of a modern hard drive are the platters, head stack, and actuator, which are responsible for physically implementing the storage and retrieval of data. Data are stored as a magnetization pattern on a given side of a platter, and a drive typically contains several platters. The data are written onto and read from the platter via a head, and each platter requires two heads, one for each side. At this basic level, the disk drive appears to be a fairly simple device, but the details of what magnetization pattern to use to represent the data, and how to accurately, reliably, and quickly read and write from the drive have been the fruits of countless engineers’ labor over the past couple of decades.
Early hard drives used a method called longitudinal recording, where a sequence of bits is represented by magnetizing a set of grains in one direction or the other, parallel to the recording surface. By 2005, to sustain the continuing push for higher recording density, a method called perpendicular recording began to be used in commercially available drives. As the name suggests, the data are represented by sets of grains magnetized perpendicular to the recording surface. To allow this form of recording, the recording surface itself has to be designed with a soft underlayer that permits a monopole writing element to magnetize the grains in the top layer in the desired manner. As the push for density continues, newer technologies are being considered, such as shingled magnetic recording (SMR) and bit-patterned magnetic recording (BPMR), which take different approaches to representing data using higher-density magnetization patterns.
Once data are written to the drive, the retrieval of the data requires sensing the magnetic pattern written on the drive, and the read head is responsible for the preliminary task of transducing the magnetization pattern into an electrical signal which can be further processed to recover the written data. Along with the changes in recording methods, the read heads necessarily underwent technological changes as well, from traditional ferrite wire-coil heads to the newer magneto-resistive (MR), giant magneto-resistive (GMR), and tunneling magneto-resistive (TMR) heads. All of these, when paired with additional circuitry, produce a voltage signal in response to flying over the magnetization pattern written on the disk, called the readback or playback signal. It is this signal that contains the user data, but in a highly encoded, distorted, and noisy form, and from which the rest of the system must ultimately estimate the recorded data, hopefully with an extremely low chance of making an error.
At the system level abstraction, the hard disk drive, or any storage device for that matter, can be interpreted as a digital communication system, and in particular one that communicates messages from one point in time to another (unfortunately, only forwards), rather than from one point in space to another. Specifically, the preprocessor of user data which includes several layers of encoders and the write head which transduces this encoded data into a magnetization pattern on the platter compose the transmitter. The response of the read head to the magnetization pattern and the thermal noise resulting from the electronics are modeled as the channel. Finally, the receiver is composed of the blocks necessary to first detect the encoded data, and then to decode this data to return the user data. This level of abstraction makes readily available the theories of communication systems, information, and coding for the design of disk drive systems, and has played a key role in allowing the ever increasing densities while still ensuring the user data is preserved as accurately as possible.
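The transmitter, channel, and receiver view can be made concrete with a toy simulation. The sketch below is only an illustration under strong assumptions, not a model of any real drive: the write path maps bits to two magnetization levels, the read head is idealized as responding only to transitions between neighboring bits, Gaussian noise stands in for the electronics, and the receiver is a simple threshold detector. All function names and the noise level are hypothetical.

```python
import random

def write_channel(bits):
    """Transmitter: map user bits {0, 1} to magnetization levels {-1, +1}."""
    return [1 if b else -1 for b in bits]

def read_channel(magnetization, noise_std=0.1, seed=0):
    """Channel: an idealized head that responds to transitions, plus noise."""
    rng = random.Random(seed)
    previous = -1  # assumption: the medium starts in the -1 state
    samples = []
    for level in magnetization:
        ideal = level - previous  # 0 for no transition, +2 or -2 otherwise
        samples.append(ideal + rng.gauss(0.0, noise_std))
        previous = level
    return samples

def threshold_receiver(samples):
    """Receiver: detect transitions by thresholding, then rebuild the bits."""
    state = -1  # same starting assumption as the channel
    bits = []
    for sample in samples:
        if sample > 1.0:
            state = 1
        elif sample < -1.0:
            state = -1
        bits.append(1 if state == 1 else 0)
    return bits

user_bits = [0, 1, 1, 0, 1, 0, 0, 1]
readback = read_channel(write_channel(user_bits))
recovered = threshold_receiver(readback)
```

A real read channel replaces the threshold detector with equalization, sequence detection, and several layers of decoding, but the division of roles is the same.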
Hard disk drives are nowadays commonplace devices used to provide bulk storage of the ever increasing amounts of data being produced every moment. Magnetic force microscopy is one form of the class of modern microscopy called scanning probe microscopy, which images the minute details of magnetic field intensities on the surface of a sample.
To use this forensic process to obtain the raw data written to a disk, the relationship between the images acquired through magnetic force microscopy and the signals in the read channel that hard disk drives themselves use to read the data must first be determined. Once this has been done, a system can be developed that takes advantage of this relationship and applies the design principles of read channel engineering.
Recovering data from hard disk drives that cannot be read in the normal fashion is a significant industry in which many companies are involved. Generally, these services fall into two camps: those for personal data recovery and those for forensic data recovery, where the goal is to recover legal evidence which might have been intentionally deleted by the perpetrator. Personal recovery services exist because hard drives, like any other complex system, have some probability of failing for one reason or another, and when there are millions of drives being used in a variety of conditions, some inevitably fail. The modes of failure (or intentional destruction) are varied, and can allow recovery with as simple a procedure as swapping the drive’s printed circuit board (PCB) with that of another drive of a similar model, or render any form of recovery completely intractable.
The most common and successful methods of data recovery from a failed drive replace selected hardware in the failed drive with the same part from a drive of the same model. Note that all of these methods assume that the data recorded on the magnetic surfaces of the disks are completely intact. Examples include replacing the PCB, re-flashing the firmware, replacing the headstack, and moving the disks to another drive. The latter two must be performed in a clean-room environment, because the disks must be kept free of even microscopic particles: the flying heights of the heads are usually on the order of nanometers. As the bit density of hard disk drives is continually increasing, each drive is “hyper-tuned” at the factory, where a myriad of parameters are optimized for the particular head and media characteristics of each individual drive. This decreases the effectiveness of part replacement techniques when a particular drive fails, as these optimized parameters might vary significantly even among drives of the same model and batch.
A more general approach to hard drive data recovery involves using a spin-stand. On this device, the individual platters of the drive are mounted, a giant magnetoresistive (GMR) head is flown above the surface, and the response signal is captured. This allows rapid imaging of large portions of the disk’s surface, and the resulting images must then be processed to recover the data written on the disk. Specifically, the readback signal produced by the GMR head as the disk is spun and the head is moved across the disk is composed into a contiguous rectangular image covering the entire disk surface. This is then processed to remove intersymbol interference (ISI), and from this the data corresponding to the actual magnetization pattern are detected and further processed to ultimately yield user data. Some of the main challenges of this approach are encountered first in the data acquisition stage, where imperfect centering of the disk on the spin-stand yields a sinusoidal distortion of the tracks when imaged. This can be combated using proper centering, or track following, where the head position is continuously adjusted to permit the accurate imaging of the disk. After detection, the precoding in the detected data is inverted, the ECC and RLL codes are decoded, and descrambling is then performed to give the decoded user data. Finally, user files are reconstructed from the user data in different sectors, based on knowledge of the file system used on the drive. This process has been demonstrated to be effective in recovering with high accuracy a user JPEG image that was written to a 3 GB commercial hard drive from 1997.
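The sinusoidal track distortion caused by an off-center disk can be illustrated with a small fitting example. This is a minimal sketch under stated assumptions rather than the procedure used in the demonstrated system: it assumes the radial position of one track has been measured at evenly spaced angles over exactly one revolution, and it estimates the once-per-revolution component from the first Fourier harmonic. The function names, units, and synthetic numbers are hypothetical.

```python
import math

def fit_eccentricity(radii):
    """Fit r(theta) ~ r0 + a*cos(theta) + b*sin(theta).

    Assumes the samples are evenly spaced in angle over one full revolution,
    so the mean and the first harmonic can be estimated independently.
    """
    n = len(radii)
    angles = [2.0 * math.pi * k / n for k in range(n)]
    r0 = sum(radii) / n
    a = 2.0 / n * sum(r * math.cos(t) for r, t in zip(radii, angles))
    b = 2.0 / n * sum(r * math.sin(t) for r, t in zip(radii, angles))
    return r0, a, b

def remove_eccentricity(radii):
    """Subtract the fitted once-per-revolution wobble from the track position."""
    n = len(radii)
    _, a, b = fit_eccentricity(radii)
    corrected = []
    for k, r in enumerate(radii):
        theta = 2.0 * math.pi * k / n
        corrected.append(r - a * math.cos(theta) - b * math.sin(theta))
    return corrected

# Synthetic track: nominal radius 1000 units with a 5-unit wobble.
n_samples = 360
measured = [1000.0 + 5.0 * math.cos(2.0 * math.pi * k / n_samples - 0.3)
            for k in range(n_samples)]
r0, a, b = fit_eccentricity(measured)
wobble_amplitude = math.hypot(a, b)
straightened = remove_eccentricity(measured)
```

Track following on a real spin-stand corrects the head position during acquisition; a fit like this one only shows how the same distortion could be modeled after the fact.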
Compared to magnetic force microscopy (MFM), the spin-stand approach is clearly better for recovering significant amounts of data, as it allows rapid imaging of the entire surface of a disk. However, the data can only be recovered from a disk in spinnable condition. For example, if the disk is bent (even very slightly) or if only a fragment of the disk is available, the spin-stand method cannot be used. Using MFM to image the surface of the disk would still be possible, even in these extreme situations. Once MFM images are acquired, they must still be processed in a manner similar to that described above, but the nature of MFM imaging presents some different challenges.
At the other end of the spectrum is data sanitization, where the goal is to prevent the recovery of confidential information stored on a hard drive by any means. This is of primary importance to government agencies, but also to private companies that are responsible for keeping their customers’ confidential information secure. It should be of significant concern to personal users as well, since when a user decides to replace a hard drive, not properly sanitizing it could result in personal information, for example medical or financial records, being stolen. Before a hard drive containing sensitive information is disposed of, it is necessary to clear this information to prevent others from acquiring it. Performing simple operating system level file deletion does not actually remove the data from the drive, but instead merely deletes the pointers to the files from the file system. This allows the retrieval of these “deleted” files with relative ease using a variety of available software. A more effective level of cleansing is to actually overwrite the portions of the disk that contained the user’s files, once or perhaps multiple times. Yet more effective is to use a degausser, which employs strong magnetic fields to randomize the magnetization of the grains on the magnetic medium of each disk. Most effective is physical destruction of the hard drive, for example by disintegrating, pulverizing or melting. Generally, the more effective the sanitization method, the more costly it is in both time and money. Hence, in some situations, it is desirable to use the least expensive method that guarantees that recovery is infeasible.
As mentioned, since operating system “delete” commands only remove file header information from the file system as opposed to erasing the data from the disk, manually overwriting is a more effective sanitization procedure. The details of what pattern to overwrite with, and how many times, are a somewhat contentious topic. Various procedures are (or were) described by various organizations and individuals, ranging from overwriting once with all zeros, to overwriting 35 times with several rounds of random data followed by a slew of specific patterns. More important is the fact that some blocks on the drive might not be logically accessible through the operating system interface if they have been flagged as defective after the drive has been in use for some time. In modern disk drives, this process of removing tracks that have been deemed defective from the logical address space, known as defect mapping, is performed continuously while the drive is in operation. To resolve this issue, an addition to the advanced technology attachment (ATA) protocol called Secure Erase was developed by researchers at the Center for Magnetic Recording Research (CMRR). This protocol overwrites every possible user data record, including those that might have been mapped out after the drive was used for some period. While overwriting is significantly more secure than just deleting, it is still theoretically possible to recover original data using microscopy or spin-stand techniques. One reason for this is that when tracks are overwritten, it is unlikely that the head will traverse the exact same path that it did the previous time the data was written, and hence some of the original data could be left behind in the guardbands between tracks. However, with modern high-density drives, the guardbands are usually very small compared to the tracks (or non-existent in the case of shingled magnetic recording), making this ever more difficult. Finally, it should be noted that the drive is left in usable condition with the overwriting method.
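The idea of overwriting, as opposed to deleting, can be sketched at the file level. This is a hedged illustration only: the function name and parameters are hypothetical, and writing over a file through the operating system does not guarantee that the same physical sectors are overwritten (defect-mapped blocks, journaling or copy-on-write file systems, and drive caches are all outside its reach). It is not a substitute for a drive-level command such as Secure Erase.

```python
import os
import tempfile

def overwrite_file(path, passes=1, pattern=None):
    """Overwrite a file's contents in place.

    pattern: a byte value (0-255) to repeat, or None to use random bytes.
    """
    size = os.path.getsize(path)
    chunk_size = 64 * 1024
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                count = min(chunk_size, remaining)
                if pattern is None:
                    data = os.urandom(count)
                else:
                    data = bytes([pattern]) * count
                handle.write(data)
                remaining -= count
            # Ask the OS to push this pass to the device before the next one.
            handle.flush()
            os.fsync(handle.fileno())

# Demonstration on a throwaway file in a temporary directory.
secret = b"account number 1234"
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "secret.txt")
    with open(demo_path, "wb") as handle:
        handle.write(secret)
    overwrite_file(demo_path, passes=2, pattern=None)  # two random passes
    overwrite_file(demo_path, passes=1, pattern=0x00)  # final pass of zeros
    with open(demo_path, "rb") as handle:
        after = handle.read()
```

The number of passes and the choice of patterns are left as parameters because, as noted above, there is no agreement on the right values.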
The next level of sanitization is degaussing, which uses an apparatus known as a degausser to randomize the polarity of magnetic grains on the magnetic media of the hard drive. There are three main types of degaussers: coil, capacitive, and permanent magnet. The first two use electromagnets to produce either a continuously strong, rapidly varying magnetic field or an instantaneous but extremely strong magnetic field pulse to randomly set the magnetization of individual domains in a hard drive’s media. The last uses a permanent magnet that can produce a very strong magnetic field, depending on the size of the magnet, but the field is constant rather than time-varying. Depending on the coercivity of the magnetic medium used in the drive, different levels of magnetic fields may be necessary to fully degauss a drive. If an insufficient field strength is used, some remanent magnetization may be present on the disks, which can be observed using MFM for example. One important difference between degaussing and overwriting is that the drive is rendered unusable after degaussing, since all servo regions are also erased. In fact, if the fields used in degaussing are strong enough, the permanent magnets in the drive’s motors might be demagnetized, clearly destroying the drive. The most effective sanitization method is of course physical destruction, but degaussing comes close, and often is performed before additional physical destruction for drives containing highly confidential information.
The first step in the forensic recovery process is scanning probe microscopy. Modern microscopy has three main branches: optical or light, electron, and scanning probe. Magnetic force microscopy (MFM) is one example of scanning probe microscopy, where a probe that is sensitive to magnetic fields is used. Scanning probe microscopy uses a physical probe that interacts with the surface of the sample as it is scanned, and it is this interaction which is in turn measured while moving the probe in a raster scan. This results in a two-dimensional grid of data, which can be visualized on a computer as a gray-scale or false color image. The choice of the probe determines the features of the sample the probe interacts with, for example magnetic forces (in MFM). The characteristics of the probe also determine the resolution: the size of the apex of the probe is approximately the resolution limit. Hence, for atomic scale resolution, the probe tip must terminate with a single atom! Another important requirement is to be able to move the probe in a precise and accurate manner on the scale of nanometers. Piezoelectric actuators are typically employed, which respond very precisely to changes in voltage and thus are used to move the probe across the surface in a precisely controlled manner.
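The step from a raster scan to a viewable image amounts to mapping a grid of probe readings onto gray levels. The sketch below shows one simple way to do that, a linear stretch to 8-bit values, and writes the result in the plain-text PGM (portable graymap) format. The function names and the sample readings are hypothetical, and real instrument software typically applies additional corrections that are not shown here.

```python
def to_grayscale(scan):
    """Linearly map a 2-D grid of probe readings to gray levels 0-255."""
    values = [v for row in scan for v in row]
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # A perfectly flat scan carries no contrast; render it as black.
        return [[0 for _ in row] for row in scan]
    return [[round(255 * (v - lowest) / span) for v in row] for row in scan]

def to_pgm(image):
    """Serialize a gray-scale image as a plain-text (P2) PGM file body."""
    height, width = len(image), len(image[0])
    lines = ["P2", f"{width} {height}", "255"]
    lines += [" ".join(str(pixel) for pixel in row) for row in image]
    return "\n".join(lines) + "\n"

# Hypothetical readings: one row per scan line, one value per probe position.
scan = [
    [-1.0, 0.0, 4.0],
    [4.0, 0.0, -1.0],
]
image = to_grayscale(scan)
pgm_text = to_pgm(image)
```

A false color rendering would use the same normalized values as indices into a color table instead of gray levels.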
The final step in this forensic process is digital imaging. Digital imaging is a ubiquitous technology these days: it is both the cheapest way to capture images and a source of high-quality images that can be readily manipulated in software to accomplish a wide array of tasks. In some cases, such as in electron and scanning probe microscopy, the resulting images can only be represented digitally; there is no optical image to begin with that can be captured on film. The key motivation of digital imaging, however, is digital image processing. Once an image is represented digitally, it can be processed just like any other set of data on a computer. This permits an endless number of ways to alter images for a variety of different purposes. Some of the basic types of image processing tasks include enhancement, where the image is made more visually appealing or usable for a particular purpose, segmentation, where an image is separated into component parts, and stitching, where a set of several images related to each other are combined into a composite. At a higher level, these basic tasks can be used as preprocessing steps to perform image classification, where the class of the object being imaged is determined, or pattern recognition, whereby patterns in sets of images are sought that can be used to group subsets together. Fundamental to all of these is first the representation of some aspect of the world as a set of numbers. How these numbers are interpreted determines what the image looks like, and also what meaning can be construed from them.
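Of the tasks listed above, stitching is the easiest to show in a few lines. The following is a deliberately simplified sketch, assuming two tiles of equal height that overlap only horizontally and contain no noise or drift; it searches for the overlap width that minimizes the mean squared difference between the touching edges. The function names and pixel values are hypothetical.

```python
def estimate_overlap(left, right, min_overlap=1):
    """Return how many columns of `right` repeat the last columns of `left`."""
    max_overlap = min(len(left[0]), len(right[0]))
    best_overlap, best_error = None, float("inf")
    for overlap in range(min_overlap, max_overlap + 1):
        error = 0.0
        for left_row, right_row in zip(left, right):
            for a, b in zip(left_row[-overlap:], right_row[:overlap]):
                error += (a - b) ** 2
        error /= overlap * len(left)  # mean squared difference per pixel
        if error < best_error:
            best_overlap, best_error = overlap, error
    return best_overlap

def stitch(left, right, overlap):
    """Join two tiles, keeping the overlapping columns only once."""
    return [left_row + right_row[overlap:]
            for left_row, right_row in zip(left, right)]

# A small "scene" and two tiles cut from it that share two columns.
scene = [
    [3, 1, 4, 1, 5, 9, 2, 6],
    [2, 7, 1, 8, 2, 8, 1, 8],
]
left_tile = [row[:5] for row in scene]
right_tile = [row[3:] for row in scene]

overlap = estimate_overlap(left_tile, right_tile)
composite = stitch(left_tile, right_tile, overlap)
```

With noisy images, very small overlaps can match by chance, so a larger `min_overlap` and a two-dimensional search would normally be needed.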
The system to reconstruct data on a hard disk via MFM imaging is broadly characterized by three steps. The first is to actually acquire the MFM images on a portion of the disk of interest, and to collect them in a manner that readily admits stitching and other future processing. Next is to perform the necessary image processing steps (preprocessing, aligning, stitching, and segmenting) to compose all the images acquired into a single image and separate the different tracks. Finally, a readback signal is acquired from each track and this is then passed through a PRML channel to give an estimate of the raw data corresponding to the magnetization pattern written on the drive. If the user data is to be recovered, additional decoding and de-scrambling is required, and the specific ECC, RLL, and scrambling parameters of the disk drive must be acquired to perform this.
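The detection half of a PRML (partial response, maximum likelihood) channel can be illustrated with a small Viterbi detector. The sketch below is an assumption-heavy toy, not the channel used in the system described here: it assumes the simplest partial-response target, in which each sample is the difference between the current and previous magnetization levels, and it assumes the medium starts in the -1 state. The function name and the sample values are hypothetical.

```python
def viterbi_dicode(samples, initial=-1):
    """Maximum-likelihood sequence detection for y[k] = x[k] - x[k-1] + noise,
    where each x[k] is -1 or +1. Returns the detected bits (0 or 1)."""
    levels = (-1, 1)
    # One path metric and one survivor path per state (state = latest level).
    metrics = {s: (0.0 if s == initial else float("inf")) for s in levels}
    paths = {s: [] for s in levels}
    for y in samples:
        new_metrics, new_paths = {}, {}
        for current in levels:
            best_prev, best_metric = None, float("inf")
            for prev in levels:
                # Squared distance between the sample and the ideal output.
                metric = metrics[prev] + (y - (current - prev)) ** 2
                if metric < best_metric:
                    best_prev, best_metric = prev, metric
            new_metrics[current] = best_metric
            new_paths[current] = paths[best_prev] + [current]
        metrics, paths = new_metrics, new_paths
    best_state = min(levels, key=lambda s: metrics[s])
    return [1 if level == 1 else 0 for level in paths[best_state]]

# Noisy samples for the bit pattern 0 1 1 0 1 0 0 1; the noise-free
# values would be 0, 2, 0, -2, 2, -2, 0, 2.
written_bits = [0, 1, 1, 0, 1, 0, 0, 1]
noisy_samples = [0.3, 1.4, -0.4, -1.5, 2.3, -1.6, 0.5, 1.2]
detected_bits = viterbi_dicode(noisy_samples)
```

The detector picks the whole bit sequence whose ideal output is closest to the samples, rather than deciding each sample in isolation, which is what distinguishes sequence detection from simple thresholding.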
This whole process sounds fairly complicated, but it has been demonstrated to work, and in some extreme cases it has been used to recover sensitive data from hard drives. So, dispose of your hard drive accordingly, using a readily available data sanitization technique.