Preservation Home


Conventions


Naming Forms

There are three forms in which we generate titles. Titles can be further customized aside from their form.

Except for the Japanese-friendly form, all titles are based on the disk label, as this is the most consistent method for picking titles (since the label is almost always available).

Original / Unicode

This form attempts to match the title as shown on the label as closely as is practical.

Titles in this form require full Unicode support, as they mix characters that map to different local code pages.

Japanese / Shift-JIS

This form attempts to generate titles that are more Japanese-friendly, typically by substituting yomigana for romaji.

As not all labels include yomigana, we are willing to accept a rendering that does not appear on the label as long as it is from an official source, such as the manual or box. This is one area where there are gaps in the data, as we typically do not have access to all the materials. Please contact us if you have relevant scans.

Titles in this form are Unicode encoded but are restricted to characters in the Shift-JIS (932) character set.

English / ASCII

This is the basic 7-bit ASCII form.

Words written in non-English characters are romanized or translated, depending on their provenance. The titles may also be cleaned up slightly to better match English conventions, for example by adding spaces. Internally, the original capitalization is preserved, although the default XML file generated for this form corrects it automatically, since most people don't like ALL CAPS.
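As a rough illustration of the capitalization correction, the sketch below lowercases everything after the first letter of any word written entirely in capitals and leaves mixed-case words alone. The function name and the rule itself are assumptions made for illustration; the actual logic used to generate the XML file is not described here.

```python
def soften_all_caps(title: str) -> str:
    """Hypothetical helper: tone down words that are written in ALL CAPS."""
    words = title.split(" ")
    fixed = [
        # Single letters (such as a disk letter) and mixed-case words are kept as-is.
        word.capitalize() if len(word) > 1 and word.isupper() else word
        for word in words
    ]
    return " ".join(fixed)


if __name__ == "__main__":
    # "MAERCHEN MAZE A" is an assumed internal spelling, not a confirmed one.
    print(soften_all_caps("MAERCHEN MAZE A"))
    print(soften_all_caps("Detana!! TwinBee A"))
```

A rule this simple would also flatten legitimate acronyms, so a real implementation would presumably need exceptions.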

Examples

The following are examples of the three different forms. Additional fields and ornamentation are usually added to the final rendition, depending on the options selected, so these are "bare" examples that wouldn't ordinarily be used for file names.

MÄRCHEN MAZE(メルヘンメイズ) A

メルヘンメイズ A

Maerchen Maze A

出たな!!TwinBee(ツインビー) A

出たな!!ツインビー A

Detana!! TwinBee A

Étoile Princesse(エトワールプリンセス) DATA 2

エトワールプリンセス DATA 2

Etoile Princesse Data 2

Note that it is impossible to mix accented characters and katakana without Unicode, as seen in some of the examples, because no "ANSI"/MBCS code page contains both types of characters.
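The character-set restrictions of the three forms can be checked mechanically. The following is a minimal sketch using Python's built-in "cp932" and "cp1252" codecs; the function names are made up for this example and are not part of our tooling. It reports the narrowest character set that can hold a given title, and shows that a title mixing accented characters and katakana fits neither code page.

```python
def fits(title: str, codec: str) -> bool:
    """Return True if every character of the title can be encoded with the codec."""
    try:
        title.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def narrowest_form(title: str) -> str:
    """Hypothetical helper: name the narrowest character set that holds the title."""
    if title.isascii():
        return "ascii"
    if fits(title, "cp932"):
        return "shift-jis"
    return "unicode"


if __name__ == "__main__":
    original = "MÄRCHEN MAZE(メルヘンメイズ) A"
    print(narrowest_form(original))           # needs full Unicode
    print(narrowest_form("メルヘンメイズ A"))  # fits in Shift-JIS (932)
    print(narrowest_form("Maerchen Maze A"))  # plain 7-bit ASCII
    # Neither the Japanese nor the Western European code page can hold the original.
    print(fits(original, "cp932"), fits(original, "cp1252"))
```

Note that an Original-form title is not necessarily outside Shift-JIS: a title such as 出たな!!TwinBee(ツインビー) A contains only characters found in code page 932.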

Tags

While our primary goal is to promote the preservation of pristine images, some of the databases we publish (namely Jouyou) may include hashes for non-pristine images, to a limited extent. "Tags" in the following format are added to indicate this.

[f]

This means that the disk has been modified by someone (not us) such that compatibility has been improved. At most, only one "fixed" image will be hashed per disk, and it will have been checked and found to be the best-available implementation without any known problems. Therefore, you can make use of such images with a high degree of confidence that they will function properly.

[m]

This means that the disk is in a non-pristine state caused by the software itself. Common examples are saved games and high scores. We don't normally like to include these, but certain games are highly aggressive about modifying themselves and it's possible that pristine versions of certain disks no longer even exist.

[b]

This means that the disk is known (or strongly suspected) to be in a non-pristine state due to other factors such as improper dumping or spurious modifications.

This tag is also applied in situations where a particular disk doesn't fit properly into XDF format (e.g. missing sectors) or does not have a canonical form, such that variations can occur between dumps, even of the same disk. In this case, however, the 'b' tag is dropped (for fixed images only) if no hypothetically "better" image is possible, merely different ones.

In the case of fixed images, the 'b' tag may also be indicative of particularly poor craftsmanship. Suffering from minor, harmless errors, or merely being suboptimal, is not sufficient to merit this tag, since it would otherwise apply to far too many fixed image hashes.

Since we aren't interested in cataloging all the junk that's available, you won't see very many permutations other than "[f,b]". Furthermore, there is typically no good reason for us to include more than one non-pristine variant, so certain familiar formulations used to differentiate files with similar tags, like "[f2]", don't apply.
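For anyone processing the published names programmatically, a tag block can be picked apart with a small regular expression. This is only a sketch: the function name, the example names, and the assumption that a name carries at most one tag block containing only the letters f, m and b are ours for illustration.

```python
import re

# One bracketed block holding one or more of the tag letters, separated by commas.
TAG_BLOCK = re.compile(r"\[([fmb](?:,[fmb])*)\]")

# Short summaries of the tag descriptions given above.
TAG_MEANINGS = {
    "f": "fixed by a third party to improve compatibility",
    "m": "modified by the software itself",
    "b": "non-pristine due to other factors, or no canonical form",
}


def parse_tags(name: str) -> list[str]:
    """Hypothetical helper: return the tag letters found in a name, in order."""
    match = TAG_BLOCK.search(name)
    return match.group(1).split(",") if match else []


if __name__ == "__main__":
    # "Example Game A" is a made-up title used only for this demonstration.
    for tag in parse_tags("Example Game A [f,b]"):
        print(tag, "=", TAG_MEANINGS[tag])
```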

Variants

Sadly, we've found that legitimate variant versions of disks are surprisingly common, at least on the X68000. This makes preservation much more difficult, as we need pristine dumps of every variant. Now consider that some of these disks were shipped in a writable state and immediately write to themselves once they are booted.

Differentiating variants is done by adding what we call a "descriptor" to the filename. The descriptor is simply a string enclosed in parentheses.

Descriptors are chosen based on some arbitrary differentiating characteristic. There is no consistent way to interpret such strings; however, it is guaranteed that the resulting filenames will sort in order of age, to the extent we are able to determine this, and to the extent such a metric makes sense.

In other words, when using an ascending sorting algorithm, the "best" version will appear later. However, in some cases the variants are all basically the same; some games were published in several variants with minor differences solely to make cracking more difficult. (The most extreme case of this we've seen has at least six variants that we've identified so far.) In such cases there is no "best" or "newest" version, although using the one sorted lower is harmless.
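In practice, this means a script can pick the preferred variant by sorting the filenames and taking the last one. The sketch below assumes a plain ascending sort by code point and uses made-up filenames and descriptors; it does not try to interpret the descriptor strings, in keeping with the rule above.

```python
def pick_latest_variant(filenames: list[str]) -> str:
    """Hypothetical helper: return the variant that sorts last (the newest one)."""
    if not filenames:
        raise ValueError("no variants given")
    # Python compares strings by code point, which gives an ascending order.
    return sorted(filenames)[-1]


if __name__ == "__main__":
    # The descriptors "(rev 1)" and "(rev 2)" are invented for this example.
    variants = [
        "Example Game A (rev 2)",
        "Example Game A (rev 1)",
    ]
    print(pick_latest_variant(variants))
```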

Finally, be aware that although we identify and track variants, we normally only apply descriptors when it is required to differentiate hashed files that are getting published. In other words, you won't see descriptors until we have hashes for two or more variants of the same disk that all meet our release criteria. This is done in case another variant appears in the meantime, which could require that the descriptor strings be reformulated. (In our limited experience, discovering two or three variants is not an extraordinary occurrence.)

Confidence Scores

Whether or not the hash for a particular disk image qualifies to be "released" depends on an internal number called its confidence score.

Disk dumps are analyzed and a number of factors are considered when calculating the confidence score. If it meets the threshold, then it may be included when generating the files we publish. (Other constraints may apply as well, depending on the subproject.)

We are pretty conservative about this, due to the difficulties inherent in a writable medium.

As an aside, a similar concept is also used in the analysis of "fixed" images (as seen in the Jouyou set) although we don't track it numerically. It is largely based on the level of testing that was done.
