There are three forms in which we generate titles. Titles can be further customized aside from their form.
Except for the Japanese-friendly form, all titles are based on the disk label, as this is the most consistent method for picking titles (since the label is almost always available).
This form attempts to match the title as shown on the label as closely as is practical.
Titles in this form require full Unicode support, as they mix characters that map to different local code pages.
This form attempts to generate titles that are more Japanese-friendly, typically by substituting yomigana for romaji.
As not all labels include yomigana, we are willing to accept a rendering that does not appear on the label as long as it is from an official source, such as the manual or box. This is one area where there are gaps in the data, as we typically do not have access to all the materials. Please contact us if you have relevant scans.
Titles in this form are Unicode encoded but are restricted to characters in the Shift-JIS (932) character set.
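The Shift-JIS restriction can be checked mechanically. The following is a minimal sketch, assuming Python's built-in `cp932` codec is an acceptable stand-in for "the Shift-JIS (932) character set"; the function name `fits_shift_jis` is hypothetical and not part of our tooling.

```python
def fits_shift_jis(title: str) -> bool:
    """Return True if every character in `title` can be encoded in code page 932."""
    try:
        title.encode("cp932")
    except UnicodeEncodeError:
        return False
    return True


# Titles taken from the examples in this document.
japanese_form = "エトワールプリンセス DATA 2"
original_form = "Étoile Princesse（エトワールプリンセス） DATA 2"

print(fits_shift_jis(japanese_form))
print(fits_shift_jis(original_form))
```

The Japanese-form title passes, while the Original-form title is rejected because of the accented "É".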
This is the basic 7-bit ASCII form.
Words written in non-English characters are romanized or translated, depending on their provenance. The titles may also be cleaned up slightly to better match English conventions, for example by adding spaces. Internally the original capitalization is preserved, although the default XML file generated for this form corrects it automatically, since most people don't like ALL CAPS.
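As a rough illustration of the capitalization correction, here is a sketch under stated assumptions: the rule shown (re-capitalize any all-caps word of two or more letters) is a guess at the kind of cleanup involved, not the actual rule used when generating the XML file, and the all-caps inputs are assumed internal renderings. The names `is_ascii_form` and `soften_caps` are hypothetical.

```python
def is_ascii_form(title: str) -> bool:
    """Return True if the title uses only 7-bit ASCII characters."""
    return title.isascii()


def soften_caps(title: str) -> str:
    """Hypothetical cleanup: turn ALL-CAPS words of 2+ letters into Capitalized words."""
    words = []
    for word in title.split(" "):
        # Single letters (such as a disk ID "A") and mixed-case words are left alone.
        if len(word) > 1 and word.isalpha() and word.isupper():
            word = word.capitalize()
        words.append(word)
    return " ".join(words)


print(soften_caps("MAERCHEN MAZE A"))
print(soften_caps("Etoile Princesse DATA 2"))
print(soften_caps("Detana!! TwinBee A"))
```

Under this assumed rule, intentional mixed-case spellings such as "TwinBee" survive untouched.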
This is not a true form but a special case of the Original form. Wherever possible, the official reading - if known - will be appended to the base name, even when it does not appear on the label.
The off-label yomigana are derived the same way as those used for the Japanese form.
The following are examples of the three different forms. Additional fields and ornamentation are usually added to the final rendition, depending on the options selected, so these are "bare" examples that wouldn't ordinarily be used for file names.
MÄRCHEN MAZE（メルヘンメイズ） A
Maerchen Maze A
Detana!! TwinBee A
Étoile Princesse（エトワールプリンセス） DATA 2
エトワールプリンセス DATA 2
Etoile Princesse Data 2
Note that it is impossible to mix accented characters and katakana without Unicode, as seen in some of the examples, because no "ANSI"/MBCS code page contains both types of characters.
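The incompatibility is easy to reproduce. This sketch tries one of the example titles against two common Windows code pages using Python's built-in codecs; the helper name `encodable` is hypothetical, and the two code pages are only representatives, not an exhaustive list.

```python
def encodable(title: str, codec: str) -> bool:
    """Return True if `title` can be encoded with the named codec."""
    try:
        title.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


title = "MÄRCHEN MAZE（メルヘンメイズ） A"

# cp1252 = Western European, cp932 = Japanese (Shift-JIS).
results = {cp: encodable(title, cp) for cp in ("cp1252", "cp932")}
print(results)

# Each half of the title fits one code page on its own, and UTF-8 handles both.
print(encodable("MÄRCHEN MAZE", "cp1252"))
print(encodable("メルヘンメイズ", "cp932"))
print(encodable(title, "utf-8"))
```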
While our primary goal is to promote the preservation of pristine images, some of the databases we publish (namely Jouyou) may include hashes for non-pristine images, to a limited extent. "Tags" in the following format are added to indicate this.
Since we aren't interested in cataloging all the junk that's available, you won't see very many permutations other than "[f,b]". Furthermore, there is typically no good reason for us to include more than one non-pristine variant, so certain familiar formulations used to differentiate files with the same tag, like "[f2]", don't apply.
This means that the disk has been modified by someone (not us) such that compatibility has been improved. At most, only one "fixed" image will be hashed per disk, and it will have been checked and found to be the best-available implementation without any known problems. Therefore, you can make use of such images with a high degree of confidence that they will function properly.
This means that the disk is in a non-pristine state caused by the software itself. Common examples are saved games and high scores. We don't normally like to include these, but certain games are highly aggressive about modifying themselves and it's possible that pristine versions of certain disks no longer even exist.
This means that the disk is known (or strongly suspected) to be in a non-pristine state due to other factors such as user modifications or improper dumping.
This tag is also applied in situations where a particular disk doesn't fit properly into the XDF format (e.g. missing sectors) or does not have a canonical form, meaning that variations can occur between dumps, even of the same disk. In this case, however, the 'b' tag is dropped (for fixed images only) if no hypothetically "better" image is possible, merely different ones.
In the case of fixed images, the 'b' tag may also be indicative of particularly poor craftsmanship. Suffering from minor, harmless errors, or merely being suboptimal, is not sufficient to merit this tag since it would apply to far too many fixed image hashes.
Note that this tag does not indicate that the functional data on the disk is defective; such images would never be included in the first place!
Some software will reject a disk if it is in an unexpected write-protection state. The "[W]" flag is used to indicate disk images that are known to be unusable unless they are writable. At least one X68000 emulator will check for such flags in the filename and react accordingly.
Aside from user disks, we normally keep disk images marked read-only at the file system level to prevent them from getting modified. Hence we don't typically identify situations where a particular disk is required to be read-only. Most commercially produced floppies we've seen are either unnotched or sealed, and it is good practice to keep images read-only whenever possible.
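A tool that wants to react to tags and flags such as "[f,b]" or "[W]" could extract them from a filename along these lines. This is a minimal sketch: the filenames are made up, the assumption that tags are comma-separated inside a single pair of square brackets is inferred from the "[f,b]" example, and `parse_tags` and `needs_writable` are hypothetical names, not taken from any emulator.

```python
import re

# One bracketed group such as "[f,b]" or "[W]".
TAG_GROUP = re.compile(r"\[([^\[\]]+)\]")


def parse_tags(filename: str) -> set[str]:
    """Collect every tag found in square brackets, splitting groups on commas."""
    tags = set()
    for group in TAG_GROUP.findall(filename):
        tags.update(part.strip() for part in group.split(",") if part.strip())
    return tags


def needs_writable(filename: str) -> bool:
    """Return True if the filename carries the "[W]" flag (case-sensitive)."""
    return "W" in parse_tags(filename)


# Hypothetical filenames, for illustration only.
print(parse_tags("Example Game A [f,b].xdf"))
print(needs_writable("Example Game A [W].xdf"))
print(needs_writable("Example Game A.xdf"))
```

Images for which `needs_writable` returns False can then be left read-only, in line with the practice described above.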
We do not normally hash user disks, since their intended purpose requires them to be mutable. We make an exception for user disks that have a small number of possible forms after creation.
In order for a user disk to be included in our lists, the process for creating it must meet the following requirements:
This requirement ensures that any disk can be used for creating the user disk; all pre-existing data will be wiped out.
Some games, Akumajou Dracula being a good example, do not overwrite the entire disk, meaning that there are an unlimited number of possible variations depending on what was on the disk originally.
An example of a game that fails this requirement is Death Bringer, which mandates that the protagonist be given a name.
Ys III, on the other hand, only allows for limited customization: one of three possible difficulty levels must be chosen. It is therefore reasonable to hash all three variations.
A common source of unsuitable user disks is games that change files immediately after creating the disk, which usually updates the modification date and time. This has the effect of making every user disk unique (barring unlikely coincidences).
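The effect of a changed timestamp on a hash can be shown with a toy example. This is only a sketch: SHA-1 is an assumption (no particular hash algorithm is implied here), the 1 KiB "image" is a stand-in for a real disk image, and the offset and timestamp values are arbitrary.

```python
import hashlib


def image_hash(data: bytes) -> str:
    """Hash a disk image; SHA-1 is used purely for illustration."""
    return hashlib.sha1(data).hexdigest()


def stamp(image: bytes, offset: int, timestamp: int) -> bytes:
    """Return a copy of `image` with a 4-byte little-endian timestamp at `offset`."""
    return image[:offset] + timestamp.to_bytes(4, "little") + image[offset + 4:]


blank = bytes(1024)  # stand-in for a freshly created user disk

# Two "user disks" created one tick apart: identical except for the timestamp.
first = stamp(blank, 16, 0x12345678)
second = stamp(blank, 16, 0x12345679)

print(image_hash(first) == image_hash(second))
```

Because every creation time yields a different hash, no small, fixed list of hashes can cover such a disk.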
Sadly, we've found that legitimate variant versions of disks are surprisingly common, at least on the X68000. This makes preservation much more difficult, as we need pristine dumps of every variant. Now consider that some of these disks were shipped in a writable state and immediately write to themselves once they are booted.
Differentiating variants is usually done by adding what we call a "descriptor" to the filename. The descriptor is simply a string enclosed in parentheses appearing near the end of the filename. Descriptors are only used in multi-disk games if the various disks can be safely mixed and matched; otherwise a secondary ID (which appears prior to the disk ID) will be used, which keeps compatible disks grouped together.
Descriptors are chosen based on some arbitrary differentiating characteristic. There is no consistent way to interpret such strings; however, in most cases the resulting filenames will sort in order of age, to the extent we are able to determine this, and to the extent such a metric makes sense. Purely descriptive strings (e.g. "virus in master") will not necessarily sort in a particular order.
In other words, when using an ascending sorting algorithm, the "best" version will normally appear later. However, in some cases the variants are all basically the same; some games were published in several variations with minor differences solely to make cracking more difficult. (The most extreme case of this we've seen has at least seven variants that we've identified so far!) In such cases there is no "best" or "newest" version, although using the one sorted lower is harmless.
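The descriptor and sorting conventions above can be sketched as follows. The filenames and descriptor strings are invented for illustration, the regular expression assumes the descriptor is the last group in ASCII parentheses, and `descriptor` is a hypothetical helper name.

```python
import re

# The last "(...)" group in the filename, with no further parentheses after it.
DESCRIPTOR = re.compile(r"\(([^()]*)\)(?=[^()]*$)")


def descriptor(filename: str):
    """Return the descriptor string, or None if the filename has none."""
    match = DESCRIPTOR.search(filename)
    return match.group(1) if match else None


# Hypothetical variants of the same disk.
variants = [
    "Example Game A (rev 2).xdf",
    "Example Game A (rev 1).xdf",
]

ordered = sorted(variants)  # plain ascending sort
print([descriptor(name) for name in ordered])
print(ordered[-1])  # the entry that sorts last
```

As noted above, the entry that sorts last is only "best" when the descriptors actually reflect age; for purely descriptive strings the order carries no meaning.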
Finally, be aware that although we identify and track variants, we normally only apply descriptors when it is required to differentiate hashed files that are getting published. In other words, you won't see descriptors until we have hashes for two or more variants of the same disk that all meet our release criteria. This is done in case another variant appears in the meantime, which could require that the descriptor strings be reformulated. (In our limited experience, discovering two or three variants is not an extraordinary occurrence.)
Whether or not the hash for a particular disk image qualifies to be published depends on an internal number called its confidence score.
Disk dumps are analyzed and a number of factors are considered when calculating the confidence score. If it meets the threshold, then it may be included when generating the files we publish. (Other constraints may apply as well, depending on the subproject, and contributors can request that hashes of their dumps not be published.)
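In outline, the publication filter behaves something like the sketch below. Everything concrete here is an assumption made for illustration: the record fields, the integer score scale, and the threshold value are invented, and the factors that feed the real confidence score are not shown.

```python
from dataclasses import dataclass


@dataclass
class Dump:
    name: str
    confidence: int  # hypothetical scale; the real scoring is internal
    contributor_allows_publication: bool = True


def publishable(dumps: list[Dump], threshold: int) -> list[Dump]:
    """Keep dumps that meet the threshold and whose contributor has not opted out."""
    return [
        d for d in dumps
        if d.confidence >= threshold and d.contributor_allows_publication
    ]


# Invented records and threshold, for illustration only.
dumps = [
    Dump("disk-1", confidence=90),
    Dump("disk-2", confidence=40),
    Dump("disk-3", confidence=95, contributor_allows_publication=False),
]

print([d.name for d in publishable(dumps, threshold=80)])
```

Subproject-specific constraints would be additional conditions in the same filter.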
We are pretty conservative about this, due to the difficulties inherent in a writable medium. In the future, we may decide to lower the threshold slightly so that more hashes are published, at the cost of a higher risk of non-pristine images slipping through.
As an aside, a similar concept is also used in the analysis of "fixed" images (as seen in the Jouyou set) although we don't track it numerically. It is largely based on the level of testing that was done.