We generate titles in three forms. Beyond the choice of form, titles can be customized further.
Except for the Japanese-friendly form, all titles are based on the disk label, as this is the most consistent method for picking titles (since the label is almost always available).
This form attempts to match the title as shown on the label as closely as is practical.
Titles in this form require full Unicode support, as they mix characters that map to different local code pages.
This form attempts to generate titles that are more Japanese-friendly, typically by substituting yomigana for romaji.
As not all labels include yomigana, we are willing to accept a rendering that does not appear on the label as long as it is from an official source, such as the manual or box. This is one area where there are gaps in the data, as we typically do not have access to all the materials. Please contact us if you have relevant scans.
Titles in this form are Unicode encoded but are restricted to characters in the Shift-JIS (932) character set.
This is the basic 7-bit ASCII form.
Words written in non-English characters are romanized or translated, depending on their provenance. The titles may also be cleaned up slightly to better follow English conventions, for example by adding spaces. Internally, the original capitalization is preserved, although the default XML file generated for this form corrects it automatically, since most people don't like ALL CAPS.
The following are examples of the three different forms. Additional fields and ornamentation are usually added to the final rendition, depending on the options selected, so these are "bare" examples that wouldn't ordinarily be used for file names.
MÄRCHEN MAZE（メルヘンメイズ） A
Maerchen Maze A
Detana!! TwinBee A
Étoile Princesse（エトワールプリンセス） DATA 2
エトワールプリンセス DATA 2
Etoile Princesse Data 2
Note that it is impossible to mix accented characters and katakana without Unicode, as seen in some of the examples, because no "ANSI"/MBCS code page contains both types of characters.
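As a rough illustration of the encoding constraints on the three forms, the following Python sketch reports the narrowest encoding that can hold a given title. The function names `narrowest_encoding` and `code_pages_supporting` are hypothetical, and Python's `cp932` and `cp1252` codecs are assumed here to be reasonable stand-ins for the Shift-JIS (932) and Western "ANSI" code pages. It is an illustration only, not a description of the project's actual tooling.

```python
def narrowest_encoding(title: str) -> str:
    """Return the most restrictive encoding able to represent the title.

    Tries 7-bit ASCII first, then code page 932 (Shift-JIS), and falls
    back to "unicode" when neither can hold every character.
    """
    for codec in ("ascii", "cp932"):
        try:
            title.encode(codec)
            return codec
        except UnicodeEncodeError:
            continue
    return "unicode"


def code_pages_supporting(title: str, code_pages=("cp1252", "cp932")) -> list:
    """List which of the given code pages can encode the whole title.

    The default pair (Western and Japanese) is an assumption chosen
    for illustration; other code pages can be passed in.
    """
    supported = []
    for codec in code_pages:
        try:
            title.encode(codec)
            supported.append(codec)
        except UnicodeEncodeError:
            pass
    return supported


if __name__ == "__main__":
    for title in (
        "Étoile Princesse（エトワールプリンセス） DATA 2",
        "エトワールプリンセス DATA 2",
        "Etoile Princesse Data 2",
    ):
        print(narrowest_encoding(title), code_pages_supporting(title), title)
```

For the Étoile Princesse examples above, the label-style title fits neither of the two code pages tested, the katakana title fits code page 932, and the romanized title is plain ASCII.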
While our primary goal is to promote the preservation of pristine images, some of the databases we publish (namely Jouyou) may include hashes for non-pristine images, to a limited extent. "Tags" in the following format are added to indicate this.
This means that the disk has been modified by someone (not us) such that compatibility has been improved. At most, only one "fixed" image will be hashed per disk, and it will have been checked and found to be the best-available implementation without any known problems. Therefore, you can make use of such images with a high degree of confidence that they will function properly.
This means that the disk is in a non-pristine state caused by the software itself. Common examples are saved games and high scores. We don't normally like to include these, but certain games are highly aggressive about modifying themselves and it's possible that pristine versions of certain disks no longer even exist.
This means that the disk is known (or strongly suspected) to be in a non-pristine state due to other factors such as improper dumping or spurious modifications.
This tag is also applied when a particular disk doesn't fit properly into XDF format (e.g. missing sectors) or has no canonical form, so that dumps can vary even for the same disk. In that case, however, the 'b' tag is dropped (for fixed images only) if no hypothetically "better" image is possible, merely different ones.
In the case of fixed images, the 'b' tag may also indicate particularly poor craftsmanship. Minor, harmless errors, or merely being suboptimal, are not sufficient to merit this tag, since it would otherwise apply to far too many fixed image hashes.
Since we aren't interested in cataloging all the junk that's available, you won't see very many permutations other than "[f,b]".
Furthermore, there is typically no good reason for us to include more than one non-pristine variant, so certain familiar formulations used to differentiate files with similar tags, like "[f2]", don't apply.
Some software will reject a disk if it is in an unexpected write-protection state. The "[W]" flag is used to indicate disk images that are known to be unusable unless they are writable. At least one X68000 emulator will check for such flags in the filename and react accordingly.
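A tool that wants to react to these markers could pull them out of the filename along the following lines. This is a minimal sketch under stated assumptions: tags are taken to be ASCII square-bracket groups with comma-separated entries (as in "[f,b]" and "[W]"), the filenames shown are made up, and the names `parse_tags` and `requires_writable` are hypothetical. It does not describe how any particular emulator implements the check.

```python
import re

# One bracketed group such as "[f,b]" or "[W]" (assumed syntax).
TAG_GROUP = re.compile(r"\[([^\[\]]+)\]")


def parse_tags(filename: str) -> set:
    """Collect every tag found in square brackets in the filename."""
    tags = set()
    for group in TAG_GROUP.findall(filename):
        # Entries inside one group are assumed to be comma-separated.
        tags.update(t.strip() for t in group.split(",") if t.strip())
    return tags


def requires_writable(filename: str) -> bool:
    """True when the filename carries the "[W]" flag (case-sensitive)."""
    return "W" in parse_tags(filename)


if __name__ == "__main__":
    # Hypothetical filenames, for illustration only.
    for name in ("Example Game A [f,b].xdf", "Example Game B [W].xdf"):
        print(name, sorted(parse_tags(name)), requires_writable(name))
```

Matching is deliberately case-sensitive, on the assumption that the uppercase "W" flag should not be confused with a lowercase tag letter.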
Aside from user disks, we normally keep disk images marked read-only at the file system level to prevent them from getting modified. Hence we don't typically identify situations where a particular disk is required to be read-only. Most commercially produced floppies we've seen are either unnotched or sealed, and it is good practice to keep images read-only whenever possible.
As for why such checks are performed, it could be an easily subverted form of copy protection, and that probably is the intent in at least some cases, but in our experience it seems more likely to just be an easy way to reject an obviously incorrect disk. For example, some games will reject a disk for being read-only even if there is no intention to write to it at the time it is being inserted.
On the X68000 at least, polling for an inserted disk produces several other status bits, one of which is the write-protection state. So it is easy enough to check for writability along with waiting for a disk to be inserted.
An example where it doesn't seem intended for copy protection is Mid-Garts, which doesn't care about the write-protection state of Disk II (main program) unless you boot Disk I (only used for the intro).
A more sophisticated copy protection method involving writable disks would entail actually writing to the disk, such that the damage is not noticed until later.
Sadly, we've found that legitimate variant versions of disks are surprisingly common, at least on the X68000. This makes preservation much more difficult, as we need pristine dumps of every variant. Now consider that some of these disks were shipped in a writable state and immediately write to themselves once they are booted.
Differentiating variants is usually done by adding what we call a "descriptor" to the filename. The descriptor is simply a string enclosed in parentheses appearing near the end of the filename. Descriptors are only used in multi-disk games if the various disks can be safely mixed and matched; otherwise a secondary ID (which appears prior to the disk ID) will be used, which keeps compatible disks grouped together.
Descriptors are chosen based on some arbitrary differentiating characteristic. There is no consistent way to interpret such strings; however, in most cases the resulting filenames will sort in order of age, to the extent we are able to determine this, and to the extent such a metric makes sense. Purely descriptive strings (e.g. "virus in master") will not necessarily sort in a particular order.
In other words, when using an ascending sorting algorithm, the "best" version will normally appear later. However, in some cases the variants are all basically the same; some games were published in several variations with minor differences solely to make cracking more difficult. (The most extreme case of this we've seen has at least seven variants that we've identified so far.) In such cases there is no "best" or "newest" version, although picking the one that sorts later, as usual, is harmless.
Finally, be aware that although we identify and track variants, we normally only apply descriptors when it is required to differentiate hashed files that are getting published. In other words, you won't see descriptors until we have hashes for two or more variants of the same disk that all meet our release criteria. This is done in case another variant appears in the meantime, which could require that the descriptor strings be reformulated. (In our limited experience, discovering two or three variants is not an extraordinary occurrence.)
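The descriptor convention described above can be sketched as a small filename parser. The assumptions here are that a descriptor uses ASCII parentheses, that the last such group in the filename is the descriptor, and that the fullwidth parentheses seen in label-style titles are not descriptors. The function name `extract_descriptor` and the filename are hypothetical, and other parenthesized fields added by the selected options could confuse this simple approach.

```python
import re

# An ASCII-parenthesized group with no nested parentheses (assumed syntax).
DESCRIPTOR = re.compile(r"\(([^()]+)\)")


def extract_descriptor(filename: str):
    """Return the last ASCII-parenthesized string, or None if absent."""
    matches = DESCRIPTOR.findall(filename)
    return matches[-1] if matches else None


if __name__ == "__main__":
    # Hypothetical filename, for illustration only.
    print(extract_descriptor("Example Game A (virus in master).xdf"))
    # Fullwidth parentheses in a label-style title are not matched.
    print(extract_descriptor("Étoile Princesse（エトワールプリンセス） DATA 2"))
```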
Whether or not the hash for a particular disk image qualifies to be "released" depends on an internal number called its confidence score.
Disk dumps are analyzed and a number of factors are considered when calculating the confidence score. If it meets the threshold, then it may be included when generating the files we publish. (Other constraints may apply as well, depending on the subproject, and contributors can request that hashes of their dumps not be published.)
We are fairly conservative about this, due to the difficulties inherent in a writable medium. In the future, we may decide to lower the threshold slightly so that more hashes will be published, though doing so would increase the risk of non-pristine images slipping through.
As an aside, a similar concept is also used in the analysis of "fixed" images (as seen in the Jouyou set) although we don't track it numerically. It is largely based on the level of testing that was done.