Game_Music_Emu 0.5.6 Design
---------------------------
This might be slightly out-of-date at times, but will be a big help in
understanding the library implementation.


Architecture
------------
The library is essentially a bunch of independent game music file
emulators unified with a common interface.

Gme_File and Music_Emu provide a common interface to the emulators. The
virtual functions are protected rather than public to allow pre- and
post-processing of arguments and data in one place. This allows the
emulator classes to assume that everything is set up properly when
starting a track and playing samples.

All file input is done with the Data_Reader interface. Many derived
classes are present, for the usual disk-based file and block of memory,
to specialized adaptors for things like reading a subset of data or
combining a block of memory with a Data_Reader to the remaining data.
This makes the library much more flexible with regard to the source of
game music file data. I still added a specialized load_mem() function to
have the emulator keep a pointer to data already read in memory, for
those formats whose files can be absolutely huge (GYM, some VGMs). This
is important if for some reason the caller must load the data ahead of
time, but doesn't want the emulator needlessly making a copy.

Since silence checking and fading are relatively complex, they are kept
separate from basic file loading and track information, which are
handled in the base class Gme_File. My original intent was to use
Gme_File as the common base class for full emulators and track
information-only readers, but implementing the C interface was much
simpler if both derived from Music_Emu. User C++ code can still benefit
from static checking by using Gme_File where only track information will
be accessed.

Each emulator generally has three components: main emulator, CPU
emulator, and sound chip emulator(s). Each component has minimal
coupling, so use in a full emulator or stand alone is fairly easy. This
modularity really helps reduce complexity. Blip_Buffer helps a lot with
simplifying the APU interfaces and implementation.

The "classic" emulators derive from Classic_Emu, which handles
Blip_Buffer filling and multiple channels. It uses Multi_Buffer for
output, allowing you to derive a custom buffer that could output each
voice to a separate sound channel and do different processing on each.
At some point I'm going to implement a better Effects_Buffer that allows
individual control of every channel.

In implementing the C interface, I wanted a way to specify an emulator
type that didn't require linking in all the emulators. For each emulator
type there is a global object with pointers to functions to create the
emulator or a track information reader. The emulator type is thus a
pointer to this, which conveniently allows for a NULL value. The user
referencing this emulator type object is what ultimately links the
emulator in (unless new Foo_Emu is used in C++, of course). This type
also serves as a useful substitute for RTTI on older C++ compilers.

Addendum: I have since added gme_type_list(), which causes all listed
emulators to be linked in. To avoid this, I make the list itself
editable in blargg_config.h. Having a built-in list allows
gme_load_file() to take a path and give back an emulator with the file
loaded, which is extremely useful for new users.


Interface conventions
----------------------
If a function retains a pointer to or replaces the value of an object
passed, it takes a pointer so that it will be clear in the caller's
source code that care is required.

Multi-word names have an underscore '_' separator between individual
words.

Functions are named with lowercase words. Functions which perform an
action with side-effects are named with a verb phrase (i.e. load, move,
run). Functions which return the value of a piece of state are named
using a noun phrase (i.e. loaded, moved, running).

Classes are named with capitalized words. Only the first letter of an
acronym is capitalized. Class names are nouns, sometimes suggestive of
what they do (i.e. File_Scanner).

Structure, enumeration, and typedefs to these and built-in types are
named using lowercase words with a _t suffix.

Macros are named with all-uppercase words.

Internal names which can't be hidden due to technical reasons have an
underscore '_' suffix.


Managing Complexity
-------------------
Complexity has been a factor in most library decisions. Many features
have been passed by due to the complexity they would add. Once
complexity goes past a certain level, it mentally grasping the library
in its entirety, at which point more defects will occur and be hard to
find.

I chose 16-bit signed samples because it seems to be the most common
format. Supporting multiple formats would add too much complexity to be
worth it. Other formats can be obtained via conversion.

I've kept interfaces fairly lean, leaving many possible features
untapped but easy to add if necessary. For example the classic emulators
could have volume and frequency equalization adjusted separately for
each channel, since they each have an associated Blip_Synth.

Source files of 400 lines or less seem to be the best size to limit
complexity. In a few cases there is no reasonable way to split longer
files, or there is benefit from having the source together in one file.


Preventing Bugs
---------------
I've done many things to reduce the opportunity for defects. A general
principle is to write code so that defects will be as visible as
possible. I've used several techniques to achieve this.

I put assertions at key points where defects seem likely or where
corruption due to a defect is likely to be visible. I've also put
assertions where violations of the interface are likely. In emulators
where I am unsure of exact hardware operation in a particular case, I
output a debug-only message noting that this has occurred; many times I
haven't implemented a hardware feature because nothing uses it. I've
made code brittle where there is no clear reason flexibility; code
written to handle every possibility sacrifices quality and reliability
to handle vaguely defined situations.


Flexibility through indirection
-------------------------------
I've tried to allow the most flexibility of modules by using indirection
to allow extension by the user. This keeps each module simpler and more
focused on its unique task.

The classic emulators use Multi_Buffer, which potentially allows a
separate Blip_Buffer for each channel. This keeps emulators free of
typical code to allow output in mono, stereo, panning, etc.

All emulators use a reader object to access file data, allowing it to be
stored in a regular file, compressed archive, memory, or generated
on-the-fly. Again, the library can be kept free of the particulars of
file access and changes required to support new formats.


Emulators in general
--------------------
When I wrote the first NES sound emulator, I stored most of the state in
an emulator-specific format, with significant redundancy. In the
register write function I decoded everything into named variables. I
became tired of the verbosity and wanted to more closely model the
hardware, so I moved to a style of storing the last written value to
each register, along with as little other state as possible, mostly the
internal hardware registers. While this involves slightly more
recalculation, in most cases the emulation code is of comparable size.
It also makes state save/restore (for use in a full emulator) much
simpler. Finally, it makes debugging easier since the hardware registers
used in emulation are obvious.


CPU Cores
---------
I've spent lots of time coming up with techniques to optimize the CPU
cores. Some of the most important: execute multiple instructions during
an emulation call, keep state in local variables to allow register
assignment, optimize state representation for most common instructions,
defer status flag calculation until actually needed, read program code
directly without a call to the memory read function, always pre-fetch
the operand byte before decoding instruction, and emulate instructions
using common blocks of code.

I've successfully used Nes_Cpu in a fairly complete NES emulator, and
I'd like to make all the CPU emulators suitable for use in emulators. It
seems a waste for them to be used only for the small amount of emulation
necessary for game music files.

I debugged the CPU cores by writing a test shell that ran them in
parallel with other CPU cores and compared all memory accesses and
processor states at each step. This provided good value at little cost.

The CPU mapping page size is adjustable to allow the best tradeoff
between memory/cache usage and handler granularity. The interface allows
code to be somewhat independent of the page size.

I optimize program memory accesses to direct reads rather than calls to
the memory read function. My assumption is that it would be difficult to
get useful code out of hardware I/O addresses, so no software will
intentionally execute out of I/O space. Since the page size can be
changed easily, most program memory mapping schemes can be accommodated.
This greatly reduces memory access function calls.