One of the stated goals of object-oriented programming is to closely associate the knowledge about a particular piece of data with that data itself. This leads to better localization of knowledge, which improves maintenance, organization, and reusability. The abstraction of a class, with attributes and methods, goes a long way towards this goal. But it doesn't address the issue of where the data lives when the program stops running.
If every system consisted of a single process with a single memory image, there would be no persistence issue, but that's not very realistic. Most systems nowadays consist of multiple dæmons running on multiple machines, and if these systems are to be written in the object-oriented paradigm, some mechanism must be used to allow the same object to be instantiated in multiple processes, and still maintain consistency within and amongst objects.
Existing mechanisms for persistence in Perl fall roughly into two categories;
stream-based and database-based. Stream-based persistence involves transforming
objects in memory into a stream (essentially a string), which can then be stored
in, say, a file on disk. Both Data::Dumper and Storable
1 fall into this category.
A database-based persistence scheme transforms objects in memory into a form
that can be directly inserted into a database management system. In the case of
the ObjectStore interface module,2
that database management system is the object
database ObjectStore.3
Other schemes represent objects in tables in a relational
database, such as Tangram,4 the
persistence scheme I presented last year, and POP, the subject of this paper.
    
One of the problems with stream-based persistence is that one generally must store and restore all the objects in the system in one shot. If you wish to change just one attribute of just one object, you have to load up the whole shebang, make the change, and then write back the changes. This gets even trickier if you wish to have two separate processes access the objects at the same time.
Another issue is dealing with transactions; often, one would like a change to one object to happen if and only if some other object is changed. The canonical example is a transfer of money from one bank account to another. If the program has deducted money from the first account, but before it can credit the second account, a comet destroys the computer that was running the program, you wouldn't want the initial deduction still to be applied when the system is eventually reconstructed. Similarly, a second program running simultaneously shouldn't see the initial deduction until the deposit is made, or it might think that the person has less money than they really do.
This type of transactional control is very difficult with a stream- based model, but is exactly the sort of problem which has already been solved in relational databases. Object databases offer similar services, but they can be prohibitively expensive.
So POP stores objects in tables in a relational database. The tables closely mirror the logical structure of the object (see figure 1), which enables the persistence engine to be quite smart about reflecting changes in objects in the database. There are three basic components to the system. The first is a markup language based on XML (called POX, for Persistent Object eXoskeleton) which describes the structure of persistent classes, one class per file. These files (collectively called poxen) are fed through translators that can generate a database schema, skeletal Perl classes, documentation, or even a rudimentary user interface. The poxen are also used at runtime by the persistence base class, which actually stores and restores objects from the database.
Persistent classes built using POP look almost exactly like regular classes, and the detailed structural information in the poxen allow the persistence engine to be smart about what gets sent and received from the database. The indexing features of the database are used to make querying efficient. The auto-generation of the database schema is both necessary because of its complexity and a boon because it speeds development.
Each persistent class requires a POX, specifically to define the structure and 
type of the attributes of that class. The top-level tag, <class>, contains,
as attributes of the tag, the Perl-level name for the class, an optional 
abbreviation of the class name for use in the database,
5 an optional list of 
super-classes, an optional indication that this is an abstract class, and an 
optional indication that this is a link class (a special type of class used to 
connect other objects together in a relationship). For example,
    
<class	name='Person'
        abbr='pers'
        abstract='1' >
...
</class>
or
<class	name='Employee'
        abbr='emp'
        isa='Person' >
...
</class>
    
    
Multiple inheritance can be specified by supplying a comma-separated list of 
classes for the isa attribute.
    
Within the class tag, each persistent attribute is specified with either an 
<attribute> or a <participant> tag. A participant is a special type of 
attribute used in link classes, and is described below. The <attribute> tag takes 
a Perl-level name, an optional abbreviation for use in the database, and data 
type and arity (either scalar, array or hash) values. For example,
    
<attribute 	name='name'
                type='char(30)' />
    
    
    or
<attribute 	name='address'
                abbr='addr'
                type='varchar(255)'
                list='1' />
    
    
    or
<attribute 	name='meetings'
                abbr='mtgs'
	        hash='1'
                key_type='numeric(11,0)'
                val_type='Schedule::Meeting' >
Hash of meetings; key is meeting time in
epoch seconds; value is a Schedule::Meeting object
</attribute>
    
    
    This is the only structure that is required for persistence. There's also a (fetal) facility for specifying object methods, class methods and constructors, which would be rendered as stubs in the skeletal Perl class generated by POP. There's a synchronization problem which comes up in doing this though, because the Perl module is likely to diverge from the POX, so until POP turns into a full-fledged CASE tool, I'd recommend against trying to use this feature.
As you can tell from the above examples, POP supports object attributes with scalar, list/array or hash arities. A scalar may be either a simple data type like a number or a string or an embedded object. Arrays are similarly typed, but note that there is no support for heterogeneous arrays; every element of the array must be of the same type. This will hopefully change at some point. There are two data types for hashes; the type of the key and the type of the value. The type of the value may be any valid scalar type, but the key may not be an embedded object.
Embedded objects are handled specially; only the unique id of the object
(called the pid, or persistence id6) is stored in 
the database, and the actually restoring of the object is delayed until the 
corresponding attribute is actually used. In the case of scalar attributes 
containing embedded objects, when the parent object is restored, a special tied 
scalar is created which contains the necessary information to restore the 
embedded object (specifically, the class and the pid), and is entered into the 
parent object. The tied scalar also contains a reference to itself (not to its 
internal implementation, mind), so that when it is read (i.e., when FETCH is 
called), it restores the embedded object, and unties itself, and sets itself to 
this object. Arrays and hashes of embedded objects are handled in a somewhat 
more straightforward fashion by tied array and tied hash classes.
    
The poxen are converted to a Perl data structure by a POX_parser object, which 
embeds an XML::Parser. A ->parse method takes the path to a POX, parses it, and 
returns a nested-hash data structure. For example, there's a top-level key of 
the returned hash for the scalar attributes of the class; the value is a (reference 
to a) hash whose keys are the names of each scalar attribute and whose values 
are themselves (references to) hashes, representing the attributes (like the name, 
data type, etc.) of that attribute.
    
These parsed classes are used both by the persistent base class at run-time and 
by the translators; poxdb, poxperl, poxhtml and poxui.
    
poxdb reads all the poxen for the current set of classes and generates a file 
containing SQL statements which will create the tables, indexes and stored 
procedures in the database to store and retrieve objects in those classes. This 
file is referred to as a schema file. Each create table, create index or create procedure statement is tagged with the class which requires it, which 
allows the database manipulation programs, schema_load, schema_drop and schema_trunc to operate on a per-class basis if desired. schema_load reads the 
schema file and executes the statements in it in a database (which database to 
connect to, and the connection parameters of that database, are configured by 
environment variables), optionally for one or more specific classes, but by 
default for all the classes. schema_drop, as the name suggests, undoes the 
actions of schema_load, dropping tables, indices and stored procedures. 
schema_trunc truncates the tables referenced in the schema file, which is very 
useful in development and certification.
    
poxperl creates a skeletal Perl module implementing the persistent class, 
including get/set accessors for all of the attributes. The persistence of the 
class is almost completely transparent; the only difference
7 from a typical Perl 
class is the following:
    
	use POP::Persistent;
	use vars qw/@ISA/;
	@ISA = qw/POP::Persistent/;
    
    
    
poxhtml uses the comments in the POX (you did put comments in your POX, didn't 
you?) to create HTML documentation for the class. This is of limited utility if 
only the attributes are defined in the POX, but it looks pretty slick. It even 
color-codes inherited methods and attributes by base class. poxui, which is 
still in a development stage, creates a CGI interface to allow maintenance of 
persistent objects. It will never be sufficient to be a real GUI, but it can be 
quite handy for debugging and testing. In lieu of such an interface, however, 
the Perl debugger is quite useful for creating test objects.
    
POP::Persistent is the base class for persistent objects. It contains the 
default constructor for persistent objects, which creates a self-tied hash (an 
object which is a reference to a hash which is tied into the same package as 
the object). If the constructor is called with no arguments,
it supplies a new pid (persistence id), and attempts to call an ->initialize method before the 
underlying hash is tied, which allows default values for attributes to be set.
    
An existing object can be retrieved by supplying the pid to the constructor, 
but in general, pids will almost never be seen in application code. Usually, an 
object is restored by being embedded in another object, by participating in a 
link with another object, or through some alternate constructor, from a user-
based key (e.g., Person->new_from_empid).
    
Once a persistent object has been constructed in memory, any and all accesses 
to attributes are intercepted by the STORE and FETCH methods in the persistent 
base class. FETCH ensures that the in-memory representation is up-to-date with 
the database, reading updated attribute information if necessary. STORE ensures 
that any changes to the object are reflected in the database. There are a 
number of optimizations that make this scheme feasible.
    
All pre-computable queries and updates are created as stored procedures, which allows the database to pre-compute the query plan. Normally, I shy away from using stored procedures because they can become a maintenance nightmare, but in this case, they are automatically generated, which removes most of the maintenance concerns.
In the database, there's a global table with one row for every object 
containing that object's pid and a version number. Whenever an object is 
changed in the database, this version number is updated. This allows the ->FETCH 
method to initially check just this version number to know if it needs to 
refresh the data in the object. If the process that attempted to read the 
attribute is in a transaction and has already modified the object in question, 
it doesn't even have to check the version number; no other process could have 
modified the object. Similarly, STORE only has to update the version number 
once per object during a transaction, even if it makes many modifications. 
Actually, STORE doesn't even have to replicate changes until the end of a 
transaction, but the current implementation does not do that, since that would 
cause commit to take longer. Choosing between these two behaviors should be an 
option, however.
    
The lazy object loading discussed earlier is another performance enhancement. 
A per-process memory cache of persistent objects allows the same object to be 
used multiple times in the same process with only one copy of the object in 
memory. This cache is maintained using weak references (currently the Devel::WeakRef 
module is used, but the built-in weak refs coming in 5.006 should work just 
fine for that purpose).
    
Since an object with multi-valued attributes (arrays and hashes) could have a 
potentially huge amount of data, some effort is made to optimize access to 
multi-valued attributes. A version number is kept for each multi-valued 
attribute, which allows STORE to avoid writing unchanged data back to the 
database, and also to allow FETCH to avoid reading lots of unchanged data from 
the database.
    
Object deletion is supported, and pains are taken to ensure that no object can 
be deleted while it is referenced by another object. The ->delete method in the 
persistent base class throws an exception if the object is referenced by any 
other object (i.e., it occurs as an embedded object in some other object), and 
actually implements the deletion otherwise. Class implementers should define 
their own ->delete method, which can implement specific semantics with regard 
to deletion, such as removing the object from referencing objects, or 
disallowing deletion of special instances (like an administrative user). This 
is left up to the class, since desired semantics can vary greatly from class to 
class.
    
A (poorly named) class method ->all (defined in the base class) is used to 
enumerate all the objects of a class. By default, it returns a list of all the 
pids belonging to that class.
8
You can supply a list of one or more attribute 
names to have those attributes returned for each object instead of (or as well 
as) the pid. If more than one attribute is specified, ->all will return a list 
of array references, with each array containing the values of those attributes for 
each object. This is immensely useful for, say, populating drop-downs in user 
interfaces. For example, consider an HTML popup menu where you want the value 
of each option to be the pid of the object, so the agent receiving the post of 
the form can easily restore the selected object, but the label to be some other 
attribute of the object, like its name.
    
my $q = new CGI;
...
print $q->popup_menu(
                -name => 'who',
                -values => Person->all,
                -labels =>
        { map {@$_} Person->all('pid', 'name') }
);
    
    
    
By supplying options in an optional hash reference as the first argument to ->all, 
the behavior can be further refined. For instance, the returned values can be 
sorted by any particular scalar attribute of the object, even if that attribute 
is not among the selection set, by using a 'sort' key. There's also a 'where' 
key whose value is something like a SQL where clause, but already parsed. The 
syntax for this is extremely rudimentary, and definitely in need of improvement;
the Tangram module has a very nice way of dealing with this, which I hope to 
adapt.
    
Often, embedding one object inside another to represent a relationship between 
the two objects can get very messy. Embedding generally implies a parent-child 
relationship and not all object-to-object relationships fit that model. Often, 
the semantics of the relationship imply that the "child" object should have a 
link back to the parent. This can make certain operations, such as deletion, 
cumbersome. Furthermore, many relationships logically involve more than two 
objects. For example, consider a marriage. You might think that marriage 
involves just two Person objects, one playing the role of husband and the other 
playing the role of wife.
10
However, one could reasonably extend the relationship 
to include such entities as the official who legalized the marriage, the time 
of the marriage, its location, etc.
    
For these reasons, we'll represent a marriage as an object. Here's the POX for 
our Marriage class:
    
<class	name='Marriage'
	abbr='marr'
	type='link'>
<participant	name='husband'
		abbr='husb'
		type='System::Person' />
<participant	name='wife'
		type='System::Person' />
<participant	name='official'
		type='System::Person' />
<attribute	name='datetime'
		abbr='dt'
		type='numeric(11,0)' />
<attribute	name='location'
		abbr='loc'
		type='varchar(255)' />
</class>
    
    
    Participants are really glorified attributes, whose type must be a persistent class. When the schema file is generated from this POX, indices are added for each participant column. Poxperl will create a class method of Marriage for each participant that takes an object of the type of that participant and returns all the Marriage objects containing that object in that role. For example:
package Person;
...
sub husband {
  my $this = shift;
  my @marriages = 
    System::Marriage->all_with_wife($this);
  if (@marriages > 1) {
    croak $this->name, " is polyandrous";
  }
  return $marriages[0]->husband;
}
...
    
    
    Dealing with relationships in link classes sometimes takes a bit more work than with embedded objects, but the advantage is that they're more flexible, and different semantics can be implemented for different types of relationships.
One of the advantages of representing objects in a relational database is that the built-in transactional support can be leveraged. POP supports two transactional models, an ANSI mode and an auto-commit mode. Under the ANSI mode, every process is implicitly in a transaction, and every transaction must be explicitly ended with a commit or rollback. Under auto-commit, every modification of an attribute is immediately reflected in the database. A program can switch between the two modes at runtime, so the simpler auto-commit mode can be used for most of a program, and the more efficient ANSI mode can be used where necessary.
Long-running transactions carry their own performance problems, however, when 
more than one process is involved. This happens because the database system 
must use locks to prohibit other processes from changing the data until the end 
of the transaction. By default, other processes are also blocked from reading 
data that has been changed by the transaction, to guarantee the property that 
the same row can be read from the database multiple times in a transaction and 
it will always have the same value. However, most databases support different 
isolation levels, that is, different degrees by which one transaction is 
isolated from the changes occurring in another transaction. Often, for 
performance reasons, you want to read whatever data exists in the database, 
even if it might be in an inconsistent state. This is referred to as "dirty 
read". POP supports whatever isolation levels are supported by the database, 
and allows the level to be changed for the process at runtime. Furthermore, 
->all operates by default at a dirty read isolation level, but it is 
configurable.
    
POP is available on the CPAN, at $CPAN/authors/id/B/BH/BHOLZMAN/pop-0.06.tar.gz. It has only been tested on Solaris, with Sybase as the relational database back-end. It should be fairly trivially portable to any operating system that Perl runs on. With a bit of work, it should be possible to port it to Oracle and Informix, but less full-featured database systems like mySQL pose more of a problem.