database

In general, databases contain information, usually in tabular format, where columns define the names and types of contained data, and rows represent entries — database records.

Interchange is primarily using databases to just retrieve values from specific tables, and does not use any higher-level functions of RDBM databases (such as views, triggers, or stored procedures in PostgreSQL). Such things can, however, be implemented in the database independently of Interchange, as Interchange will properly pass any warning or error messages back and forth.

We should say right away that Interchange is completely database-independent. The choice of actual database types that can work with Interchange is large, and Interchange can use some database-like methods automatically when you're not explicitly interested in paying attention to databases working behind the scenes.

Common features are transparently available everywhere (with absolutely no code hacks or special cases required), regardless of the underlying database type used. In addition, almost no field names are hard-coded, allowing for unlimited flexibility.

Keep in mind that the terms database and database table actually mean the same thing in Interchange parlance - a database table.

Interchange works with GNU DBM, DB_File, SQL, LDAP and in-memory types of databases. Regardless of type or other characteristics, each database must be registered on a catalog level before it's ready to be used, and this is achieved using the Database configuration directive. It's useful to remember at this point that multiple catalogs can share the same database.

Three parameters need to be present in a basic Database definition: an arbitrary database name, text source file with initial content, and the type of the database.

Text Source Files

Text source files are not databases themselves, of course (for performance and other reasons); they are only used to provide initial data for the corresponding database tables.

By default, all database source files are kept in the products/ subdirectory of your CATROOT. The ProductDir directive controls the exact location.

The ASCII files can contain carriage return (^M) characters even in data fields, but must have a newline character (^N) at the end of line to properly separate records. Mac users uploading files must use ascii mode, not binary mode.

Interchange's default ASCII delimiter is TAB. Keep in mind that the items must be separated by a single delimiter (that is, by a single TAB only). Due to the nature of TABs, TAB-delimited files look messy and unaligned when viewed in a text editor. Do not try to fix these; better use the te utility that comes as part of the Interchange distribution to edit such files more conveniently.

Interchange can manage an unlimited number of arbitrary database tables and database table types. Several flexible delimiter schemes are available "out of the box":

TAB-delimited file (Type 1, the default): Fields are separated by TAB characters. No whitespace is allowable at the beginning of the line.
```
code	description	price image
SH543	Men's fine cotton shirt	14.95	shirts.jpg
```
(You might notice that the field names and values above are not properly aligned. This is the nature of tab delimited files.)
Using the default TAB delimiter is recommended if you plan on searching the ASCII source file of the database.
LINE (Type 2): Fields are specified each on its own line, separated by the newline (\n) character. One blank line separates a record from another record.
```
code
description
price
image

SH543
Men's fine cotton shirt
14.95
shirts.jpg
```
%% (Type 3): Fields are separated by the literal combination of "\n%%\n", while the records are separated by "\n%%%\n". Users fond of the Unix "fortune" program may find this format familiar.
```
code
%%
description
%%
price
%%
image
%%%
SH543
%%
Men's fine cotton shirt
%%
14.95
%%
shirts.jpg
```
CSV-delimited file (Type 4): Fields are enclosed in quotes and separated by commas. Again, no whitespace should be at the beginning of the line.
```
"code","description","price","image"
"SH543","Men's fine cotton shirt","14.95","shirts.jpg"
```
CSV-delimiter schemes might cause problems with ASCII text searching routines.
PIPE-delimited file (Type 5): Fields are separated by the pipe ("|") characters which resemble vertical lines. No whitespace is allowable at the beginning of the line.
```
code|description|price|image
SH543|Men's fine cotton shirt|14.95|shirts.jpg
```
PIPE-delimited files perform fairly well with ASCII text searching routines.
TAB-delimited file (Type 6):
<reserved> (Type 7):
SQL (Type 8):
LDAP (Type 9):

	Note
	Field names are usually case-sensitive (in fact, that depends on the underlying database type). Always be consistent when naming or referencing fields and you'll avoid the trouble. All lower or all upper case names are recommended.

If a database is specified to be one of the first six types, then the database will automatically be converted to a more efficient internal structure. Those include DB_FILE, GDBM, or MEMORY. The order of preference and the selection is:

SQL Databases

As hinted above, you do not need to use an external SQL database. If you only have a small data set, you could use Interchange's internal databases. This is a tremendous gain for small and quick setups, or ad-hoc Interchange evaluation. However, some functions (order management, for example) will be slower and not as robust without an SQL database. SQL is strongly recommended for at least the state, country, orderline, transactions and userdb tables. Any other tables that will have programmed updates, such as inventory, are also best placed in SQL.

	Database performance
	Do not, however, try to optimize too soon and for no measurable difference. Do not fall in the jaws of premature optimization, your worst enemy.

Generally, you should make an additional effort of configuring and using SQL databases to achieve Interchange's full potential. Using SQL also makes your data sets easily available for integration with other applications.

In any case, database import and conversion routines are already available in Interchange and you can use them at any point.

Speaking of the source files' behavior, if a file named table.sql is present in the same directory as table.txt, then database table will never be imported from the ASCII text source file. If there is no table.sql, the DBI/SQL import will happen once, at Interchange startup or catalog reconfiguration time (and the table.sql file will be created); Interchange will connect to the SQL database using the specified DSN (DSN is a standard DBI parameter meaning "Database Source Name"). The table will be dropped (if it already exists in the database) using a line similar to DROP TABLE table. This will occur without warning, but NoImport can be used to prevent it or otherwise change the default behavior. The table will then be created again and populated with text source file data.

If there are any COLUMN_DEF specifications present in interchange.cfg, catalog.cfg or products/table.sql, they will be used to create SQL table specification (which is recommended for clean and correct database layout). If there aren't any, however, then the key (first field in the text file, by default) will be created with the type char(16), and all other fields will be created as char(128). This is very unfortunate, but the best Interchange can do without your help. Table creation statements will be written to the error.log along with, of course, any errors. From our experience, the most common mistake at this point is choosing column names that sound perfectly reasonable, but also happen to be reserved keywords in MySQL. (The error messages appear to be misleading here, so you better take a look at the list of reserved MySQL keywords before losing patience with the problem).

Once the database (database table actually, remember?) is created, the text source file will be imported into it. For this step to succeed, data typing must be user-configured. In other words, if say, word "none" is placed in a field while the field in question is defined to be of numeric type, database import will not succeed; consequently, the problematic catalog won't configure successfully (it will be skipped) and it won't be available when Interchange starts up.

For a complete discussion, please see the Database configuration directive.

File-based Databases

By file-based databases we primarily assume GNU DBM and DB_File. We also call those database types "internal", since in the absence of say, an SQL definition, all inferior formats (such as text source files) are automatically converted to some kind of a file-based database.

Those database types usually work in a way that, on every client access, the appropriate database text source file is checked for being newer than the actual DB file itself. When it happens that it is, the database table is re-imported from the text source file on the fly, and the routine then proceeds as usual.

Database updates

	Database updates
It is important to note that, when using Interchange internal database methods, all changes in the text source files cause the databases to be re-created. This can have unwanted effects if the database was modified from within Interchange and the contents have not been written back to the text source files. Another common problem are larger data sets that take noticeable time to get imported to (or exported from) the internal database. The exact behavior can be controlled via the `NoImport` config directive, but by default, changes in text files will trigger a complete rewrite of DBM or DB_File databases.

It is important to note that, when using Interchange internal database methods, all changes in the text source files cause the databases to be re-created. This can have unwanted effects if the database was modified from within Interchange and the contents have not been written back to the text source files. Another common problem are larger data sets that take noticeable time to get imported to (or exported from) the internal database.

The exact behavior can be controlled via the NoImport config directive, but by default, changes in text files will trigger a complete rewrite of DBM or DB_File databases.

To check if you have GNU DBM and GDBM Perl support available, run perl -le'require GDBM_File and print "I have GDBM."'. To check if you have Berkeley DB and DBM Perl support available, run perl -le'require DB_File and print "I have Berkeley DB."'. Sometimes you want to use Berkeley DB even if GNU DBM is installed and would naturally take precedence; in such cases, set the MINIVEND_DBFILE environment variable to a true value (setenv MINIVEND_DBFILE 1 in csh, MINIVEND_DBFILE=1 ; export MINIVEND_DBFILE in sh, b(a)sh or ksh). It is also possible to use Berkeley DB for just specific databases.

For a complete discussion, please see the Database configuration directive.

Memory Databases

Memory Interchange databases use Perl hashes to store the data directly in memory. Every time the Interchange server is restarted, it will re-import all in-memory databases for every catalog.

Memory databases are used by default only if no database type is explicitly specified, and there is no DB_File or GNU DBM found on the system. Otherwise they can be used for small but high-traffic tables. Keep in mind, however, that since their contents are not saved back to the text files, you'll want to either take care of the data export yourself, or keep the tables stuffed with read-only data.

if you want to force memory databases despite of GDBM_File or DB_File being present, set the MINIVEND_NODBM environment variable to a true value (look previous chapter for hints on setting it). It is also possible to use memory type for just specific databases.

Memory databases import will be performed once at every Interchange startup or catalog reconfiguration time.

For a complete discussion, please see the Database configuration directive.

Interchange Database Design

We are trying not to impose any database structure that would require our own tools to maintain the data. We always want to keep it such that Interchange data can be maintained via a spreadsheet processor or foreign database tools.

Interchange Database Conventions

This section describes naming and file usage conventions used with Interchange. This is very important for both understanding Interchange and developing your own custom solutions which build upon officially recommended practices.

Term definitions:

key or code
The words reference the database key. In Interchange, the key is usually the product code or SKU, which is the product part number. Otherwise, key values may be used to generate relationships to other database tables.
It is recommended that the key be the first column of the database text source file, since Interchange's import, export, and search facilities rely on this practice.
field or column
The vertical row of a database. One of the columns is always the key and it is usually the first one, as explained above.
table or database
A table in the database. Because Interchange has evolved from a single-table database to an access method for an unlimited number of tables (and databases, for that matter), a table will occasionally be referred to as a database. The only time the term database refers to something different is when describing the concept as it relates to SQL, where a database contains a series of tables. While Interchange cannot create SQL databases, it can drop and create tables within databases if given the proper permissions.

Interchange uses one mandatory database, which is referred to as the products database. In the supplied demo catalog (and in the most of real-world solutions as well), the primary database is directly called products and the ASCII source is kept in the products/products.txt file. This is also the default file for searching contents with the search engine, such as Glimpse, HTDig or Swish.

Note

Interchange also has two optional databases that are specified in special, fixed formats:

shipping.asc database contains shipping options that are accessed if the CustomShipping directive is in use. This is a fixed-format database, and must be created as specified. For more information, see shipping glossary entry and the [shipping] tag.
salestax.asc database contains sales tax information if the [salestax] tag is to be used. A default is supplied. Caution, these things change and need periodic updating! See tax glossary entry for more information.

The two above tables cannot be stored in any user-specified format.


DocumentRoot		debug