[ic] Interchange and clustering: LVS for http(s), and MySQL replication for read-only clustering?

Dan Browning danb@cyclonecomputers.com
Wed, 18 Oct 2000 22:44:29 -0700


Friends, Romans, countrymen, lend me your ears:

How does IC interact with clustering/load balancing?  Specifically, I'm
looking at implementing something like the following:

Web Clustering
--------------
HTTP (and HTTPS/SSL) clustering, software-based, using LVS
(www.linuxvirtualserver.org).  I plan to run mod_ssl on each IC server to
decrypt the SSL traffic (there should be a fair amount of it).
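For reference, here is a minimal Python sketch of the weighted round-robin ("wrr") scheduling idea that the LVS documentation describes for spreading connections over real servers. It is only an illustration under my own assumptions: LVS does this inside the kernel, and the class name, server names and weights below are hypothetical.

```python
from functools import reduce
from math import gcd


class WeightedRoundRobin:
    """Illustrative weighted round-robin picker (hypothetical, not LVS code).

    Servers with weight 0 are never chosen; higher weights get
    proportionally more picks.
    """

    def __init__(self, servers):
        # servers: list of (name, weight) pairs, e.g. hypothetical IC web nodes
        self.names = [name for name, _ in servers]
        self.weights = [weight for _, weight in servers]
        self.index = -1          # index of the last server chosen
        self.current_weight = 0  # current weight threshold
        positive = [w for w in self.weights if w > 0]
        self.step = reduce(gcd, positive) if positive else 0
        self.max_weight = max(self.weights, default=0)

    def next_server(self):
        if self.max_weight <= 0:
            return None  # no server with a usable weight
        n = len(self.names)
        while True:
            self.index = (self.index + 1) % n
            if self.index == 0:
                # A full pass finished: lower the threshold, wrapping to the max.
                self.current_weight -= self.step
                if self.current_weight <= 0:
                    self.current_weight = self.max_weight
            if self.weights[self.index] >= self.current_weight:
                return self.names[self.index]
```

With weights 3, 1 and 2 for ic1, ic2 and ic3, six consecutive picks come out as ic1, ic1, ic3, ic1, ic2, ic3, so a beefier box simply gets a larger weight.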

This is the most likely scenario because it is open source.  LVS reportedly
balances at layer 4 (IP address and port), forwarding packets at layer 2 or
layer 3 depending on the mode.  As I understand IC, it would require somehow
centrally storing all the IC-related directories, such as
'/var/lib/interchange' and '/usr/local/interchange'.
 - What is the best way to have that kind of smart, central storage?
	- Is NFS on Linux stable enough?  What about NFS v3?
	- How much read/write activity does IC have during normal browse-and-buy
usage?
	- What ratio of NFS server capacity to web servers would be required?
		- For example:  a 10-disk RAID-10 Ultra160 array on a quad-Xeon server
acting as the NFS server for 10 Pentium III 700 MHz web (IC) servers with
512 MB of RAM each?
 - Would another way be better than NFS?
	- Coda, is it reliable enough?
	- GFS?
	- InterMezzo?
 - I imagine that I could just install Interchange on the NFS server and
mount '/usr/local/interchange' and '/var/lib/interchange' read-write on each
web server (client).
	- But will I run into locking and contention issues?
	- Does IC hold any files open for writing for long periods of time?
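On the locking question, the usual defense when several machines append to the same file is an advisory exclusive lock around each write.  The Python sketch below shows that pattern with `fcntl.lockf`.  It assumes the NFS lock daemon (lockd/NLM) is working on the server and every client, because NFS v2/v3 carry these locks over that separate protocol, and I don't know whether IC itself locks its files this way.  `append_record` and the file names are hypothetical.

```python
import fcntl
import os
import tempfile


def append_record(path, line):
    """Append one line to a shared file while holding an exclusive POSIX lock."""
    with open(path, "a", encoding="utf-8") as fh:
        # Blocks until no other process holds the lock (across NFS clients
        # only if lockd/NLM is working).
        fcntl.lockf(fh, fcntl.LOCK_EX)
        try:
            fh.write(line + "\n")
            fh.flush()
            # Flush to stable storage (on NFS, to the server) before unlocking.
            os.fsync(fh.fileno())
        finally:
            fcntl.lockf(fh, fcntl.LOCK_UN)


if __name__ == "__main__":
    # Demo against a local temporary file standing in for a file on the NFS mount.
    with tempfile.TemporaryDirectory() as tmp:
        shared = os.path.join(tmp, "orders.log")
        append_record(shared, "order 1")
        append_record(shared, "order 2")
        with open(shared, encoding="utf-8") as fh:
            print(fh.read().splitlines())
```

Holding the lock only for the duration of one write keeps contention short; the open question is whether IC ever holds a write lock much longer than that.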


Database Clustering
-------------------
Currently, MySQL has replication in its beta series (which is supposedly
quite stable; comments?).  PostgreSQL, however, has not made its replication
code public yet (though the developers announced a little while back that it
was coming soon).  The MySQL replication setup would have a "master" MySQL
server used for reads and writes, plus multiple "slave" servers to which the
master replicates new data (in near-real time) whenever it is added to the
database.
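To make the master/slave idea concrete, here is a deliberately simplified in-memory model in Python: the master logs every change in order, and each slave replays the log from the position it last reached.  This is not MySQL's actual binary-log format or protocol, and the class names and keys are made up for illustration; it only shows why a slave can briefly return old data.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Accepts all writes and records each one in an ordered change log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))  # every change is logged in order


@dataclass
class Slave:
    """Read-only copy that replays the master's log from its last position."""
    data: dict = field(default_factory=dict)
    position: int = 0  # number of log entries applied so far

    def catch_up(self, master):
        # Apply only the entries this slave has not seen yet.
        for key, value in master.log[self.position:]:
            self.data[key] = value
        self.position = len(master.log)

    def read(self, key):
        return self.data.get(key)
```

The stale read between `write` and `catch_up` is the "near-real-time" gap; anything that must see its own write immediately (say, an order confirmation page) would have to read from the master.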
	- What databases does IC write to heavily?  (I can think of:  new orders,
new customers, ...?)
	- A possible code modification for load balancing would recognize some new
.cfg directives:
WRITE_HOSTNAME: masterdb.domain.com
READ_HOSTNAME:  readonlydb.domain.com
Then one would set up some kind of clustering for
	readonlydb.domain.com
		|
		+-----> readonlydb1.domain.com
		|
		+-----> readonlydb2.domain.com
		|
		+-----> readonlydb3.domain.com
Of course, I don't know how to cluster databases, but I think one could do it
with LVS as well.  If not LVS, then round-robin DNS would at least work.
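Round-robin DNS works by having the name server rotate the order of the address records on each query, while most clients simply use the first one.  A hypothetical Python sketch of that rotation for the readonlydb names above (the resolver class is made up for illustration):

```python
class RoundRobinResolver:
    """Mimics round-robin DNS: each lookup returns the hosts rotated by one."""

    def __init__(self, records):
        # records: hostname -> list of backend hosts (hypothetical names)
        self.records = {name: list(hosts) for name, hosts in records.items()}

    def lookup(self, name):
        hosts = self.records[name]
        answer = list(hosts)        # this query's answer, in current order
        hosts.append(hosts.pop(0))  # rotate so the next query starts elsewhere
        return answer
```

Unlike LVS, plain DNS rotation does no health checking, so a dead slave keeps getting handed out until its record is removed, and resolvers may cache an answer for its TTL.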

But the real coding changes would be in every place where the database is
accessed.  Every read-only access should be changed to use the READ_HOSTNAME
database connection, while any access that requires a write would use
WRITE_HOSTNAME.
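As a rough sketch of that split (in Python rather than IC's Perl), each statement could be routed by its leading keyword.  READ_HOSTNAME and WRITE_HOSTNAME are the directives proposed above; `SplitRouter`, `host_for`, and the keyword-based classification are my own hypothetical simplification, not anything IC provides.

```python
import re

# Statements that only read data; everything else is treated as a write.
READ_ONLY = re.compile(r"^\s*(SELECT|SHOW|DESCRIBE|EXPLAIN)\b", re.IGNORECASE)
# Locking reads must go to the master, since they precede a write.
FOR_UPDATE = re.compile(r"\bFOR\s+UPDATE\b", re.IGNORECASE)


class SplitRouter:
    """Route each SQL statement to the read host or the write host."""

    def __init__(self, read_hostname, write_hostname):
        self.read_hostname = read_hostname    # e.g. the READ_HOSTNAME value
        self.write_hostname = write_hostname  # e.g. the WRITE_HOSTNAME value

    def host_for(self, sql):
        if READ_ONLY.match(sql) and not FOR_UPDATE.search(sql):
            return self.read_hostname
        return self.write_hostname
```

Defaulting anything unrecognized to the write host errs on the safe side: a misclassified read just costs the master a little load, while a misclassified write sent to a slave would be lost or rejected.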

Personally, I would like to cluster for:
	1- Availability.
	2- Performance.

Anyway, I hope this gets some conversation going on clustering IC.
Sorry, no interest paid on the lending of ears.

Dan Browning
Network & Database Administrator
Cyclone Computer Systems

P.S. Would this be a good bug submission (ENHANCEMENT)?