ethan at endpoint.com
Fri Apr 10 13:01:03 UTC 2009
Mike Heins wrote:
> Quoting Jon Jensen (jon at endpoint.com):
>> On Sat, 4 Apr 2009, Mike Heins wrote:
>>> There is a significant territory between the large massively-accessed
>>> site and the site with so little traffic it doesn't matter what you do.
>>> Interchange has only a few sites with backcountry.com traffic and we are
>>> never going to generalize solutions for them.
>>> But there are many sites with between 50,000 and 500,000 visitors per
>>> month. During peak access times it does matter what they are doing, as
>>> much for perceived performance as system load.
>> Yes, that's true, but even the lightly-trafficked sites can have huge
>> bursts of traffic from Digg, Twitter, or whatever, and being able to
>> withstand it gracefully makes the difference between a lot of new users
>> and a temporarily down website.
> Yup. And there are the robot crawls, too, which can get ugly if
> you don't pay attention to them.
>> Premature optimization is expensive and often misguided, but not
>> optimizing at all leads to unhappy surprises.
> As we have discovered. The PreFork method turned out to be one of the
> biggest wins, as did mv_tmp_session. Plus lots of memory is an optimization
> in a class by itself. 8-)
Yeah; mv_tmp_session must never be overlooked. Sessions are a
significant scalability problem, particularly as currently implemented.
The locking of a shared resource is a potential scalability threat to
begin with. As you and Jon have both noted: the majority of use cases
for Interchange do not routinely face Backcountry.com-levels of traffic,
but (again, as you both noted) that doesn't mean scalability doesn't
matter; the internet's social patterns can drive huge volumes of traffic
to a little-known site without warning.
>> I'm not telling you anything you don't know here. Maybe what I'm trying to
>> say is that at least some of the lessons of the very highly-trafficked
>> sites can be applied to the benefit of the less-trafficked sites as well.
>>> That's the whole point. The servlet isn't Interchange-based, but uses
>>> the data written by Interchange on session initiation.
>> What do you mean by "servlet"? Something like a minimal mod_perl handler
>> that hits the database? Or a standalone CGI? Anything like that is fine,
>> but would complicate the architecture and is exactly what I'd expect for a
>> highly-trafficked site.
> Yes, that's what I mean. It would complicate the architecture somewhat,
> to be sure. It may not be the right thing to do. It would be nice to
> be able to do it relatively painlessly in a standard fashion.
I'm trying to understand why this "servlet" idea is particularly
relevant to the Ajax discussion that started the whole thread. I think
what it's coming down to is:
* Ajax-heavy applications are typically characterized by relatively
infrequent "big" requests alongside very frequent "little" requests
* Those "little" requests may be in response to user-generated events or
timer-based events, and consequently have concurrency patterns that are
potentially more complex than we're traditionally used to seeing
* The heavy-weight, exclusively-locked sessions don't lend themselves
well to concurrent (or near-concurrent), frequent little requests
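That last point is easy to demonstrate. Here's an illustrative sketch (Python, not Interchange code; the real thing is Perl) of what an exclusive per-session lock does to concurrent "little" requests: they are forced to run one at a time.

```python
# Illustrative sketch: when every "little" Ajax request must take the
# same exclusive per-session lock, concurrent requests serialize.
import fcntl
import tempfile
import threading
import time

session_path = tempfile.NamedTemporaryFile(delete=False).name
timeline = []   # (request, "start"/"end") events, in wall-clock order

def little_request(name):
    with open(session_path) as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)   # exclusive session lock
        timeline.append((name, "start"))
        time.sleep(0.05)                 # stand-in for the request's real work
        timeline.append((name, "end"))
        fcntl.flock(fh, fcntl.LOCK_UN)

threads = [threading.Thread(target=little_request, args=(f"req{i}",))
           for i in range(3)]
t0 = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - t0

# The lock serializes the requests: each "start" is immediately followed
# by the same request's "end", and total time is at least 3 * 0.05s.
for i in range(0, len(timeline), 2):
    assert timeline[i][0] == timeline[i + 1][0]
assert elapsed >= 0.15
```

With per-request (or no) locking, the three requests would overlap and finish in roughly the time of one.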
It seems that you (Mike) have concerns about IC being a heavy-weight
process beyond just the session itself.
I've done pretty intensive Ajax work for a few different apps on
Interchange, and I really don't think that Interchange itself is a
"problem". The application design is of greater concern. Is your app
designed with a lot of user-specific content throughout? Then designing
it to perform well and scale effectively is harder. Does your app play
fast and loose with GET and POST, such that GET requests readily change
server-side state (in the session, in the database, etc.) just as
frequently as POST requests do? Then again you will have a scaling problem.
Since Backcountry has come up a few times in this discussion, let's look
there. They use quite a lot of Ajax. The checkout process is
thoroughly Ajaxified. The community features (product reviews, product
questions and answers, etc.) are all Ajax-heavy. And so on. But they
aren't introducing new components into their architecture to address
this: the primary tools to handle these things server-side remain Apache
and Interchange. The technique applied in using those tools is what
makes a difference.
If they are a point of reference for the extreme levels of
traffic/scalability/performance needs for Interchange, then their
experience suggests that introducing a new parallel Interchange daemon
for light-weight requests is not necessary; it's instead possible to
make Interchange itself more responsive, by optimizing your resource
design to limit session use as much as possible and to maximize the
reuse of cacheable stuff.
>> For the less-busy sites, Interchange in PreFork mode is a great general
>> solution that works well for Ajax requests too. Reducing the number of
>> database requests, reducing the size of the session (or eliminating
>> session usage entirely for some Ajax requests by passing authentication in
>> the URL or POST body), etc. can improve performance without requiring a
>> whole new architecture.
>>> Those can benefit from some level of efficiency that allows them to
>>> Ajax-enable their sites.
>> So that's the crux: I don't know why you're assuming Interchange in
>> PreFork mode with appropriate optimizations is unsuitable. It's definitely
>> working for several very different sites I'm familiar with. But I guess
>> you must have already determined it just won't work for one of your
>> applications? Even if you add an additional app server or two?
> No, I haven't determined that at all. In fact, I have a couple of
> 200,000 visitor sites that easily run on one not-so-beefy server. But
> the optimizations aren't there in a standardized way. If you use
> mv_tmp_session you have no security, and if you use sessions you have no
> optimization. We use mv_tmp_session along with Autoload and custom
> cookie settings for our optimizations, but it is a hack all the way.
Here's a suggestion: refactor the session-handling in Interchange such that:
* a base session model class defines the interface for anything that can
act as a "session"
* provide separate session-type modules that implement that interface
(possibly via subclassing, but it doesn't really matter for purposes of
this discussion) for different storage schemes. Storage schemes could include:
- standard file-based sessions
- database (DBI) sessions
- cookie-based sessions (meaning the session state is serialized into a
cookie sent to the client)
* remove session initialization/fetching/locking from the main dispatch code
* instead, always access sessions through either an object method, a
class method, or a simple function, and the act of accessing the session
is what first initializes/fetches the session
* do not attempt to lock a session until the application attempts to
change a value in that session (though that behavior could be
configurable, of course; some session types might be better off with
delayed locking, while others might be better off with exclusive,
immediate locking, depending on the application and purpose)
* introduce session names, such that a given name corresponds to a
session instance of a particular type. This would allow segregation of
application/session state to different session types/instances.
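The shape I have in mind looks roughly like this (Python for brevity; the real thing would be Perl, and every name here is invented): a base class defines the interface, storage-specific subclasses implement it, the session is fetched lazily on first access, and no lock is taken until the first write.

```python
class Session:
    """Interface that any session storage scheme implements."""
    def __init__(self, session_id):
        self.id = session_id
        self._data = None      # not fetched yet
        self._locked = False

    # storage hooks a subclass provides
    def _fetch(self): raise NotImplementedError
    def _lock(self): raise NotImplementedError
    def _store(self): raise NotImplementedError

    def get(self, key, default=None):
        if self._data is None:          # lazy fetch on first access
            self._data = self._fetch()
        return self._data.get(key, default)

    def set(self, key, value):
        if self._data is None:
            self._data = self._fetch()
        if not self._locked:            # delayed locking: only on first write
            self._lock()
            self._locked = True
        self._data[key] = value

class InMemorySession(Session):
    """Stand-in backend; real ones would be file-, DBI-, or cookie-backed."""
    _backing = {}
    _locks = set()

    def _fetch(self):
        return dict(self._backing.get(self.id, {}))

    def _lock(self):
        self._locks.add(self.id)        # record that a lock was taken

    def _store(self):
        self._backing[self.id] = dict(self._data)

# A read-only request never triggers the lock:
s = InMemorySession("abc")
s.get("user")
assert "abc" not in InMemorySession._locks

# The first write does:
s.set("user", "ethan")
assert "abc" in InMemorySession._locks
```

The dispatch code never touches sessions directly; everything goes through get/set, so a request that never writes never pays for locking at all.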
For an Ajax-heavy application, you might want:
* the "default" session to be cookie-based and to hold little other than
user authentication data, on average, though there's no reason it
couldn't hold the shopping cart data as well, allowing for client-side
access to the cart contents
* a "checkout" session that is file or DB backed, that uses standard
locking and such
Only the checkout-specific pages/actions would use the checkout session.
So only the checkout process incurs the heavier cost of DBI sessions,
locking, and so on. All other processes are lightweight, with no
locking, no fetching of external resources for session state, etc.
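One way the named-session split might be wired up (again Python, and names like SESSION_TYPES are invented illustration, not Interchange config):

```python
class CookieSession:
    locking = False           # state lives in the client's cookie
    def __init__(self, session_id):
        self.id = session_id

class DBISession:
    locking = True            # server-side storage with exclusive locks
    def __init__(self, session_id):
        self.id = session_id

# Each session name maps to a storage type; most pages only ever name
# "default", so only checkout pages construct the heavier session.
SESSION_TYPES = {
    "default": CookieSession,
    "checkout": DBISession,
}

def get_session(name, session_id):
    """Build the session instance registered under `name`."""
    return SESSION_TYPES[name](session_id)

# Ordinary Ajax requests touch only the cheap default session:
assert get_session("default", "abc").locking is False
# Checkout pages opt into locking explicitly:
assert get_session("checkout", "abc").locking is True
```

The point is that the cost of locking becomes something a page opts into by naming the session it needs, rather than a tax every request pays.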
End Point Corporation
ethan at endpoint.com