[ic] Rolling big tables (mysql)
emailgrant at gmail.com
Wed Apr 11 11:49:53 EDT 2007
> > I do keep a separate table of robot UAs and match traffic rows to them
> > with op=eq to populate another table with robot IPs and non-robot IPs
> > for the day to speed up the report. Don't you think it would be
> > slower to match/no-match each IC request to a known robot UA and write
> > to the traffic table based on that, instead of unconditionally writing
> > all requests to the traffic table? If not, excluding the robot
> > requests from the traffic table would mean a lot less processing for
> > the report and a lot fewer records for the traffic table.
> Perhaps you should create a column called "spider" in the traffic table
> and save a true or false value depending upon the [data session spider]
> value. You can then generate reports "WHERE spider = 0", for ordinary
> users, or "WHERE spider = 1" for robots etc. An index on the spider column
> would be nice, of course.
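The spider-column suggestion above can be sketched like this. This is only an illustration: sqlite3 stands in for MySQL (the SQL idea is the same), and the table and column names are assumptions, not your actual schema:

```python
import sqlite3

# sqlite3 is used here only as a stand-in for MySQL.
# Table and column names are assumptions -- adjust to your schema.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE traffic (
        ip      TEXT,
        domain  TEXT,
        ua      TEXT,
        spider  INTEGER   -- 1 if [data session spider] was true, else 0
    )
""")
# An index on the flag keeps the WHERE spider = ... reports fast.
db.execute("CREATE INDEX traffic_spider_idx ON traffic (spider)")

rows = [
    ("10.0.0.1", "shop-a.example.com", "Mozilla/5.0",   0),
    ("10.0.0.2", "shop-a.example.com", "Googlebot/2.1", 1),
]
db.executemany("INSERT INTO traffic VALUES (?, ?, ?, ?)", rows)

# Ordinary users only:
humans = db.execute("SELECT COUNT(*) FROM traffic WHERE spider = 0").fetchone()[0]
print(humans)
```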
Have you gotten comfortable using a partial match to determine a robot
UA? I used to use RobotUA, but I ended up wanting to make exact matches.
Indexing sounds like something I need to make use of. Is that a
MySQL feature handled completely outside of IC?
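For what it's worth, a partial UA match can be as simple as a substring test against a list of known robot fragments. The fragment list below is a made-up example, not an authoritative robot list:

```python
# Hypothetical list of known robot UA fragments -- extend as needed.
ROBOT_FRAGMENTS = ["googlebot", "slurp", "msnbot", "crawler", "spider"]

def is_robot(ua):
    """Partial match: flag the UA if any known fragment appears in it."""
    ua = ua.lower()
    return any(frag in ua for frag in ROBOT_FRAGMENTS)

print(is_robot("Googlebot/2.1 (+http://www.google.com/bot.html)"))  # True
print(is_robot("Mozilla/4.0 (compatible; MSIE 6.0)"))               # False
```

The trade-off versus exact matching is that a substring test catches new robot versions automatically, at the cost of occasional false positives.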
> Then again, I wouldn't save traffic data to a table anyway. I'd use
> usertrack and/or the apache access_log for that. There are lots of tools
> that will allow you to analyse Apache log files. You can even save some
> Interchange usertrack info into a custom Apache access_log file.
All of my domains run from the same catalog, and I don't use
directories or query strings in the URL, so I need something that will
let me track the domain involved in each request. Can I save
the domain into a custom access_log? If so, do you know of an
analyzer that would allow me to report on that domain info?
Alternatively, the usertrack file might work if I can save and report
on the domain there. How can I parse a flat log file like that?
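Parsing a flat log file like that usually takes only a short script. The sketch below assumes a hypothetical custom Apache LogFormat that appends the Host: header in a trailing quoted field (e.g. `LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Host}i\""`), so the field positions are an assumption, not your actual layout:

```python
import collections
import io
import re

# Matches an access_log line with the Host: header appended as a final
# quoted field. Adjust the pattern if your LogFormat differs.
LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d+ \S+ "([^"]*)"$')

# Stand-in for open("access_log") -- sample data only.
sample = io.StringIO(
    '10.0.0.1 - - [11/Apr/2007:11:49:53 -0400] "GET / HTTP/1.1" 200 512 "shop-a.example.com"\n'
    '10.0.0.2 - - [11/Apr/2007:11:49:54 -0400] "GET /x HTTP/1.1" 200 99 "shop-b.example.com"\n'
    '10.0.0.3 - - [11/Apr/2007:11:49:55 -0400] "GET / HTTP/1.1" 200 512 "shop-a.example.com"\n'
)

hits = collections.Counter()
for line in sample:
    m = LINE.match(line.strip())
    if m:
        ip, domain = m.groups()
        hits[domain] += 1

# Per-domain request counts, busiest first.
for domain, count in hits.most_common():
    print(domain, count)
```

The same loop works for any whitespace- or tab-delimited flat file such as usertrack; only the regex (or a `line.split("\t")`) needs to change to match the field layout.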
More information about the interchange-users mailing list