[ic] Rolling big tables (mysql)

Grant emailgrant at gmail.com
Tue Apr 10 17:01:34 EDT 2007


[snip]
> > Thank you for the insight.  The table in question actually logs
> > traffic data from my site's visitors.  The bigger the table gets, the
> > longer it takes to run a report on it.  I have a date field in there,
> > and I'd like to backup and delete each row older than 30 days.  I'd
> > prefer those rows to be backed up to another table as opposed to a log
> > file so I can run a report on that old data if I need to.
> >
> > Is there an ictag or two I can use to copy the old rows to a different
> > table and delete them from the primary table?
> >
> > - Grant
>
> Hi Grant,
>
> For something like this I would probably build a perl script to do the heavy lifting, so that you are not working within the constraints of the IC server and the need to complete in under PIDcheck seconds.
>
> How many rows are in your table when it starts getting slow?  Are your fields properly indexed?

Thanks Ron,

Now you have me wondering if I just need to do some mysql
optimization.  I haven't optimized the dbconf/mysql/traffic.mysql
config file at all.  What do you mean by indexing exactly?  The
traffic table is just a raw record of a few pieces of info from each
IC request.  Is there any other type of optimization I can do?
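
For context, indexing here would mean adding an index on the date
column so MySQL can pull the rows for a date range without scanning the
whole table on every report.  A minimal sketch, assuming the table is
named "traffic" and the date field is "tdate" (both names are guesses,
as are the connection details):

    #!/usr/bin/perl
    # One-time schema change: add an index on the date column so that
    # date-range queries (e.g. "last 30 days") stop doing full table
    # scans.  Database, user, table, and column names are assumptions.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('DBI:mysql:database=ic_db;host=localhost',
                           'ic_user', 'ic_pass', { RaiseError => 1 });

    $dbh->do('ALTER TABLE traffic ADD INDEX idx_tdate (tdate)');

    $dbh->disconnect;

The same ALTER TABLE statement could just as well be run once from the
mysql command-line client.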

I run a report on the table daily, and two days after rolling the
table, it seems to take twice as long as it did the day before.  It
accumulates about 3,000 rows a day.

I do keep a separate table of robot UAs and match traffic rows against
it with op=eq to populate another table with the day's robot IPs and
non-robot IPs, which speeds up the report.  Don't you think it would be
slower to check each IC request against the known robot UAs and write
to the traffic table based on that, instead of unconditionally writing
all requests to the traffic table?  If not, excluding the robot
requests from the traffic table would mean a lot less processing for
the report and a lot fewer records in the traffic table.
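
To make the comparison concrete, a write-time check might look roughly
like the sketch below.  It is only an illustration: the table names
(robot_uas, traffic) and columns (ua, ip, tdate) are assumptions, and
the per-request cost is a single indexed lookup against the small robot
UA table, not a scan of the traffic table:

    #!/usr/bin/perl
    # Hypothetical write-time robot filter: only log a request to the
    # traffic table if its user agent is not listed in robot_uas.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('DBI:mysql:database=ic_db;host=localhost',
                           'ic_user', 'ic_pass', { RaiseError => 1 });

    sub log_request {
        my ($ua, $ip) = @_;

        # One indexed lookup per request against the robot UA table.
        my ($is_robot) = $dbh->selectrow_array(
            'SELECT 1 FROM robot_uas WHERE ua = ? LIMIT 1', undef, $ua);

        return if $is_robot;    # known robot, skip logging

        $dbh->do('INSERT INTO traffic (ua, ip, tdate) VALUES (?, ?, NOW())',
                 undef, $ua, $ip);
    }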

- Grant


> First create a table where the archived data will reside, such as "stats_archived".
>
> Your script should:
>
> 1. Use DBD::mysql to connect to your database and run queries.
>
> 2. Use MySQL's "INSERT INTO ... SELECT" with a WHERE clause restricting to records more than 30 days old, to copy those records from the live table into the "stats_archived" table.
>
> 3. Then use MySQL's "DELETE FROM" with the same WHERE clause to delete that same record set from the live stats table.
>
> 4. The script can then be set up to run nightly through cron, or perhaps at the beginning of the month so that the previous month's data is available all month long.
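
A minimal sketch of such a script, assuming the live table is named
"traffic" with a date column "tdate", that "stats_archived" shares its
column layout, and that the connection details below are placeholders:

    #!/usr/bin/perl
    # Nightly archive job: copy traffic rows older than 30 days into
    # stats_archived, then delete them from the live table.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('DBI:mysql:database=ic_db;host=localhost',
                           'ic_user', 'ic_pass',
                           { RaiseError => 1, AutoCommit => 1 });

    # Rows older than 30 days, based on the assumed date column "tdate".
    my $where = 'tdate < DATE_SUB(CURDATE(), INTERVAL 30 DAY)';

    # Step 2: copy the old rows into the archive table (assumes
    # stats_archived has the same columns as traffic).
    my $copied  = $dbh->do("INSERT INTO stats_archived SELECT * FROM traffic WHERE $where");

    # Step 3: delete the same rows from the live table.  Rows written
    # between the two statements won't match, since they are new.
    my $deleted = $dbh->do("DELETE FROM traffic WHERE $where");

    print "Archived $copied rows, deleted $deleted rows from traffic\n";

    $dbh->disconnect;

For step 4, a crontab entry along the lines of
"5 0 * * * /path/to/archive_traffic.pl" would run it shortly after
midnight; the path and schedule are just examples.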

