it could be called work

the cost of optimization

So between using MTOptimizeHTML and mod_gzip, my server is taking a beating.


last pid: 17424; load averages: 3.96, 2.94, 1.84 up 9+08:17:32 21:19:44
93 processes: 5 running, 88 sleeping
CPU states: 99.2% user, 0.0% nice, 0.0% system, 0.8% interrupt, 0.0% idle
Mem: 168M Active, 17M Inact, 41M Wired, 10M Cache, 35M Buf, 12M Free
Swap: 1027M Total, 440M Used, 587M Free, 42% Inuse

11208 www 62 0 301M 44564K RUN 13:08 24.07% 24.07% httpd
11134 www 62 0 85152K 12724K RUN 5:57 24.07% 24.07% httpd
16618 www 63 0 83352K 49232K RUN 1:13 23.78% 23.78% httpd
17038 www 62 0 82584K 44184K RUN 1:55 23.63% 23.63% httpd
335 mysql 2 0 29320K 376K poll 28:02 0.00% 0.00% mysqld

The top four processes are chewing up all my CPU time, and for long stretches (see the graphic: the red line marks 100% CPU, and ideally utilization stays below it).

The load average, as displayed by uptime(1), is over 5 now: that’s 4 too many. As you can see, it’s all httpd processes (Apache with an embedded mod_perl interpreter). The mysql process is dormant, for all intents and purposes.
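The "4 too many" arithmetic assumes a single-CPU box, where a load average above 1 means processes are waiting in the run queue for the processor. Here is a minimal Python sketch of that check. It assumes a Unix host, since `os.getloadavg()` is not available on Windows, and `excess_load` is a hypothetical helper name:

```python
import os

def excess_load(load_avg: float, ncpu: int) -> float:
    """Roughly how many runnable processes exceed what the CPUs can serve at once."""
    # With n CPUs, a load average of n means every CPU is busy;
    # anything above that is work queued up waiting for a CPU.
    return max(0.0, load_avg - ncpu)

if __name__ == "__main__":
    one_min, _five_min, _fifteen_min = os.getloadavg()  # Unix only
    ncpu = os.cpu_count() or 1
    print(f"1-min load {one_min:.2f} on {ncpu} CPU(s): "
          f"{excess_load(one_min, ncpu):.2f} over capacity")
```

On the single-CPU machine above, a load of 5 gives `excess_load(5.0, 1) == 4.0`, which is the "4 too many".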

I have to decide how much I'm willing to pay for optimization, and which kind makes the most sense. mod_gzip seems like a very elegant solution, and its load is spread over the whole day, while the MTOptimizeHTML hit lands at every rebuild, i.e., every post or comment.
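One way to frame that trade-off is to look at both total CPU per day and the size of a single hit. Per-request compression is small but constant, and per-rebuild optimization is rare but long. This is a hypothetical Python cost model, and every number in it is made up for illustration rather than measured on this server:

```python
from dataclasses import dataclass

@dataclass
class Strategy:
    name: str
    cost_per_run_ms: float  # CPU time spent each time the optimization runs
    runs_per_day: int       # page requests (mod_gzip) or rebuilds (MTOptimizeHTML)

    def daily_cpu_seconds(self) -> float:
        """Total CPU time the strategy costs per day."""
        return self.cost_per_run_ms * self.runs_per_day / 1000.0

    def burst_seconds(self) -> float:
        """How long a single run ties up the CPU."""
        return self.cost_per_run_ms / 1000.0

# Hypothetical figures for illustration only; measure your own server.
gzip_plan = Strategy("mod_gzip, per request", cost_per_run_ms=5.0, runs_per_day=20_000)
optimize_plan = Strategy("MTOptimizeHTML, per rebuild", cost_per_run_ms=30_000.0, runs_per_day=10)

if __name__ == "__main__":
    for plan in (gzip_plan, optimize_plan):
        print(f"{plan.name}: {plan.daily_cpu_seconds():.0f} s/day, "
              f"{plan.burst_seconds():.3f} s per hit")
```

The two methods are kept separate because they answer different questions. `daily_cpu_seconds` is what the box pays overall, and `burst_seconds` is how long the CPU stays pinned each time a strategy kicks in.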


on monetizing weblogs

ongoing — You Can Get Paid For This

Tim Bray analyzes his recent foray into Google AdSense. His experiences mirror my own (one thing I learned early on: some ads are worth more than others).

He’s doing a little (!!) better on this than I am, but then I don’t get linked from the tech section of either.

Also, he's dead right that ads on pages with a single piece do better than ads on pages with multiple, different pieces. I shoulda remembered that from my startup days and the follow-on experiences with WayPath: you can't determine contextual relevance against an assortment of different items.
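The intuition can be shown with a toy keyword model. If you score a page by the share of its recognized keywords that belong to its most common topic, a single-subject page scores high, and a page mixing unrelated pieces dilutes that share. This is only a hypothetical Python illustration, not how AdSense or WayPath actually scored pages, and the topic table is made up:

```python
from collections import Counter

def dominant_topic(words: list[str], topic_of: dict[str, str]) -> tuple[str, float]:
    """Most common topic among recognized keywords, and the share of hits it gets."""
    hits = Counter(topic_of[w] for w in words if w in topic_of)
    if not hits:
        return ("none", 0.0)
    topic, count = hits.most_common(1)[0]
    return topic, count / sum(hits.values())

# Hypothetical keyword-to-topic table for the illustration.
TOPICS = {"python": "programming", "compiler": "programming",
          "guitar": "music", "recipe": "cooking"}

single = ["python", "compiler", "python"]         # one piece, one subject
mixed = ["python", "guitar", "recipe", "python"]  # several unrelated pieces

if __name__ == "__main__":
    print(dominant_topic(single, TOPICS))
    print(dominant_topic(mixed, TOPICS))
```

With one subject, every recognized keyword points to the same topic. On the mixed page, the best topic only gets half the signal, so any ad matched to it is a weaker bet.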

But as he says, there’s some organic growth at work here, so while I may never do quite as well as he’s doing (I’m covering my cable modem bill this month – yay!), there’s some upside.


another robot

Welcome to

Found this in my logfiles just now . . .

PubSub Concepts provides real-time, content based publish and subscribe systems at internet scale. This site is a Beta version of our home page, which will provide a PubSub interface for weblogs and other information sources.

[ . . . ] reads over 100,000 weblogs in real time, and generates new feeds containing information specific to particular issues.
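"Content-based" publish/subscribe means subscribers describe *what* they want, and each incoming item is routed by what it says, not by which feed it came from. Below is a toy Python sketch of that idea. It matches subscriptions as keyword sets, and `ContentBroker` is a hypothetical name. This is not PubSub's implementation, which would need indexing to work at the scale of 100,000 weblogs:

```python
import re

class ContentBroker:
    """Toy content-based publish/subscribe: each subscription is a keyword set,
    and every published post is routed to all subscriptions it matches."""

    def __init__(self) -> None:
        self.subscriptions: dict[str, set[str]] = {}
        self.feeds: dict[str, list[str]] = {}

    def subscribe(self, name: str, keywords: set[str]) -> None:
        self.subscriptions[name] = {k.lower() for k in keywords}
        self.feeds.setdefault(name, [])

    def publish(self, post: str) -> None:
        words = set(re.findall(r"[a-z0-9]+", post.lower()))
        for name, keywords in self.subscriptions.items():
            if words & keywords:  # match on content, not on the source feed
                self.feeds[name].append(post)

if __name__ == "__main__":
    broker = ContentBroker()
    broker.subscribe("compression", {"gzip"})
    broker.subscribe("social", {"orkut"})
    broker.publish("Trying mod_gzip on the blog today")
    broker.publish("Got another Orkut invite")
    print(broker.feeds)
```

Each subscription's list in `feeds` is effectively one of the "new feeds containing information specific to particular issues".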

This chart shows what people are talking about – in all the weblogs and RSS feeds we monitor, how many people are talking about each candidate.
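"How many people are talking about each candidate" suggests counting distinct writers rather than raw mentions, so one prolific blogger doesn't count ten times. Here is a minimal Python sketch under that assumption. The posts are (author, text) pairs, and the candidate names in the usage are hypothetical placeholders:

```python
import re
from collections import defaultdict

def people_per_candidate(posts: list[tuple[str, str]], candidates: list[str]) -> dict[str, int]:
    """Count distinct authors who mention each candidate at least once."""
    authors: dict[str, set[str]] = defaultdict(set)
    for author, text in posts:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for candidate in candidates:
            if candidate.lower() in words:
                authors[candidate].add(author)
    return {c: len(authors[c]) for c in candidates}

if __name__ == "__main__":
    sample = [("blog-a", "Smith spoke today"),
              ("blog-a", "More on Smith"),
              ("blog-b", "Jones and Smith debate")]
    print(people_per_candidate(sample, ["Smith", "Jones"]))
```

Here "blog-a" mentions Smith twice but counts once, so Smith gets 2 people and Jones gets 1.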

This page has more information: if it were me, I'd put it first, since it has more than one datapoint and a lot of RSS feeds for the infojunkies amongst us.


orkut as Google’s data-mining/personalization Trojan Horse?

Jeremy Zawodny’s blog: Why Google needs Orkut:

“Then, one day down the road, they quietly decide to “better integrate” Orkut with Google and start redirecting all Orkut requests to


Suddenly they’re able to set a * cookie that contains a bit of identifying data (such as your Orkut id) and that would greatly enhance their ability to mine useful and profitable data from the combination of your profile and daily searches.”

This is no conspiracy theory: I think he may be close to the truth of it.

Of course, dropping your stored cookies would be enough to break this, so it’s not clear it’s all that invasive or predatory.
