Office 2003/XP Add-in: Remove Hidden Data

With this add-in you can permanently remove hidden and collaboration data, such as change tracking and comments, from Word 2003/XP, Excel 2003/XP, and PowerPoint 2003/XP files. 
via [Early Adopter Weblog]

Everyone needs this add-in. In the past six months, I’ve received three press releases with embarrassing internal comments and one document whose author had gone to significant pains to remain anonymous but who forgot about, or didn’t understand, Word’s document-properties fields.


Finalization is expensive. It has the following costs… <snip of 2,930 additional words> … Subject to all of the above, we guarantee that we will dequeue your object and initiate a call to the Finalize method. We do not guarantee that your Finalize method can be JITted without running out of stack or memory. We do not guarantee that the execution of your Finalize method will complete without being aborted. We do not guarantee that any types you require can be loaded and have their .cctors run. All you get is a “best effort” attempt.

[cbrumme’s WebLog]

I’m becoming more and more convinced that the guideline for non-deterministic finalization should be: avoid it. The only things you need finalization for are non-memory resources: file handles, database connections, and the like. You can either exercise a little diligence and manually track and clean up these things, or you can subject yourself to the complicated algorithms Brumme describes. If I were managing a programming team, there’s no way I’d hand junior programmers Brumme’s post and then tell them, “Okay, make sure we don’t run out of database handles.” Nuh-uh. I’d say: “Refactor your non-memory resources into xPool objects and deterministically finalize them using try…finally or using statements and the IDisposable pattern.” Learning that might be more complex than reading Brumme’s post, but the results are much easier to review and duplicate.
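Brumme’s post is about .NET, but the “pool plus deterministic release” discipline is language-neutral. Here is a minimal sketch in Python, assuming a hypothetical `ConnectionPool` standing in for one of those xPool objects and plain strings standing in for real connections. The `with` statement plays the role of C#’s `using`, and the explicit `try`/`finally` version does the same bookkeeping by hand. It illustrates the pattern only; a real pool would also need thread safety and timeouts.

```python
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical 'xPool': a fixed set of non-memory resources handed out on demand."""

    def __init__(self, size: int) -> None:
        # Stand-in handles; a real pool would hold sockets, file descriptors, etc.
        self._free = [f"conn-{i}" for i in range(size)]
        self._in_use: set[str] = set()

    def acquire(self) -> str:
        if not self._free:
            raise RuntimeError("pool exhausted")
        handle = self._free.pop()
        self._in_use.add(handle)
        return handle

    def release(self, handle: str) -> None:
        self._in_use.remove(handle)
        self._free.append(handle)

    @contextmanager
    def lease(self):
        # Analogous to C#'s `using`: release runs however the block exits.
        handle = self.acquire()
        try:
            yield handle
        finally:
            self.release(handle)

    @property
    def available(self) -> int:
        return len(self._free)


pool = ConnectionPool(size=2)

# Long-hand form: explicit try/finally.
conn = pool.acquire()
try:
    pass  # ... use conn ...
finally:
    pool.release(conn)

# Context-manager form: the handle comes back even when the body throws.
try:
    with pool.lease() as conn:
        raise ValueError("simulated failure mid-query")
except ValueError:
    pass

print(pool.available)  # 2
```

Either way, a reviewer can check resource handling by looking for a matching `finally` or `with`, with no reasoning about when (or whether) a collector will run.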


The people who really benefit from non-deterministic finalization are library writers, who finally have a guarantee that, sometime before the process ends, they’ll get one last shot at cleaning up resources. Even they, I think, should strive for explicit, deterministic finalization and use the finalizer only as a “last chance” effort to trigger clean-up.
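One way a library can express “deterministic first, finalizer as last chance” is sketched below in Python, where `weakref.finalize` is the closest standard-library analog to a .NET finalizer; this is a swapped-in Python mechanism, not the .NET Dispose pattern itself. `NativeResource`, `_release_handle`, and `RELEASED` are hypothetical names for illustration. Calling `close()` (or leaving a `with` block) runs the cleanup immediately; if the caller forgets, the callback runs when the object is collected or at interpreter exit. A `weakref.finalize` object runs its callback at most once, so the two paths can’t double-release. As with Brumme’s finalizers, the safety net is best effort.

```python
import weakref

# Hypothetical record of released handles, so the behaviour can be observed.
RELEASED: list[str] = []


def _release_handle(name: str) -> None:
    # Must not reference the owning object, or it could never be collected.
    RELEASED.append(name)


class NativeResource:
    """Hypothetical library type that owns a non-memory resource."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Last-chance safety net: fires at garbage collection or interpreter
        # exit if the caller never calls close().
        self._finalizer = weakref.finalize(self, _release_handle, name)

    def close(self) -> None:
        # Deterministic path: calling the finalizer runs it now and marks it
        # dead, so the cleanup happens exactly once.
        self._finalizer()

    @property
    def closed(self) -> bool:
        return not self._finalizer.alive

    def __enter__(self) -> "NativeResource":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()


with NativeResource("db-1") as r:
    pass  # ... use the resource ...
print(r.closed, RELEASED)  # True ['db-1']

leaked = NativeResource("db-2")
del leaked  # CPython's refcounting collects it here, so the safety net fires
print(RELEASED)
```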

Spam Will Lead To Artificial Intelligence

A recent issue of CACM (Communications of the ACM) has an article on CAPTCHAs (those visual puzzles used to defeat registration ‘bots). The article made the point that solving a CAPTCHA requires advancing the state of the art in artificial intelligence and image recognition. Similarly, spammers have begun using Markov chains to fool spam tools that work at the level of the single word. Naturally, the next step for anti-spam will be to apply rudimentary grammatical analysis to the body text, diagram each sentence like a 9th grader, and do Bayesian analysis of the sentence structure. The spammers will counter with more sophisticated sentence generators, the anti-spammers will improve their contextual analysis, and the two forces, driven by the ridiculous economics of spam, will co-evolve machine intelligence.
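To make the arms race concrete: a first-order, word-level Markov chain just records which words followed which in some sample text, then takes a random walk through those transitions. Every adjacent word pair in the output occurred in real text, which is roughly why such text can slip past a filter that only scores individual words, even though the sentence as a whole is nonsense. A minimal Python sketch with a made-up corpus; it shows the general technique, not any particular spammer’s generator.

```python
import random
from collections import defaultdict


def build_chain(corpus: str) -> dict[str, list[str]]:
    """Map each word to the list of words that followed it in the corpus."""
    words = corpus.split()
    chain: dict[str, list[str]] = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain: dict[str, list[str]], start: str, length: int,
             rng: random.Random) -> list[str]:
    """Random walk: each next word is drawn from the observed successors of the last one."""
    out = [start]
    while len(out) < length and out[-1] in chain:
        out.append(rng.choice(chain[out[-1]]))
    return out


corpus = ("the quick brown fox jumps over the lazy dog "
          "the lazy dog sleeps while the quick cat watches")
chain = build_chain(corpus)
print(" ".join(generate(chain, "the", 10, random.Random(42))))
```

Repeated successors appear more than once in each list, so `rng.choice` picks them in proportion to how often they were seen, which is what makes this a Markov chain rather than a uniform word shuffle.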

Eventually, the machine intelligence will send a robotic assassin back in time to kill Thomas Bayes and ensure a future in which all humans have low-interest loans, college diplomas, and an herbally-enhanced sex life. Chilling.

RSS Sweep: Interesting Links, No Editorial

If you follow the rumor mill, you may have heard of X# or “Xen”, the crazy next-generation programming language that was reportedly being cooked up by people on my former team.  I won’t say what they all are working on now, but some more of them have started blogs….

I just finished reading Cory Doctorow’s latest book, Eastern Standard Tribe. It was a cool book, and the next time I hit a bookstore I will buy a copy. Because Cory released the book under a Creative Commons license, people have been transforming it into a variety of formats. The one that caught my eye was “speed-reader” by Trevor Smith. This is a Java applet that flashes the book up on the screen one word at a time. Its single user-interface control varies the speed at which the words are presented….
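The speed-reader idea (often called rapid serial visual presentation) reduces to very little code: split the text into words and show each one for 60 / words-per-minute seconds. Below is a minimal terminal sketch of that idea in Python, not Trevor Smith’s applet; `rsvp_schedule` and `play` are hypothetical names, and the `words_per_minute` parameter stands in for the applet’s single speed control.

```python
import sys
import time


def rsvp_schedule(text: str, words_per_minute: int) -> list[tuple[str, float]]:
    """Pair each word with how long, in seconds, it stays on screen."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    delay = 60.0 / words_per_minute  # the single 'speed' knob
    return [(word, delay) for word in text.split()]


def play(schedule: list[tuple[str, float]]) -> None:
    """Show one word at a time in the terminal, overwriting the previous one."""
    for word, delay in schedule:
        sys.stdout.write("\r" + word.ljust(20))
        sys.stdout.flush()
        time.sleep(delay)
    sys.stdout.write("\n")


play(rsvp_schedule("Eastern Standard Tribe, one word at a time", 600))
```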

Harald Leitenm