Of course, since I'm such a klutz at the keyboard, I found a bug when I mistyped the file name. There should be a return after line 16, Console.WriteLine("File doesn't exist");, so that the program stops there and the user doesn't see an exception being thrown.
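Roughly, the fix would look like the sketch below. Only the WriteLine call comes from the original program; the Main signature, the fileName variable, the usage message, and the File.Exists check are my guesses at the surrounding code, so adjust them to match the real source.

```csharp
using System;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        // Assumption: the file name arrives as the first command-line argument.
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: pass the path of the file to clean");
            return;
        }

        string fileName = args[0]; // hypothetical variable name

        if (!File.Exists(fileName))
        {
            Console.WriteLine("File doesn't exist");
            return; // the missing return: stop here instead of falling through
        }

        // The existing clean-up code would continue here. It is only reached
        // when the file is actually present, so opening it no longer throws
        // for a mistyped name.
        Console.WriteLine("Processing " + fileName);
    }
}
```

With the return in place, a mistyped name prints the message and exits quietly instead of running on into the file-opening code.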
I know Windows is not big on drag-and-drop, but is there a drag-and-drop version of this in the works, so we can just drag a folder full of mucked-up Word HTML docs onto the program and have it clean them all in one batch?
My solution is simpler: if it's going to wind up as HTML eventually, don't write it in Microsoft Word. Use the free OpenOffice.org Writer, the open-source derivative of Sun Microsystems' StarOffice Writer. It can compose documents with all the styles and polished appearance of Word documents, but when it exports a document as HTML, the resulting code is much, much cleaner. Any styles used in the document are included in the HTML as inline style blocks. It can also import Word documents, although I have no experience using it as a filter to clean up Word files before exporting them to HTML. OpenOffice.org is a free suite similar to Microsoft Office, and is available for Microsoft Windows, Apple Macintosh, and Linux systems. (Up to version 2.4 it also ran on Windows 98, and you can probably still find that installer in archives.)