Managed HTML rendering

At some point in any WinForms project, you're bound to need either:

Although I am ambivalent towards HTML, there's no question that it is a far, far better solution than the nasty, crusty old Rich Text Format. RTF is HTML gone stupid. If you're ever bored and want to take on a brain-meltingly difficult project, just try writing an RTF to HTML converter. Oh sure, it seems easy enough... but I don't think anyone can appreciate how profoundly irrational RTF is until they actually sit down and work with it in detail. Ugly doesn't begin to cover it. Based on my limited research, RTF seems to have evolved as the de facto document storage format for early versions of Microsoft Word, apparently based on the whims of whatever development team was working on Word that week.


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2004/10/managed-html-rendering.html

With my luck, Microsoft’s version would be some FrontPage derivative that generates/reads very sloppy HTML. I’d tell ’em not to bother making it if they were going to do that.

HTML rendering isn’t too easy, but getting it down to just rendering might drop the size a little bit. Much of the interop size is COM stuff, networking, and a slight bit of useless code that doesn’t really deal with HTML rendering.

If I were to go about it, I would probably use XML/XSLT to render HTML content. It’s included in the framework and just as “free” as the RichTextBox. I’d then use DTD or XSD files to lay out the various HTML (etc.) definitions, so that you wouldn’t have to recompile your control every time a new HTML spec came out. Networking should be left out of the control so that you only pass it the raw markup. That would keep it as lightweight as possible, because you don’t have to handle the HTTP protocol stuff within the control itself. There are other portions of .NET that can handle that relatively easily.

I’m speculating here and it’s late, so I reserve the right to say whatever I said probably sucks. It’s an interesting idea but I have no clue how well it’d work.
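For what it’s worth, the split described above (rendering driven by a table of tag definitions, with the control only ever receiving raw markup and never touching the network) can be sketched in a few lines. This is only an illustration under loud assumptions: it uses Python’s standard XML parser as a stand-in for the .NET XML stack, renders to plain-text lines rather than to a graphics surface, and the `TAG_RULES` table and `render` function are hypothetical names, not part of any real control.

```python
import xml.etree.ElementTree as ET

# Hypothetical, data-driven tag definitions. In the design described above,
# these would be loaded from external DTD/XSD-style files, not hardcoded.
TAG_RULES = {
    "h1": {"block": True, "prefix": "# "},
    "p": {"block": True, "prefix": ""},
    "li": {"block": True, "prefix": "* "},
    "strong": {"block": False, "wrap": "*"},
    "em": {"block": False, "wrap": "_"},
}


def render(markup):
    """Render well-formed XHTML to a list of plain-text lines.

    The function only accepts a string of markup; fetching it (HTTP, disk,
    and so on) is deliberately left to the caller.
    """
    lines = []

    def inline(node):
        # Collect the text of a node and of everything nested inside it.
        wrap = TAG_RULES.get(node.tag, {}).get("wrap", "")
        parts = [node.text or ""]
        for child in node:
            parts.append(inline(child))
            parts.append(child.tail or "")
        return wrap + "".join(parts) + wrap

    def walk(node):
        rule = TAG_RULES.get(node.tag, {})
        if rule.get("block"):
            # One output line per block element, with whitespace collapsed.
            text = " ".join(inline(node).split())
            lines.append(rule.get("prefix", "") + text)
        else:
            # Unknown or container tags: just descend into their children.
            for child in node:
                walk(child)

    walk(ET.fromstring(markup))
    return lines
```

Calling `render("<body><h1>Title</h1><p>Some <strong>bold</strong> text.</p></body>")` yields one line per block element. Supporting a new tag means adding an entry to the table, not changing the walker, which is the “no recompile” property the comment is after. Text sitting directly inside a container such as `body` is ignored in this sketch.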

Maybe it’s just me, but I’ve found most HTML-rendering “rich” text editing controls to be irrational to use – they just have a flippin’ mind of their own, which comes down to the fact that you just never know when you hit Enter whether the thing thinks you want a div or a p or a br or whatever. At least, if Outlook HTML editing and Outlook Express are exemplars of what wrapping MSHTML.dll is like.

So, a few things:

  1. I don’t believe that Lutz and Nikhil’s code requires an interop DLL. I think they just use COM interop straight up in their own code. I could be wrong there, but the download of Writer is only a few K, not several MB.
  2. Pasting bullets from Word into RichTextEditor seems to work fine for me. I didn’t do anything special in RichTextEditor to enable this.
  3. Like you, HTML interests me much more as a format than RTF. However, the model I’m particularly interested in is the one where the user interacts with the control purely as text, and the application determines formatting. I think of this as the “IDE model”.

Overall I agree with you; having an HTML control would make life much easier, as long as it allows interactive editing.

I can’t speak for Lutz’s control; I haven’t looked at it in a long time. Nikhil’s control, however, uses P/Invoke on mshtml. Good stuff.

I wonder if Mono is doing anything interesting in this space? There’s also the Gecko engine to think about - but as you said, rendering HTML is not a trivial task.

Lutz’s control uses Nikhil’s interop wrapper.

The only good thing about the IE PIA is that it somehow compresses (standard zip compression) from 7.8 to less than 1.5MB… I’m not sure if there are a lot of strings in there, or if it’s just random chance.

I guess at this point we’ll have to wait for (at least) Longhorn to get a really truly managed HTML renderer. Sad, really.

Jeff,

Have you seen this commercial control? http://www.netrixcomponent.net

It’s similar in concept to Lutz and Nikhil’s code (no PIA), but seems much more extensive in the features it supports.

I can’t find much on-line comment about it tho…

PS. Thanks for this handy page!

Ok. Here we are in October 2006. Guess what… there still isn’t a good, lightweight tool to do this yet.

I checked out the Netrixcomponent link… too expensive for a single developer in a minimal use app.

I looked into this recently as well, and that WebBrowser control is trouble (or IE interop prior to .NET 2.0). When you interface with it, it’s all-or-nothing - you get that progress bar displaying even if what you intend to display, while HTML, is nothing like a webpage (just fancy formatted text), and you get every single security bug in IE imported right into your app - scripting and all.

Basically, the WebBrowser control is another cheap hack on Microsoft’s part to, for whatever reason, further avoid using their own managed code to finally write a browser in .NET. Beats me as to why they keep putting it off - after all, if they did finally get on their own horse, we’d likely have something as componentized as other parts of .NET - where you can pick and choose the pieces you need (say just HTML functionality, or just scripting, etc.) and leave out the other parts that are just performance burdens, UI burdens, and above all, security burdens.

Looks like open source is the way to solve this one. Majestic 12 proposes a solution:
http://www.majestic12.co.uk/projects/html_parser.php

I’m going to give that a shot and if it works nicely and I can add to it a bit I’ll ask the guy to get a SourceForge/Subversion tree going.

If you’re after a good WYSIWYG XHTML editor, I’ve had great success using XStandard: http://www.xstandard.com/

It goes a long way towards helping your users generate “semantically correct” data documents, as things like bold and italics are controlled through a flexible style menu. It also avoids deprecated tags like <b> and instead uses the more meaningful ones like <strong>. Another big advantage, if your applications are focused on creating content for the web, is that you can supply it with a stylesheet to use in its rendering, which gives users a real-time preview of what it will look like on the site without adding extra weight to the HTML.
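To make the presentational-versus-semantic distinction concrete, here is a minimal sketch of rewriting bold and italic tags to their semantic equivalents. It is not how XStandard works internally (that isn’t described here); it is just an illustration using Python’s standard `html.parser`, and the `SEMANTIC` mapping, `SemanticRewriter` class and `to_semantic` function are hypothetical names. Comments and doctypes are dropped, and `script`/`style` contents are not handled specially, so treat it as a toy.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping from presentational tags to semantic equivalents.
SEMANTIC = {"b": "strong", "i": "em"}


class SemanticRewriter(HTMLParser):
    """Re-emit markup, swapping presentational tags for semantic ones."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _attrs(self, attrs):
        # Rebuild the attribute string, escaping values for safe output.
        return "".join(
            f" {name}" if value is None
            else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{SEMANTIC.get(tag, tag)}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, e.g. <br/>.
        self.out.append(f"<{SEMANTIC.get(tag, tag)}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        self.out.append(f"</{SEMANTIC.get(tag, tag)}>")

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape text.
        self.out.append(escape(data, quote=False))


def to_semantic(html_text):
    """Return html_text with <b>/<i> replaced by <strong>/<em>."""
    parser = SemanticRewriter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, `to_semantic("<p>A <b>bold</b> word</p>")` returns the same paragraph with `<strong>` in place of `<b>`. An editor that only ever emits the right-hand side of such a mapping never needs this clean-up step in the first place, which is the appeal described above.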

I’m interested to hear your opinion of the tool! I’ve even considered using it as an ‘advanced’ text editing control in some of my web applications.

There is finally a fully managed HTML rendering component for use in Windows Forms.

Gobicode’s HTMLLabel control can be used to display HTML formatted text and images. The control supports super/sub-scripts, lists, in-line images, text justification and much more.
A list of supported HTML tags can be found here: http://www.gobicode.com/htmltags.php

Being a fully managed .NET control, the HTMLLabel has many advantages over the RichTextBox and WebBrowser COM wrappers. The major benefit is that the HTMLLabel can render to any graphics surface, which allows HTML formatted content to be displayed within grids, charts, list-boxes and any other control which allows custom painting.

You have the choice of GDI and GDI+ rendering, as well as a few pretty extras such as gradient fills and blur effects.

Examples with screenshots can be seen here: http://www.gobicode.com/features.php

If you get a chance to try it out then I’d be interested to hear your thoughts. I can be contacted at warwick[no-spam-at]gobicode.com

This is a nice start for a good managed HTML Renderer:
http://www.menendezpoo.com/a.php?h=a4912174e56416

Seems that people are still commenting on this issue, so I’ll toss in a link. Someone wrote a C# HTML renderer back in 2003 for the Compact Framework, and reading this page reminded me of it. The old homepage is at http://home.nc.rr.com/bshankle/cfhtml/index.html , although the developer has since relocated the file to his own site at http://www.bruceshankle.com/_mgxroot/page_10764.html . It’s effectively open-source, and while it’s not a complete HTML implementation, it looks like it covers enough to be useful.

You said, “What I’d really like to see is a completely managed, lightweight HTML rendering control written entirely in .NET.”

I wanted that control too, and developed and released it: see http://www.modeltext.com/html/

It was quite a bit of work to finish, but I like the result. It’s described in some detail on its web site, but to summarise it here:

  • 100% managed code, no dependencies
  • XHTML rendering and WYSIWYG editing
  • Rendering supports (a subset of) CSS
  • Also supports HTML form elements

Benefits include:

  • No dependencies, e.g. on browsers
  • Ultra-clean (XHTML) output
  • Fast
  • Built-in support for editing (e.g. supports undo and redo)
  • .NET-compatible (instead of JavaScript) DOM Node and DOM Event APIs

There are several use cases for it, the main one being for any application which wants to let the end-user view and edit formatted text.

That sounds like a simple-enough requirement, and I was surprised that, before I wrote this one, I didn’t find a control that I liked which implements this functionality. Without it, interactive applications use unformatted edit boxes and grids, which I don’t find elegant or user friendly for editing several paragraphs of text, whose users might want to format some headings, lists, hyperlinks, and the occasional table.