I've been converting the encoding and markup of the GNOME Web sites in an effort to avoid doing some real work. We are moving the GNOME sites to UTF-8 and XHTML to catch up with standards. The Foundation and GUADEC sites are complete. The main portion of www.gnome.org is complete; the PHP portions of w.g.o and the projects are my next tasks. If you maintain the pages for a project on w.g.o and you don't want me touching your content, send me an email. I'm fixing the developer Web site last because it has the most content, and that content is in a very unpredictable state.
We have a number of ways to generate Web content, some specific to the Web, others common, like DocBook and gtk-doc. We need to regenerate some of the content as XHTML. Older content that isn't worth regenerating can be updated by hand, or we can label it as deprecated (or even historical).
Converting the character encoding isn't too difficult, but there are some inconveniences. Source code is stored in CVS in whatever encoding the last developer or tool used; some code in GNOME CVS is not UTF-8 and will not display correctly on cvs.g.o. This is a brief sample of files that are not valid UTF-8:
./gnome-vfs/libgnomevfs/gnome-vfs-job-queue.c
./esound/esdplay.c
./libgnome/libgnome/gnome-config.h
./libgnomecanvas/libgnomecanvas/gnome-canvas-text.c
./libbonoboui/samples/compound-doc/container/container-io.h
./libgnomeui/libgnomeui/gnome-scores.c
./gnome-themes/gtk-themes/Mist/src/mist-style.c
./gnome-applets/battstat/acpi-linux.c
./gtkhtml2/libgtkhtml/gtkhtml.h
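
For anyone who wants to check their own module, here is a minimal sketch of one way to find such files. It is not the script I used to build the list above; the function names (is_valid_utf8, find_invalid_utf8) and the choice to look only at .c and .h files are assumptions made for illustration. It simply reads each file as raw bytes and reports those that fail a strict UTF-8 decode.

```python
import os


def is_valid_utf8(data):
    """Return True if the raw bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_invalid_utf8(root, suffixes=(".c", ".h")):
    """Walk a checkout and return the paths of files that are not valid UTF-8.

    The suffix filter is an assumption for this sketch; widen it to cover
    whatever file types your module keeps in CVS.
    """
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            # Read bytes, not text, so nothing is decoded implicitly.
            with open(path, "rb") as handle:
                if not is_valid_utf8(handle.read()):
                    bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    # Scan the current directory, e.g. the top of a CVS checkout.
    for path in find_invalid_utf8("."):
        print(path)
```

Pure ASCII files pass this check, since ASCII is a subset of UTF-8. The check says nothing about which encoding a failing file actually uses; that still has to be worked out by hand before converting it.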