Wednesday, May 19, 2010

XML Validation error: Data at the root level is invalid. - could be a UTF BOM issue

UTF-encoded files can start with an optional byte order mark (BOM). The UTF encoding classes included in .NET Framework versions prior to 4.0 did not write this BOM by default. Hence, most of the applications we wrote to read UTF files do not handle the BOM. :)


I recently wrote a tool using Visual Studio 2010 to generate a UTF-8 encoded XML file to be consumed by another application, which reads the XML file using PowerShell. That app failed with the error "XML Validation error: Data at the root level is invalid. Line 1, position 1." Of course, opening the file in Notepad2 didn't show any suspicious characters. Some educated colleagues pointed us to the possibility of a BOM, which proved to be the case.
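Since a text editor will not show the BOM, one way to confirm the suspicion is to look at the first three bytes of the file. Here is a minimal sketch, assuming a UTF-8 file; the class and method names (BomCheck, StartsWithUtf8Bom, FileStartsWithUtf8Bom) are made up for this illustration and are not part of the tool described above.

```csharp
using System.IO;

static class BomCheck
{
    // Returns true when the bytes begin with the UTF-8 BOM (EF BB BF).
    public static bool StartsWithUtf8Bom(byte[] bytes)
    {
        return bytes.Length >= 3
            && bytes[0] == 0xEF
            && bytes[1] == 0xBB
            && bytes[2] == 0xBF;
    }

    // Reads up to the first three bytes of a file and checks them for the BOM.
    public static bool FileStartsWithUtf8Bom(string path)
    {
        byte[] head = new byte[3];
        int total = 0;
        using (FileStream stream = File.OpenRead(path))
        {
            // Read may return fewer bytes than requested, so loop until
            // three bytes are collected or the file ends.
            while (total < head.Length)
            {
                int read = stream.Read(head, total, head.Length - total);
                if (read == 0) break;
                total += read;
            }
        }
        return total == head.Length && StartsWithUtf8Bom(head);
    }
}
```

If FileStartsWithUtf8Bom returns true for the generated XML file, a parser that does not expect a BOM will see those bytes before the first `<`, which would match the "Line 1, position 1" part of the error.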


The fix is relatively easy. The UTF8Encoding class constructor takes a boolean parameter, encoderShouldEmitUTF8Identifier. Just pass false to it.




        //
        // Summary:
        //     Initializes a new instance of the System.Text.UTF8Encoding class. A parameter
        //     specifies whether to provide a Unicode byte order mark.
        //
        // Parameters:
        //   encoderShouldEmitUTF8Identifier:
        //     true to specify that a Unicode byte order mark is provided; otherwise, false.
        public UTF8Encoding(bool encoderShouldEmitUTF8Identifier); 


Example using an XML writer:



XmlTextWriter writer = new XmlTextWriter(exportPath, new UTF8Encoding(false));
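To see the difference the flag makes, the following sketch writes the same small document twice into memory, once with each setting, so the leading bytes can be compared. It uses the XmlTextWriter(Stream, Encoding) overload instead of the file path overload above; the class name BomDemo, the method name WriteSampleXml, and the "root" element are placeholders for this illustration only.

```csharp
using System.IO;
using System.Text;
using System.Xml;

static class BomDemo
{
    // Writes a tiny XML document into memory with the given encoding
    // and returns the raw bytes that would otherwise go to the file.
    public static byte[] WriteSampleXml(Encoding encoding)
    {
        MemoryStream stream = new MemoryStream();
        using (XmlTextWriter writer = new XmlTextWriter(stream, encoding))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("root"); // placeholder element name
            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
        // ToArray still works after the writer has closed the stream.
        return stream.ToArray();
    }
}
```

Calling BomDemo.WriteSampleXml(new UTF8Encoding(true)) should give bytes that start with EF BB BF, while BomDemo.WriteSampleXml(new UTF8Encoding(false)) should start directly with the `<` of the XML declaration.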

2 comments:

  1. As it turns out, Notepad adds a BOM when you save a file with UTF-8 encoding - http://blogs.msdn.com/michkap/archive/2010/02/23/9967789.aspx Also, go through the comments to see some "go fix your parser" type opinions :-)

  2. :P The sarcasm in that post is really good...

    Of course, parsers should be fixed, but if you are installing SharePoint with FAST Search, be aware that some parts of it are not fixed ;)
