Output Error: String Is Not in UTF-8
Data not getting added to the xml node? Getting error: output error : invalid character value
http://stackoverflow.com/questions/21458377/data-not-getting-added-to-the-xml-node-getting-error-output-error-invalid-ch

I am trying to build an XML document and add a big chunk of string data to an XML node. I regularly get three errors (not all three every time). Here are the three errors:

output error : string is not in UTF-8
xml escape entities char out of range
output error : invalid character value

With the first two errors, the data is still added to the XML node even though the error is displayed. With the third error, output error : invalid character value, the string data is not added to the XML node. I do not know where it is going wrong. Can someone direct me toward solving this issue?

I am using the libxml2 library from C.

Tags: c, libxml2
asked Jan 30 '14 at 13:51 by Kranthi Kumar

1 Answer

The error string is not in UTF-8 should be self-explanatory. libxml2 expects all input strings (xmlChar *) to be encoded in UTF-8.

The error xmlEscapeEntities : char out of range occurs if you add ASCII control chars that are not allowed in XML 1.0. That covers all ASCII characters from 0 to 31 except tab, newline and carriage return (0x09, 0x0A, 0x0D).

The error invalid character value can occur for all kinds of characters that are not allowed in XML 1.0.
For example, ASCII control chars, Unicode surrogates or other invalid Unicode code points. So you're adding strings with invalid UTF-8, invalid
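
The character rules described in the answer can be sketched in plain C. This is only an illustration under stated assumptions: `is_xml10_char` and `strip_xml10_controls` are hypothetical helper names, not libxml2 functions, and the ranges are taken from the XML 1.0 `Char` production. The stripping helper only drops forbidden ASCII control bytes; it does not validate UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the code point matches the XML 1.0
 * Char production:
 * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] */
int is_xml10_char(unsigned long cp)
{
    if (cp == 0x09 || cp == 0x0A || cp == 0x0D) return 1;
    if (cp >= 0x20 && cp <= 0xD7FF) return 1;       /* stops before the surrogates */
    if (cp >= 0xE000 && cp <= 0xFFFD) return 1;     /* excludes U+FFFE and U+FFFF */
    if (cp >= 0x10000 && cp <= 0x10FFFF) return 1;
    return 0;
}

/* Hypothetical helper: removes, in place, the ASCII control bytes that
 * XML 1.0 forbids (0x01-0x1F except tab, LF, CR). Bytes >= 0x80 are
 * copied unchanged. Returns the new length in bytes. */
size_t strip_xml10_controls(char *s)
{
    size_t r, w = 0;

    for (r = 0; s[r] != '\0'; r++) {
        unsigned char c = (unsigned char)s[r];
        if (c >= 0x20 || c == 0x09 || c == 0x0A || c == 0x0D)
            s[w++] = s[r];
    }
    s[w] = '\0';
    return w;
}
```

Running `strip_xml10_controls` on a buffer before handing it to libxml2 would be one way to avoid the control-character errors; whether silently dropping those bytes is acceptable depends on the data.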
Re: [xml] xmlSetProp reports error - "error : string is not in UTF-8" for a URL !
Date: Wed, 11 Mar 2009 23:01:17 -0700 (PDT)
https://mail.gnome.org/archives/xml/2009-March/msg00025.html

Hi,

Prashant R wrote:
> Hi,
>
> This is using C++/gcc on LIBXML 2.7.2.
> I am trying to add an attribute to a node, which raises the error
> "error : string is not in UTF-8".
>
> I am using the API
>
>     xmlSetProp(currentNode, (const xmlChar *) kAttribName,
>                (const xmlChar *)" http://www.w3.org/2000/09/xmldsig#")
>
> Looking at the stack trace, the error originates from
> xmlNewPropInternal( ..) where xmlCheckUTF8(value) returns 0.
>
> I am baffled as to why xmlCheckUTF8 would fail when passing this
> string - " http://www.w3.org/2000/09/xmldsig#"
>
> Basically, inside the for loop the first if statement is encountered
> (if ((c & 0x80) == 0x00)). There isn't a check for NULL termination,
> due to which it even passes the NULL characters at the end of the
> string and then grabs garbage and ultimately returns 0.

I am baffled as to why you think there is no check for a NULL character termination.

> int
> xmlCheckUTF8(const unsigned char *utf)
> {
>     int ix;
>     unsigned char c;
>
>     if (utf == NULL)
>         return(0);
>     /*
>      * utf is a string of 1, 2, 3 or 4 bytes.  The valid strings
>      * are as follows (in "bit format"):
>      *    0xxxxxxx                                      valid 1-byte
>      *    110xxxxx 10xxxxxx                             valid 2-byte
>      *    1110xxxx 10xxxxxx 10xxxxxx                    valid 3-byte
>      *    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx           valid 4-byte
>      */
>     for (ix = 0;;) {      /* string is 0-terminated */
>         c = utf[ix];

No, that line (in the issued source for at least 5 years) has been

    for (ix = 0; (c = utf[ix]);) {

Why is yours different????

>         if ((c & 0x80) == 0x00) {        /* 1-byte code, starts with 10 */
>             ix++;
>         } else if ((c & 0xe0) == 0xc0) { /* 2-byte code, starts with 110 */
>             if ((utf[ix+1] & 0xc0 ) != 0x80)
>                 return 0;
>             ix += 2;
>         } else if ((c & 0xf0) == 0xe0) { /* 3-byte code, starts with 1110 */
>             if (((utf[ix+1] & 0xc0) != 0x80) ||
>                 ((utf[ix+2] & 0xc0) != 0x80))
>                 return 0;
>             ix += 3;
>         } else if ((c & 0xf8) == 0xf0) { /* 4-byte code, starts with 11110 */
>             if (((utf[ix+1] & 0xc0) != 0x80) ||
>                 ((utf[ix+2] & 0xc0) != 0x80) ||
>                 ((utf[ix+3] & 0xc0) != 0x80))
>                 return 0;
>             ix += 4;
>         } else                           /* unknown encoding */
>             return 0;
>     }
>     return(1);
> }
>
> Am I missing something very fundamental here ?
>
> Thanks

Bill
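
The difference Bill points out comes down to the loop condition. The following is a minimal standalone sketch, not the libxml2 source: `check_utf8_terminated` is a hypothetical name, and the body simply combines the bit-pattern checks quoted above with the terminating loop header from the reply. Like the quoted code, it only inspects lead and continuation bit patterns, so it would not reject overlong forms or surrogates.

```c
#include <stddef.h>

/* Hypothetical stand-in for the check discussed above. Returns 1 if the
 * NUL-terminated buffer has well-formed UTF-8 lead and continuation
 * bytes, 0 otherwise (including for a NULL pointer). */
int check_utf8_terminated(const unsigned char *utf)
{
    int ix;
    unsigned char c;

    if (utf == NULL)
        return 0;
    for (ix = 0; (c = utf[ix]) != 0;) {      /* stop at the terminating NUL */
        if ((c & 0x80) == 0x00) {            /* 1-byte code: 0xxxxxxx */
            ix++;
        } else if ((c & 0xe0) == 0xc0) {     /* 2-byte code: 110xxxxx */
            if ((utf[ix + 1] & 0xc0) != 0x80)
                return 0;
            ix += 2;
        } else if ((c & 0xf0) == 0xe0) {     /* 3-byte code: 1110xxxx */
            if (((utf[ix + 1] & 0xc0) != 0x80) ||
                ((utf[ix + 2] & 0xc0) != 0x80))
                return 0;
            ix += 3;
        } else if ((c & 0xf8) == 0xf0) {     /* 4-byte code: 11110xxx */
            if (((utf[ix + 1] & 0xc0) != 0x80) ||
                ((utf[ix + 2] & 0xc0) != 0x80) ||
                ((utf[ix + 3] & 0xc0) != 0x80))
                return 0;
            ix += 4;
        } else {                             /* unknown encoding */
            return 0;
        }
    }
    return 1;
}
```

With the terminating condition in place, the pure-ASCII URL from the message passes the check. Without it, the loop walks past the NUL into whatever bytes follow, which matches the garbage-then-0 behaviour described in the original post.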
http://xmlsoft.org/html/libxml-xmlstring.html

Type and interfaces needed for the internal string handling of the library, especially UTF8 processing.

Table of Contents

#define BAD_CAST
Typedef unsigned char xmlChar
xmlChar * xmlCharStrdup (const char * cur)
xmlChar * xmlCharStrndup (const char * cur, int len)
int xmlCheckUTF8 (const unsigned char * utf)
int xmlGetUTF8Char (const unsigned char * utf, int * len)
int xmlStrEqual (const xmlChar * str1, const xmlChar * str2)
int xmlStrPrintf (xmlChar * buf, int len, const char * msg, ...)
int xmlStrQEqual (const xmlChar * pref, const xmlChar * name, const xmlChar * str)
int xmlStrVPrintf (xmlChar * buf, int len, const char * msg, va_list ap)
int xmlStrcasecmp (const xmlChar * str1, const xmlChar * str2)
const xmlChar * xmlStrcasestr (const xmlChar * str, const xmlChar * val)
xmlChar * xmlStrcat (xmlChar * cur, const xmlChar * add)
const xmlChar * xmlStrchr (const xmlChar * str, xmlChar val)
int xmlStrcmp (const xmlChar * str1, const xmlChar * str2)
xmlChar * xmlStrdup (const xmlChar * cur)
int xmlStrlen (const xmlChar * str)
int xmlStrncasecmp (const xmlChar * str1, const xmlChar * str2, int len)
xmlChar * xmlStrncat (xmlChar * cur, const xmlChar * add, int len)
xmlChar * xmlStrncatNew (const xmlChar * str1, const xmlChar * str2, int len)
int xmlStrncmp (const xmlChar * str1, const xmlChar * str2, int len)
xmlChar * xmlStrndup (const xmlChar * cur, int len)
const xmlChar * xmlStrstr (const xmlChar * str, const xmlChar * val)
xmlChar * xmlStrsub (const xmlChar * str, int start, int len)
int xmlUTF8Charcmp (const xmlChar * utf1, const xmlChar * utf2)
int xmlUTF8Size (const xmlChar * utf)
int xmlUTF8Strlen (const xmlChar * utf)
int xmlUTF8Strloc (const xmlChar * utf, const xmlChar * utfchar)
xmlChar * xmlUTF8Strndup (const xmlChar * utf, int len)
const xmlChar * xmlUTF8Strpos (const xmlChar * utf, int pos)
int xmlUTF8Strsize (const xmlChar * utf, int len)
xmlChar * xmlUTF8Strsub (const xmlChar * utf, int start, int len)

Description

Macro: BAD_CAST
#define BAD_CAST
Macro to cast a string to an xmlChar * when one knows it's safe.

Typedef: xmlChar
typedef unsigned char xmlChar
This is a basic byte in a UTF-8 encoded string. It's unsigned, allowing one to pinpoint cases where char * are assigned to xmlChar * (possibly making serialization back impossible).

Function: xmlCharStrdup
xmlChar * xmlCharStrdup (const char * cur)
a strdup for cha
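
The listing contains both a byte-oriented length function (xmlStrlen) and a character-oriented one (xmlUTF8Strlen). The sketch below illustrates why the two can differ for the same string. It does not call libxml2: `byte_len` and `utf8_char_len` are hypothetical stand-ins written for this example, and `utf8_char_len` assumes its input is already valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical stand-in for a byte-oriented length: counts bytes up to
 * the terminating NUL. */
size_t byte_len(const unsigned char *s)
{
    size_t n = 0;

    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical stand-in for a character-oriented length: counts every
 * byte that is not a 10xxxxxx continuation byte. Assumes valid UTF-8. */
size_t utf8_char_len(const unsigned char *s)
{
    size_t i, n = 0;

    for (i = 0; s[i] != 0; i++) {
        if ((s[i] & 0xc0) != 0x80)
            n++;
    }
    return n;
}
```

For the UTF-8 bytes of "café" (`"caf\xC3\xA9"`), `byte_len` gives 5 while `utf8_char_len` gives 4, because é occupies two bytes. Mixing up the two kinds of length is an easy way to truncate a string in the middle of a multi-byte character.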
LibXML UTF8 - Input is not proper UTF-8, indicate encoding !
https://bytes.com/topic/perl/answers/50726-libxml-utf8-input-not-proper-utf-8-indicate-encoding

Vlajko Knezic

Not so sure what is going on here, but it has something to do with the way UTF8 is handled in Perl and/or LibXML.

The script below:
- accepts a value from a form text field;
- builds an XML document around it;
- serializes the document to a string using toString();
- parses the string back into an XML document using parse_string();
- transforms the XML document into an HTML document using an XSL transformation.

Everything works well until a UTF8 character is entered in the text field (for example é). In that case, when trying to run parse_string(), the code crashes with the message:

=====================================================================
:2: parser error : Input is not proper UTF-8, indicate encoding !
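
The post does not say how the form value was encoded, so the following is a guess at one common cause rather than a diagnosis: é arriving as the single Latin-1 byte 0xE9, which is not a valid UTF-8 sequence on its own. Under that assumption, converting the input from ISO-8859-1 to UTF-8 before parsing would avoid the error. The sketch is in C rather than Perl to match the other examples, and `latin1_to_utf8` is a hypothetical helper name.

```c
#include <stddef.h>

/* Hypothetical helper: converts a NUL-terminated ISO-8859-1 string to
 * UTF-8. The caller must provide an output buffer of at least
 * 2 * strlen(in) + 1 bytes. Returns the number of bytes written,
 * not counting the terminating NUL. */
size_t latin1_to_utf8(const unsigned char *in, unsigned char *out)
{
    size_t w = 0;

    for (; *in != 0; in++) {
        if (*in < 0x80) {
            out[w++] = *in;                                  /* ASCII is unchanged */
        } else {
            out[w++] = (unsigned char)(0xC0 | (*in >> 6));   /* lead byte: 110000xx */
            out[w++] = (unsigned char)(0x80 | (*in & 0x3F)); /* continuation: 10xxxxxx */
        }
    }
    out[w] = 0;
    return w;
}
```

With this mapping, the Latin-1 byte 0xE9 becomes the two-byte UTF-8 sequence 0xC3 0xA9. If the input is already UTF-8, applying the conversion again would double-encode it, so it only helps when the source encoding really is Latin-1.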