{"id":1233,"date":"2015-06-06T14:32:09","date_gmt":"2015-06-06T19:32:09","guid":{"rendered":"http:\/\/huewhite.com\/umb\/?p=1233"},"modified":"2015-06-06T14:32:09","modified_gmt":"2015-06-06T19:32:09","slug":"current-project-ctd-5","status":"publish","type":"post","link":"https:\/\/huewhite.com\/umb\/2015\/06\/06\/current-project-ctd-5\/","title":{"rendered":"Current Project, Ctd"},"content":{"rendered":"<p>Continuing my faintly <a href=\"https:\/\/huewhite.com\/umb\/2015\/06\/02\/current-project-ctd-4\/\" target=\"_blank\">ridiculous hobby<\/a>, I thought I&#8217;d add a note on XML validity constraints and how they interact with the EBNF of XML: which is to say, they don&#8217;t.<\/p>\n<p>Which is reasonable, of course: the point of the EBNF is to specify the syntax, not the semantics.\u00a0 But this does make validity checking a little more difficult.\u00a0 Here&#8217;s the <a href=\"http:\/\/www.w3.org\/TR\/REC-xml\/#sec-prolog-dtd\" target=\"_blank\">first example of a constraint<\/a>:<\/p>\n<blockquote><p><b>Validity constraint: Root Element Type<\/b><\/p>\n<p>The <a href=\"http:\/\/www.w3.org\/TR\/REC-xml\/#NT-Name\">Name<\/a> in the document type declaration <em class=\"rfc2119\" title=\"Keyword in RFC 2119 context\">MUST<\/em> match the element type of the <a title=\"Root Element\" href=\"http:\/\/www.w3.org\/TR\/REC-xml\/#dt-root\">root element<\/a>.<\/p><\/blockquote>\n<p>And an illustration:<\/p>\n<blockquote>\n<pre>&lt;?xml version=\"1.0\"?&gt;\r\n&lt;!DOCTYPE greeting SYSTEM \"hello.dtd\"&gt;\r\n&lt;greeting&gt;Hello, world!&lt;\/greeting&gt; \r\n<\/pre>\n<\/blockquote>\n<p>Note the word &#8220;greeting&#8221; in the !DOCTYPE and in the XML element.\u00a0\u00a0 So what does Production 1 look like?<\/p>\n<blockquote><p>document ::= prolog element Misc*<\/p><\/blockquote>\n<p>element is the important piece here:<\/p>\n<blockquote><p>element ::= EmptyElemTag | STag content ETag<\/p><\/blockquote>\n<p>Both EmptyElemTag and STag reference Production 
5:<\/p>\n<blockquote><p>Name ::= NameStartChar (NameChar)*<\/p><\/blockquote>\n<p>And it&#8217;s this Name which we&#8217;ll match against the DTD&#8217;s name.\u00a0 But, of course, Name is used in many other places, which I won&#8217;t list here, so clearly we cannot modify the processing at Name to validate against the DTD; or, more accurately, a hacker might find some way to get there, but that&#8217;s not really good enough.<\/p>\n<p>More importantly, the element Production is also referenced from another location, namely Production 43, and again it would be inappropriate to modify the element Production to validate a special case.\u00a0 So, how to handle this, and potentially other special cases, and properly highlight the purpose of the modification to the EBNF?<\/p>\n<p>I hit upon using the previously described <a href=\"https:\/\/huewhite.com\/umb\/2015\/05\/09\/current-project-ctd-3\/\" target=\"_blank\">debug mechanism<\/a>.\u00a0 First, I defined the notion of a Production Post Processor:<\/p>\n<p>Post_Production_Processor = Handlers -&gt; List(XML_OnePass) -&gt; Dtd -&gt; Int -&gt; Int -&gt; (Handlers, Dtd);<\/p>\n<p>In English, a production post processor accepts all the SAX handlers provided by the user, the current state of processing, the DTD, and (for convenience) the current line and column numbers; it returns potentially new versions of the handlers and the DTD.\u00a0 (Yes, I&#8217;m wondering if returning a new DTD is pointless.)<\/p>\n<p>To the internal state of the SAX parser I added a map from an integer (the Production&#8217;s number, which serves as its name) to a Post_Production_Processor.\u00a0 This map is updated by a new function, <em>validity_constraint<\/em>, which functions as a parser: it accepts a Production&#8217;s name and a Post_Production_Processor, records the pair in the map, and then executes the <em>success<\/em> function.<\/p>\n<p>As you might have guessed, then, I&#8217;ve modified the <em>p_out<\/em> function to look for a match between its 
<em>name<\/em> argument and anything in the map from name to Post_Production_Processor.\u00a0 If one is found, the entry is deleted, and the post processor is executed.\u00a0 The <em>success<\/em> function is then invoked, but using the results of the post processor rather than those passed in.<\/p>\n<p>Usage?<\/p>\n<blockquote><p>vc = validity_constraint;\u00a0\u00a0\u00a0 # purely for readability<\/p>\n<p>&#8230;<\/p>\n<p>xml_doc = p_in 1 &amp; start_doc &amp; prolog &amp; vc 5 validate_dtd_name &amp; element &amp; &lt;misc&gt; &amp; one_p &amp; p_out 1;<\/p><\/blockquote>\n<p><em>validate_dtd_name<\/em> is invoked when Production 5 terminates, and I&#8217;ve determined that the next Production 5 will ALWAYS be the outermost element&#8217;s name.<\/p>\n<p>For comparison, here&#8217;s the original EBNF of Production 1:<\/p>\n<blockquote><p>document ::= prolog element Misc*<\/p><\/blockquote>\n<p>The added implementation elements are p_in, start_doc, vc, one_p, and p_out.\u00a0 The transformation is rather large, but still straightforward:<\/p>\n<p><em>p_in<\/em> is the debug (and now validity constraint) mechanism;<\/p>\n<p><em>start_doc<\/em> implements the content handler&#8217;s <a href=\"http:\/\/xerces.apache.org\/xerces-c\/apiDocs-3\/classContentHandler.html#aeef6a75cf8b819dd9c932900bc75c0fa\" target=\"_blank\"><em>start_document<\/em><\/a> functionality;<\/p>\n<p><em>vc<\/em>, as discussed;<\/p>\n<p><em>one_p,<\/em> the processor specific to Production 1 &#8211; not all Productions have processors, but this one does (it returns the handlers to the caller);<\/p>\n<p><em>p_out<\/em>, as discussed.<\/p>\n<p>Most productions only have <em>p_in<\/em> and <em>p_out<\/em> additions.\u00a0 A few have processors.\u00a0 So far, only Production 1 has a post processor, for which I should probably come up with a better name: validity processor, perhaps.\u00a0 Let me know if you have a better name.<\/p>\n<p>I should change <em>xml_doc<\/em> to <em>document<\/em>, just 
for consistency.\u00a0 (Consistency is rarely of interest to me, sadly.)<\/p>\n<p>Testing of this mechanism has yielded positive results, in that I can see the processor invoked.\u00a0 I have to modify one of my tests to have a DTD in order to really test it, and I haven&#8217;t gotten that far; if a DTD is not defined, the requirement is ignored.<\/p>\n<p>I look forward to trying to implement other validity constraints with this mechanism.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Continuing my faintly ridiculous hobby, I thought I&#8217;d add a note on XML validity constraints and how they interact with the EBNF of XML: which is to say, they don&#8217;t. Which is reasonable, of course: the point of the EBNF is to specify the syntax, not the semantics.\u00a0 But this \u2026 <a class=\"continue-reading-link\" href=\"https:\/\/huewhite.com\/umb\/2015\/06\/06\/current-project-ctd-5\/\"> Continue reading <span class=\"meta-nav\">&rarr; <\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1233","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/posts\/1233","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/comments?post=1233"}],"version-history":[{"count":3,"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/posts\/1233\/revi
sions"}],"predecessor-version":[{"id":1236,"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/posts\/1233\/revisions\/1236"}],"wp:attachment":[{"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/media?parent=1233"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/categories?post=1233"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/tags?post=1233"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}