{"id":3067,"date":"2016-01-17T17:39:35","date_gmt":"2016-01-17T23:39:35","guid":{"rendered":"http:\/\/huewhite.com\/umb\/?p=3067"},"modified":"2016-01-17T17:39:35","modified_gmt":"2016-01-17T23:39:35","slug":"current-project-ctd-12","status":"publish","type":"post","link":"https:\/\/huewhite.com\/umb\/2016\/01\/17\/current-project-ctd-12\/","title":{"rendered":"Current Project, Ctd"},"content":{"rendered":"<p>In the current saga of my\u00a0<a href=\"http:\/\/mythryl.org\/\" target=\"_blank\">Mythryl<\/a>\u00a0coding project, the last\u00a0entry, regarding\u00a0<a href=\"https:\/\/huewhite.com\/umb\/2015\/12\/23\/current-project-ctd-11\/\" target=\"_blank\">replacement text in DTDs<\/a>, suggested I would take this approach:<\/p>\n<blockquote><p>The extSubset production will be handled via a callback function that can then build a new recursive descent parser that knows about parameter entity references and generate a new string with all the replacements accomplished. I have not made a decision about intSubset, or perhaps markupdecl \u2013 a solution is not readily apparent.<\/p><\/blockquote>\n<p>I ended up abandoning that approach &#8211; my first approaches tend to be a little Byzantine compared to the final approach, so it&#8217;s not surprising. \u00a0However, the final approach\u00a0(which is a bit of a grim thing to say in a programming project, really) is not much more appealing, as I decided to take the straightforward, but clumsy, approach of creating a new production rule:<\/p>\n<blockquote><p>pe_reference_replacement&#039; = p_in 1000 &amp; |ws_i| &amp; is_this &quot;%&quot; &amp; name &amp; is_this &quot;;&quot; &amp; implement_replacement &amp; p_out 1000;<\/p>\n<p>pe_reference_replacement = &lt;(irrelevant_ws &amp; pe_reference_replacement&#039;)&gt; ;<\/p><\/blockquote>\n<p>This is in <em>Mythryl<\/em> using the recursive descent parser&#8217;s operators. 
\u00a0 In the second production, the <em>&lt;&gt;<\/em> means zero or more matching productions, while\u00a0<em>irrelevant_ws<\/em> simply means there may be discardable whitespace. Basically, there may be multiple appearances\u00a0of\u00a0<em>pe_reference_replacement&#039;<\/em>, separated by optional whitespace.<\/p>\n<p>The first production uses the operator <em>||<\/em>, which indicates the contents are optional, while <em>ws_i<\/em> is a synonym for\u00a0<em>irrelevant_ws<\/em>, so this part is probably redundant (and if you&#8217;re thinking, on these two items, <em>My, he&#8217;s sloppy<\/em> &#8211; well, you win a prize!). \u00a0More interestingly, there&#8217;s a check for a match to &#8216;%&#8217;, then a name, and then the required semicolon &#8211; this is the form of a parameter entity reference. \u00a0If all of these are matched, then\u00a0<em>implement_replacement<\/em> will be called to actually implement the replacement.<\/p>\n<p>Because\u00a0<em>implement_replacement<\/em> is highly dependent on my implementation back end, I&#8217;ll just say it puts the replacement text on the input and continues onward for more processing.<\/p>\n<p>References to\u00a0<em>pe_reference_replacement<\/em> are now sprinkled throughout the relevant productions. \u00a0An example, first from the spec, then the original Mythryl code, then the updated:<\/p>\n<blockquote><p>EntityDef ::= EntityValue | (ExternalID NDataDecl?)<\/p>\n<p>entity_def = p_in 73 &amp; (entity_value | (external_id &amp; |ndata_decl| )) &amp; seventy_three_p &amp; p_out 73;<\/p>\n<p>entity_def = p_in 73 &amp; |(pe_reference_replacement)| &amp; (entity_value | (external_id &amp; |ndata_decl| )) &amp; seventy_three_p &amp; p_out 73;<\/p><\/blockquote>\n<p>Subsidiary productions, where appropriate, also contain sprinklings of\u00a0<em>pe_reference_replacement<\/em> as optional accompaniments. \u00a0Is this a satisfactory approach? \u00a0Barely. 
\u00a0It obscures the production&#8217;s purpose, and it&#8217;s prone to errors. \u00a0I do not look forward to searching out bugs. \u00a0But I was not able to find an approach with a better balance in the limited time I allot to this project.<\/p>\n<p>My first test of this comes from the <a href=\"http:\/\/www.w3.org\/TR\/REC-xml\/\" target=\"_blank\">W3C<\/a> document itself, specifically <a href=\"http:\/\/www.w3.org\/TR\/REC-xml\/#sec-entexpand\" target=\"_blank\">Appendix D<\/a>, entitled &#8220;<em><strong>Expansion of Entity and Character References (Non-Normative)<\/strong><\/em>&#8221;. \u00a0The example is labeled as particularly difficult, and is as follows:<\/p>\n<blockquote><p>1 &lt;?xml version=&#039;1.0&#039;?&gt;<br \/>\n2 &lt;!DOCTYPE test [<br \/>\n3 &lt;!ELEMENT test (#PCDATA) &gt;<br \/>\n4 &lt;!ENTITY % xx &#039;&amp;#37;zz;&#039;&gt;<br \/>\n5 &lt;!ENTITY % zz &#039;&amp;#60;!ENTITY tricky &quot;error-prone&quot; &gt;&#039; &gt;<br \/>\n6 %xx;<br \/>\n7 ]&gt;<br \/>\n8 &lt;test&gt;This sample shows a &amp;tricky; method.&lt;\/test&gt;<\/p><\/blockquote>\n<p>(Ignore the line numbers.) \u00a0I can now parse this properly and report the expected results (which consist of all of the expected entity values, as well as the proper PCDATA for element test). 
\u00a0Yes, yes, this is hardly complete testing, but I&#8217;ve done enough damage today.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The current saga of my\u00a0Mythryl\u00a0coding project, the last\u00a0entry with regards to\u00a0replacement text in DTDs suggested I would take this approach: The extSubset production will be handled via a callback function that can then build a new recursive descent parser that knows about parameter entity references and generate a new string \u2026 <a class=\"continue-reading-link\" href=\"https:\/\/huewhite.com\/umb\/2016\/01\/17\/current-project-ctd-12\/\"> Continue reading <span class=\"meta-nav\">&rarr; <\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-3067","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/posts\/3067","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/comments?post=3067"}],"version-history":[{"count":3,"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/posts\/3067\/revisions"}],"predecessor-version":[{"id":3070,"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/posts\/3067\/revisions\/3070"}],"wp:attachment":[{"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/media?parent=3067"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/huewhi
te.com\/umb\/wp-json\/wp\/v2\/categories?post=3067"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/huewhite.com\/umb\/wp-json\/wp\/v2\/tags?post=3067"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}