{"id":28771,"date":"2017-03-30T03:01:00","date_gmt":"2017-03-29T17:01:00","guid":{"rendered":"http:\/\/www.rjmprogramming.com.au\/ITblog\/?p=28771"},"modified":"2017-03-30T11:59:14","modified_gmt":"2017-03-30T01:59:14","slug":"spreadsheet-and-xml-global-substitution-genericization-tutorial","status":"publish","type":"post","link":"https:\/\/www.rjmprogramming.com.au\/ITblog\/spreadsheet-and-xml-global-substitution-genericization-tutorial\/","title":{"rendered":"Spreadsheet and XML Global Substitution Genericization Tutorial"},"content":{"rendered":"<div style=\"width: 230px\" class=\"wp-caption alignnone\"><a target=_blank href=\"http:\/\/www.rjmprogramming.com.au\/PHP\/tr_mapping_more.jpg\"><img decoding=\"async\" style=\"border: 15px solid pink;\" alt=\"Spreadsheet and XML Global Substitution Genericization Tutorial\" src=\"http:\/\/www.rjmprogramming.com.au\/PHP\/tr_mapping_more.jpg\" title=\"Spreadsheet and XML Global Substitution Genericization Tutorial\"  style=\"float:left;\"  \/><\/a><p class=\"wp-caption-text\">Spreadsheet and XML Global Substitution Genericization Tutorial<\/p><\/div>\n<p>It&#8217;s one thing to write a useful one off web application with quite a few hard codings, but what about an attempt to genericize it, and by so doing, oftentimes you are improving its documentation aspects, so that, if the code is revisited years later &#8230;<\/p>\n<ul>\n<li>its generic qualities will be plain to all &#8230; and at the same time &#8230;<\/li>\n<li>it will be far easier to imagine as far as inputs are concerned &#8230;<\/li>\n<li>the user can (still) break the job up<\/li>\n<li>the user has less to worry about as far as a backup of data goes<\/li>\n<li>it does not feel like a one off any more<\/li>\n<li>it is less likely to be ill used for an inapplicable application<\/li>\n<\/ul>\n<p> &#8230; so that, all in all, we feel much more confident such code can last the test of time and usefulness into the future than yesterday&#8217;s (albeit useful) one off feeling version of the code you can see at <a title='Spreadsheet and XML Global Substitution CSV Tutorial' href='#saxgsct'>Spreadsheet and XML Global Substitution CSV Tutorial<\/a> as shown below.<\/p>\n<p>What&#8217;s the main driver of genericization, in our book (<font size=1>but not our pamphlettes<\/font>) for small jobs?<\/p>\n<ol>\n<li>turn all hardcodings you can into parameterizable variables, and today that is via PHP $_GET[] variables off the web browser address bar<\/li>\n<li>allow the user to change these, as the hardcodings just become defaults, and are presented in a submittable HTML form whose action is to recall the same piece of PHP software (code)<\/li>\n<\/ol>\n<p> &#8230; simple, huh?!<\/p>\n<p>Again, you can see the various aspects of this, in play, with today&#8217;s <a target=_blank href=\"http:\/\/www.rjmprogramming.com.au\/PHP\/tr_mapping_more.jpg\" title=\"Tutorial picture\">tutorial picture<\/a>, and though it is not much use to run the PHP code live, its style is far more generic now, so we want to share <a target=_blank href=\"http:\/\/www.rjmprogramming.com.au\/PHP\/tr_mapping.php-GETME\" title=\"tr_mapping.php\">tr_mapping.php<\/a> (changed <a target=_blank href=\"http:\/\/www.rjmprogramming.com.au\/PHP\/Geographicals\/diff.php?one=http:\/\/www.rjmprogramming.com.au\/PHP\/tr_mapping.php-GETME\" title=\"tr_mapping.php\">this way<\/a>) with you for your perusal &#8230; just in case (it is of use for you).<\/p>\n<hr>\n<p id='saxgsct'>Previous relevant <a target=_blank title='Spreadsheet and XML Global Substitution CSV Tutorial' href='https:\/\/www.rjmprogramming.com.au\/ITblog\/spreadsheet-and-xml-global-substitution-csv-tutorial\/'>Spreadsheet and XML Global Substitution CSV Tutorial<\/a> is shown below.<\/p>\n<div style=\"width: 230px\" class=\"wp-caption alignnone\"><a target=_blank href=\"http:\/\/www.rjmprogramming.com.au\/PHP\/tr_mapping.jpg\"><img decoding=\"async\" style=\"border: 15px solid pink;\" alt=\"Spreadsheet and XML Global Substitution CSV Tutorial\" src=\"http:\/\/www.rjmprogramming.com.au\/PHP\/tr_mapping.jpg\" title=\"Spreadsheet and XML Global Substitution CSV Tutorial\"  style=\"float:left;\"  \/><\/a><p class=\"wp-caption-text\">Spreadsheet and XML Global Substitution CSV Tutorial<\/p><\/div>\n<p>Programmatically, we came in half way with the programming when we presented <a title='Spreadsheet and XML Global Substitution Primer Tutorial' href='#saxgspt'>Spreadsheet and XML Global Substitution Primer Tutorial<\/a> as shown below.  The programming, then, had two inputs, namely &#8230;<\/p>\n<ol>\n<li>input spreadsheet&#8217;s CSV file <i>manually created<\/i><\/li>\n<li>index XML file<\/li>\n<\/ol>\n<p> &#8230; but that CSV file can be <i>programmatically created<\/i> rather than <i>manually created<\/i>.  And while we&#8217;re at <i>programmatically creating<\/i> the CSV we could also <i>programmatically create<\/i> the Korn Shell (ksh) easier there too, with the same program, rather than using TextWrangler&#8217;s Grep (RegEx) talents &#8230; not that we&#8217;re ungrateful or anything &#8230; but it is good to mix things up to improve procedures sometimes.<\/p>\n<p>And what programming language can we use, and what environment for that programming code?  We think &#8230;<\/p>\n<ul>\n<li>coding wise, we&#8217;ll use PHP (starring PHP&#8217;s <a target=_blank title='PHP glob() method information' href='http:\/\/php.net\/manual\/en\/function.glob.php'>glob()<\/a> method) &#8230; and the environment for that will be that &#8230;<\/li>\n<li>we&#8217;ll use a (local Apache\/PHP\/MySql web server) <a target=_blank title='MAMP local web server' href='http:\/\/mamp.info'>MAMP<\/a> subfolder (ie. how desktop application &#8220;can meet&#8221; web application) off its Document Root (\/Applications\/MAMP\/htdocs\/) &#8230; \/Applications\/MAMP\/htdocs\/tr_mapping\/ &#8230; to store the XML data files (no CSV needed as input this way, as it will be <i>programmatically created<\/i> in part 1 of 2 parts to the whole job) &#8230; which becomes accessible in two ways &#8230;\n<ol>\n<li><i>http:\/\/localhost:8888\/tr_mapping\/tr_mapping.php<\/i>#in_a_web_browser<\/li>\n<li>at Mac OS X Terminal desktop application command line via &#8230; <br \/><i>cd \/Applications\/MAMP\/htdocs\/tr_mapping<\/i><br \/><i>ksh -x tr_mapping.ksh<\/i><\/li>\n<\/ol>\n<\/li>\n<\/ul>\n<p>Again, email is the conduit for both sides of &#8230;<\/p>\n<ol>\n<li>input in<\/li>\n<li>output out<\/li>\n<\/ol>\n<p> &#8230; to complete proceedings.  You can see the various aspects of this, in play, with today&#8217;s <a target=_blank href=\"http:\/\/www.rjmprogramming.com.au\/PHP\/tr_mapping.jpg\" title=\"Tutorial picture\">tutorial picture<\/a>, and though it is not much use to run the PHP code live, its style is leaning towards the generic side enough for us to want to share <a target=_blank href=\"http:\/\/www.rjmprogramming.com.au\/PHP\/tr_mapping.php_GETME\" title=\"tr_mapping.php\">tr_mapping.php<\/a> with you for your perusal &#8230; just in case (it is of use for you).<\/p>\n<hr>\n<p id='saxgspt'>Previous relevant <a target=_blank title='Spreadsheet and XML Global Substitution Primer Tutorial' href='https:\/\/www.rjmprogramming.com.au\/ITblog\/spreadsheet-and-xml-global-substitution-primer-tutorial\/'>Spreadsheet and XML Global Substitution Primer Tutorial<\/a> is shown below.<\/p>\n<div style=\"width: 230px\" class=\"wp-caption alignnone\"><a target=_blank href=\"http:\/\/www.rjmprogramming.com.au\/XML\/spreadsheet_xml_job.pdf\"><img decoding=\"async\" style=\"border: 15px solid pink;\" alt=\"Spreadsheet and XML Global Substitution Primer Tutorial\" src=\"http:\/\/www.rjmprogramming.com.au\/XML\/spreadsheet_xml_job-224of.jpg\" title=\"Spreadsheet and XML Global Substitution Primer Tutorial\"  style=\"float:left;\"  \/><\/a><p class=\"wp-caption-text\">Spreadsheet and XML Global Substitution Primer Tutorial<\/p><\/div>\n<p>Yesterday when we were discussing <a target=_blank title='Worldbank API World Country Reporting Regex Tutorial' href='https:\/\/www.rjmprogramming.com.au\/ITblog\/worldbank-api-world-country-reporting-regex-tutorial\/'>Worldbank API World Country Reporting Regex Tutorial<\/a> we mentioned &#8230;<\/p>\n<blockquote>\n<p> &#8230; and we use <strike>it<\/strike> (ie. Regex) with serverside PHP today, under the auspices of the <a target=_blank title='PHP regex' href='http:\/\/php.net\/manual\/en\/function.preg-match.php'>preg_match<\/a> function, though we most often use RegEx thinking with the Javascript <a target=_blank title='Javascript string replace function information from w3schools' href='http:\/\/www.w3schools.com\/jsref\/jsref_replace.asp'><i>replace<\/i><\/a> function, as the way to make substitutions for more than one occurrence, (the one occurrence design being) a default &#8220;curiosity&#8221; (but can be useful too) about Javascript&#8217;s version of substitution.  You may know this RegEx usage of the Javascript replace function as &#8220;global substitution&#8221;.<\/p>\n<\/blockquote>\n<p> &#8230; and that term &#8220;global substitution&#8221;.  Many editing jobs, especially text file based ones, require or benefit from &#8220;global substitution&#8221; carefully applied, that is.  It is common to see an editor who shies away from &#8220;global substitution&#8221; methods, and in many cases that is wise, but &#8220;global substitution&#8221; gets good results when you &#8230;<\/p>\n<ul>\n<li>substitute things you know exist in the precise form you intend to search for, and only there, where you want to replace &#8230; to <\/li>\n<li>replacements should not feed back into the substitution list &#8230; doh &#8230; or you will end up with a confused unintended result<\/li>\n<\/ul>\n<p>In real life, it is often the case that the conditions above are easy to obey, because you are mapping an old numbering and\/or naming system to a completely new and dissimilar numbering and\/or naming system.  That&#8217;s the case in a little job we drilled down into, to show you what we did, that involved RegEx thoughts, to solve a problem.<\/p>\n<p>So, with our job we had &#8230;<\/p>\n<ul>\n<li>Aim: Change some XML in one file to have the text in one column of a Spreadsheet be mapped to the contents of another column of that same Spreadsheet<\/li>\n<li>Inputs: Excel Spreadsheet with those two columns as mentioned above and the one input XML file<\/li>\n<li>From the User: Asked for the user to send the Excel Spreadsheet &#8230; Saved As Comma Separated Values (CSV) in MS-DOS format and the one input XML file as two attached files in an Email<\/li>\n<li>Processing:\n<ol>\n<li>Opened Email with <a target=_blank title='Gmail' href='http:\/\/gmail.com'>Gmail<\/a> web application in Safari web browser desktop application, on a MacBook Pro laptop<\/li>\n<li>Downloaded the two Attachments and copied over to where we like to work &#8230; the home of <a target=_blank title='MAMP' href='http:\/\/mamp.info'>MAMP<\/a> local Apache\/PHP\/MySql web server &#8230; on a Mac OS X system is <i>\/Applications\/MAMP\/htdocs<\/i> (which we&#8217;ll access later with the Mac OS X <i>Terminal<\/i> desktop application later via &#8220;cd \/Applications\/MAMP\/htdocs&#8221;)<\/li>\n<li>Opened our favourite Text Editor desktop application, called <a target=_blank href='http:\/\/www.macupdate.com\/app\/mac\/11009\/textwrangler' title='Mac TextWrangler text editor'><i>TextWrangler<\/i><\/a>, whose &#8220;Find and Replace&#8221; &#8220;Grep&#8221; suboption will be a feature of today&#8217;s solution<\/li>\n<li>File -&gt; Open the Spreadsheet CSV file<\/li>\n<li>Search -&gt; Find&#8230; &#8230; Matching Mode: <a target=_blank title='Linux grep command information from computerhope' href='http:\/\/www.computerhope.com\/unix\/ugrep.htm'>Grep<\/a> &#8230; Find: ^ Replace: #  &#8230; Replace All &#8230; remember our &#8220;RegEx&#8221; &#8220;cheat sheet&#8221; discussion (lots of which is relevant to TextWrangler <i>Matching Mode: Grep<\/i> as well) at that aforesaid mentioned tutorial &#8230;<br \/>\n<blockquote>\n<ul>\n<li>^ can mean &#8220;start of&#8221;<\/li>\n<li>$ can mean &#8220;end of&#8221;<\/li>\n<li>. can sometimes mean &#8220;one existant character wildcard&#8221; &#8230; or sometimes it is % or ? for this in other &#8220;systems&#8221;<\/li>\n<li>* can often mean &#8220;zero or more of preceding character wildcard&#8221;<\/li>\n<li>[] and () bracketing rules are pretty crucial for the more esoteric usages &#8230; also study | usage<\/li>\n<\/ul>\n<\/blockquote>\n<p>? &#8230; well, we want to start out mapping all lines to non-acting <a target=_blank title='korn' href='http:\/\/www.computerhope.com\/unix\/uksh.htm'>Korn Shell<\/a> command lines<\/li>\n<li>Typed as the new top line <i>#!\/bin\/ksh<\/i> &#8230; just for completeness sake &#8230; is optional step<\/li>\n<li>Search -&gt; Find&#8230; &#8230; No Matching Mode &#8230; Find: #,[ Replace: <a target=_blank title='Linux cat command information from computerhope' href='http:\/\/www.computerhope.com\/unix\/ucat.htm'>cat<\/a> COMM.MIL~INDEX.xml | <a target=_blank title='Linux sed command information from computerhope' href='http:\/\/www.computerhope.com\/unix\/used.htm'>sed<\/a> &#8216;\/\\[    &#8230; Replace All<\/li>\n<li>Search -&gt; Find&#8230; &#8230; Matching Mode: Grep &#8230; Find: ]$ Replace: \\\\]\/g&#8217; > x.xxx ; cat x.xxx > COMM.MIL~INDEX.xml ; <a target=_blank title='Linux rm command information from computerhope' href='http:\/\/www.computerhope.com\/unix\/urm.htm'>rm<\/a> -f x.xxx    &#8230; Replace All<\/li>\n<li>File -&gt; Save As&#8230; fix_csv.ksh (to \/Applications\/MAMP\/htdocs directory)<\/li>\n<\/ol>\n<li>Opened Terminal desktop application that has a default <a target=_blank title='bash' href='http:\/\/www.computerhope.com\/unix\/ubash.htm'>Bash<\/a> environment (a lot like Linux, but is (giving you access to) a Mac OS X BSD operating system, really)<\/li>\n<li>Typed in: <i>cd \/Applications\/MAMP\/htdocs<\/i>  # to get to data<\/li>\n<li>Typed in: <i>cp COMM.MIL~INDEX.xml COMM.MIL~INDEX_original.xml<\/i>  # to backup data ahead of processing, as well as to compare file sizes with later, as a sanity check<\/li>\n<li>Typed in: <i>ksh -x fix_csv.ksh<\/i>  # access Korn Shell interpreter and run the TextWrangler created Korn Shell Script (and the -x <i>switch<\/i> tells the interpreter to be verbose with output reporting)<\/li>\n<li>Typed in: <i>ls -l COMM.MIL~INDEX*.xml<\/i>  # first sanity check verified files different, and not disastrously so &#8230; good first sign<\/li>\n<li>Typed in: <i>fgrep -c &#8216;[S1.12.4.20]&#8217; COMM.MIL~INDEX*.xml ; fgrep -c &#8216;[CCR.28E.20]&#8217; COMM.MIL~INDEX*.xml<\/i>  # second sanity check to prove old\/new parts of first\/last relevant Spreadsheet CSV file records were correctly mapped &#8230; and they were &#8230; so<\/li>\n<li>Opened Email with <a target=_blank title='Gmail' href='http:\/\/gmail.com'>Gmail<\/a> web application in Safari web browser desktop application (and used &#8220;Forward&#8221; option, attaching that new XML file), on a MacBook Pro laptop &#8230; so that &#8230;<\/li>\n<\/ol>\n<li>Output: One XML file with the global substitutions expressed in the Excel Spreadsheet performed, returned to User via Email &#8220;Forward&#8221; option, attaching that new XML file<\/li>\n<\/ul>\n<p>We hope you can see the good use you can make with Email and a good Text Editor and Linux type shell scripting, influenced by <a target=_blank title='Linux RegEx' href='https:\/\/www.digitalocean.com\/community\/tutorials\/using-grep-regular-expressions-to-search-for-text-patterns-in-linux'>RegEx<\/a> pattern matching regarding &#8230;<\/p>\n<ul>\n<li><a target=_blank title='Linux grep command information from computerhope' href='http:\/\/www.computerhope.com\/unix\/ugrep.htm'>grep<\/a><\/li>\n<li><a target=_blank title='Linux sed command information from computerhope' href='http:\/\/www.computerhope.com\/unix\/used.htm'>sed<\/a><\/li>\n<\/ul>\n<p>Which leaves us with today&#8217;s PDF slideshow of snapshots of making this job work, <a target=_blank href=\"http:\/\/www.rjmprogramming.com.au\/XML\/spreadsheet_xml_job.pdf\">here<\/a>.<\/p>\n<p>If this was interesting you may be interested in <a title='Click here to see topics in which you might be interested' href='#d28426' onclick='var dv=document.getElementById(\"d28426\"); dv.innerHTML = \"&lt;iframe width=670 height=600 src=\" + \"https:\/\/www.rjmprogramming.com.au\/ITblog\/tag\/regex\" + \"&gt;&lt;\/iframe&gt;\"; dv.style.display = \"block\";'>this<\/a> too.<\/p>\n<div id='d28426' style='display: none; border-left: 2px solid green; border-top: 2px solid green;'><\/div>\n<hr>\n<p>If this was interesting you may be interested in <a title='Click here to see topics in which you might be interested' href='#d28733' onclick='var dv=document.getElementById(\"d28733\"); dv.innerHTML = \"&lt;iframe width=670 height=600 src=\" + \"https:\/\/www.rjmprogramming.com.au\/ITblog\/tag\/csv\" + \"&gt;&lt;\/iframe&gt;\"; dv.style.display = \"block\";'>this<\/a> too.<\/p>\n<div id='d28733' style='display: none; border-left: 2px solid green; border-top: 2px solid green;'><\/div>\n<hr>\n<p>If this was interesting you may be interested in <a title='Click here to see topics in which you might be interested' href='#d28771' onclick='var dv=document.getElementById(\"d28771\"); dv.innerHTML = \"&lt;iframe width=670 height=600 src=\" + \"https:\/\/www.rjmprogramming.com.au\/ITblog\/tag\/form\" + \"&gt;&lt;\/iframe&gt;\"; dv.style.display = \"block\";'>this<\/a> too.<\/p>\n<div id='d28771' style='display: none; border-left: 2px solid green; border-top: 2px solid green;'><\/div>\n","protected":false},"excerpt":{"rendered":"<p>It&#8217;s one thing to write a useful one off web application with quite a few hard codings, but what about an attempt to genericize it, and by so doing, oftentimes you are improving its documentation aspects, so that, if the &hellip; <a href=\"https:\/\/www.rjmprogramming.com.au\/ITblog\/spreadsheet-and-xml-global-substitution-genericization-tutorial\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12,29,37],"tags":[85,111,234,283,319,320,1750,380,1839,1615,2142,503,544,677,707,714,725,744,913,932,1040,1053,1105,1115,1139,1185,1867,1252,1255,1265,1480],"class_list":["post-28771","post","type-post","status-publish","format-standard","hentry","category-elearning","category-operating-system","category-tutorials","tag-apache","tag-attachment","tag-command-line","tag-csv","tag-desktop","tag-desktop-application","tag-document-root","tag-email","tag-genericization","tag-glob","tag-global-substitution","tag-gmail","tag-grep","tag-korn-shell","tag-linux","tag-local-web-server","tag-mac-os-x","tag-mamp","tag-pdf","tag-php","tag-regex","tag-replace","tag-script","tag-sed","tag-shell-script","tag-spreadsheet","tag-substitution","tag-terminal","tag-text-editor","tag-textwrangler","tag-xml"],"_links":{"self":[{"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/posts\/28771"}],"collection":[{"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/comments?post=28771"}],"version-history":[{"count":3,"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/posts\/28771\/revisions"}],"predecessor-version":[{"id":29228,"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/posts\/28771\/revisions\/29228"}],"wp:attachment":[{"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/media?parent=28771"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/categories?post=28771"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/tags?post=28771"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}