{"id":28733,"date":"2017-03-09T03:01:04","date_gmt":"2017-03-08T17:01:04","guid":{"rendered":"http:\/\/www.rjmprogramming.com.au\/ITblog\/?p=28733"},"modified":"2017-03-08T06:20:52","modified_gmt":"2017-03-07T20:20:52","slug":"spreadsheet-and-xml-global-substitution-csv-tutorial","status":"publish","type":"post","link":"https:\/\/www.rjmprogramming.com.au\/ITblog\/spreadsheet-and-xml-global-substitution-csv-tutorial\/","title":{"rendered":"Spreadsheet and XML Global Substitution CSV Tutorial"},"content":{"rendered":"<div style=\"width: 230px\" class=\"wp-caption alignnone\"><a target=_blank href=\"http:\/\/www.rjmprogramming.com.au\/PHP\/tr_mapping.jpg\"><img decoding=\"async\" style=\"border: 15px solid pink;\" alt=\"Spreadsheet and XML Global Substitution CSV Tutorial\" src=\"http:\/\/www.rjmprogramming.com.au\/PHP\/tr_mapping.jpg\" title=\"Spreadsheet and XML Global Substitution CSV Tutorial\"  style=\"float:left;\"  \/><\/a><p class=\"wp-caption-text\">Spreadsheet and XML Global Substitution CSV Tutorial<\/p><\/div>\n<p>Programmatically, we came in half way with the programming when we presented <a title='Spreadsheet and XML Global Substitution Primer Tutorial' href='#saxgspt'>Spreadsheet and XML Global Substitution Primer Tutorial<\/a> as shown below.  The programming, then, had two inputs, namely &#8230;<\/p>\n<ol>\n<li>input spreadsheet&#8217;s CSV file <i>manually created<\/i><\/li>\n<li>index XML file<\/li>\n<\/ol>\n<p> &#8230; but that CSV file can be <i>programmatically created<\/i> rather than <i>manually created<\/i>.  And while we&#8217;re at <i>programmatically creating<\/i> the CSV we could also <i>programmatically create<\/i> the Korn Shell (ksh) easier there too, with the same program, rather than using TextWrangler&#8217;s Grep (RegEx) talents &#8230; not that we&#8217;re ungrateful or anything &#8230; but it is good to mix things up to improve procedures sometimes.<\/p>\n<p>And what programming language can we use, and what environment for that programming code?  We think &#8230;<\/p>\n<ul>\n<li>coding wise, we&#8217;ll use PHP (starring PHP&#8217;s <a target=_blank title='PHP glob() method information' href='http:\/\/php.net\/manual\/en\/function.glob.php'>glob()<\/a> method) &#8230; and the environment for that will be that &#8230;<\/li>\n<li>we&#8217;ll use a (local Apache\/PHP\/MySql web server) <a target=_blank title='MAMP local web server' href='http:\/\/mamp.info'>MAMP<\/a> subfolder (ie. how desktop application &#8220;can meet&#8221; web application) off its Document Root (\/Applications\/MAMP\/htdocs\/) &#8230; \/Applications\/MAMP\/htdocs\/tr_mapping\/ &#8230; to store the XML data files (no CSV needed as input this way, as it will be <i>programmatically created<\/i> in part 1 of 2 parts to the whole job) &#8230; which becomes accessible in two ways &#8230;\n<ol>\n<li><i>http:\/\/localhost:8888\/tr_mapping\/tr_mapping.php<\/i>#in_a_web_browser<\/li>\n<li>at Mac OS X Terminal desktop application command line via &#8230; <br \/><i>cd \/Applications\/MAMP\/htdocs\/tr_mapping<\/i><br \/><i>ksh -x tr_mapping.ksh<\/i><\/li>\n<\/ol>\n<\/li>\n<\/ul>\n<p>Again, email is the conduit for both sides of &#8230;<\/p>\n<ol>\n<li>input in<\/li>\n<li>output out<\/li>\n<\/ol>\n<p> &#8230; to complete proceedings.  You can see the various aspects of this, in play, with today&#8217;s <a target=_blank href=\"http:\/\/www.rjmprogramming.com.au\/PHP\/tr_mapping.jpg\" title=\"Tutorial picture\">tutorial picture<\/a>, and though it is not much use to run the PHP code live, its style is leaning towards the generic side enough for us to want to share <a target=_blank href=\"http:\/\/www.rjmprogramming.com.au\/PHP\/tr_mapping.php_GETME\" title=\"tr_mapping.php\">tr_mapping.php<\/a> with you for your perusal &#8230; just in case (it is of use for you).<\/p>\n<hr>\n<p id='saxgspt'>Previous relevant <a target=_blank title='Spreadsheet and XML Global Substitution Primer Tutorial' href='https:\/\/www.rjmprogramming.com.au\/ITblog\/spreadsheet-and-xml-global-substitution-primer-tutorial\/'>Spreadsheet and XML Global Substitution Primer Tutorial<\/a> is shown below.<\/p>\n<div style=\"width: 230px\" class=\"wp-caption alignnone\"><a target=_blank href=\"http:\/\/www.rjmprogramming.com.au\/XML\/spreadsheet_xml_job.pdf\"><img decoding=\"async\" style=\"border: 15px solid pink;\" alt=\"Spreadsheet and XML Global Substitution Primer Tutorial\" src=\"http:\/\/www.rjmprogramming.com.au\/XML\/spreadsheet_xml_job-224of.jpg\" title=\"Spreadsheet and XML Global Substitution Primer Tutorial\"  style=\"float:left;\"  \/><\/a><p class=\"wp-caption-text\">Spreadsheet and XML Global Substitution Primer Tutorial<\/p><\/div>\n<p>Yesterday when we were discussing <a target=_blank title='Worldbank API World Country Reporting Regex Tutorial' href='https:\/\/www.rjmprogramming.com.au\/ITblog\/worldbank-api-world-country-reporting-regex-tutorial\/'>Worldbank API World Country Reporting Regex Tutorial<\/a> we mentioned &#8230;<\/p>\n<blockquote>\n<p> &#8230; and we use <strike>it<\/strike> (ie. Regex) with serverside PHP today, under the auspices of the <a target=_blank title='PHP regex' href='http:\/\/php.net\/manual\/en\/function.preg-match.php'>preg_match<\/a> function, though we most often use RegEx thinking with the Javascript <a target=_blank title='Javascript string replace function information from w3schools' href='http:\/\/www.w3schools.com\/jsref\/jsref_replace.asp'><i>replace<\/i><\/a> function, as the way to make substitutions for more than one occurrence, (the one occurrence design being) a default &#8220;curiosity&#8221; (but can be useful too) about Javascript&#8217;s version of substitution.  You may know this RegEx usage of the Javascript replace function as &#8220;global substitution&#8221;.<\/p>\n<\/blockquote>\n<p> &#8230; and that term &#8220;global substitution&#8221;.  Many editing jobs, especially text file based ones, require or benefit from &#8220;global substitution&#8221; carefully applied, that is.  It is common to see an editor who shies away from &#8220;global substitution&#8221; methods, and in many cases that is wise, but &#8220;global substitution&#8221; gets good results when you &#8230;<\/p>\n<ul>\n<li>substitute things you know exist in the precise form you intend to search for, and only there, where you want to replace &#8230; to <\/li>\n<li>replacements should not feed back into the substitution list &#8230; doh &#8230; or you will end up with a confused unintended result<\/li>\n<\/ul>\n<p>In real life, it is often the case that the conditions above are easy to obey, because you are mapping an old numbering and\/or naming system to a completely new and dissimilar numbering and\/or naming system.  That&#8217;s the case in a little job we drilled down into, to show you what we did, that involved RegEx thoughts, to solve a problem.<\/p>\n<p>So, with our job we had &#8230;<\/p>\n<ul>\n<li>Aim: Change some XML in one file to have the text in one column of a Spreadsheet be mapped to the contents of another column of that same Spreadsheet<\/li>\n<li>Inputs: Excel Spreadsheet with those two columns as mentioned above and the one input XML file<\/li>\n<li>From the User: Asked for the user to send the Excel Spreadsheet &#8230; Saved As Comma Separated Values (CSV) in MS-DOS format and the one input XML file as two attached files in an Email<\/li>\n<li>Processing:\n<ol>\n<li>Opened Email with <a target=_blank title='Gmail' href='http:\/\/gmail.com'>Gmail<\/a> web application in Safari web browser desktop application, on a MacBook Pro laptop<\/li>\n<li>Downloaded the two Attachments and copied over to where we like to work &#8230; the home of <a target=_blank title='MAMP' href='http:\/\/mamp.info'>MAMP<\/a> local Apache\/PHP\/MySql web server &#8230; on a Mac OS X system is <i>\/Applications\/MAMP\/htdocs<\/i> (which we&#8217;ll access later with the Mac OS X <i>Terminal<\/i> desktop application later via &#8220;cd \/Applications\/MAMP\/htdocs&#8221;)<\/li>\n<li>Opened our favourite Text Editor desktop application, called <a target=_blank href='http:\/\/www.macupdate.com\/app\/mac\/11009\/textwrangler' title='Mac TextWrangler text editor'><i>TextWrangler<\/i><\/a>, whose &#8220;Find and Replace&#8221; &#8220;Grep&#8221; suboption will be a feature of today&#8217;s solution<\/li>\n<li>File -&gt; Open the Spreadsheet CSV file<\/li>\n<li>Search -&gt; Find&#8230; &#8230; Matching Mode: <a target=_blank title='Linux grep command information from computerhope' href='http:\/\/www.computerhope.com\/unix\/ugrep.htm'>Grep<\/a> &#8230; Find: ^ Replace: #  &#8230; Replace All &#8230; remember our &#8220;RegEx&#8221; &#8220;cheat sheet&#8221; discussion (lots of which is relevant to TextWrangler <i>Matching Mode: Grep<\/i> as well) at that aforesaid mentioned tutorial &#8230;<br \/>\n<blockquote>\n<ul>\n<li>^ can mean &#8220;start of&#8221;<\/li>\n<li>$ can mean &#8220;end of&#8221;<\/li>\n<li>. can sometimes mean &#8220;one existant character wildcard&#8221; &#8230; or sometimes it is % or ? for this in other &#8220;systems&#8221;<\/li>\n<li>* can often mean &#8220;zero or more of preceding character wildcard&#8221;<\/li>\n<li>[] and () bracketing rules are pretty crucial for the more esoteric usages &#8230; also study | usage<\/li>\n<\/ul>\n<\/blockquote>\n<p>? &#8230; well, we want to start out mapping all lines to non-acting <a target=_blank title='korn' href='http:\/\/www.computerhope.com\/unix\/uksh.htm'>Korn Shell<\/a> command lines<\/li>\n<li>Typed as the new top line <i>#!\/bin\/ksh<\/i> &#8230; just for completeness sake &#8230; is optional step<\/li>\n<li>Search -&gt; Find&#8230; &#8230; No Matching Mode &#8230; Find: #,[ Replace: <a target=_blank title='Linux cat command information from computerhope' href='http:\/\/www.computerhope.com\/unix\/ucat.htm'>cat<\/a> COMM.MIL~INDEX.xml | <a target=_blank title='Linux sed command information from computerhope' href='http:\/\/www.computerhope.com\/unix\/used.htm'>sed<\/a> &#8216;\/\\[    &#8230; Replace All<\/li>\n<li>Search -&gt; Find&#8230; &#8230; Matching Mode: Grep &#8230; Find: ]$ Replace: \\\\]\/g&#8217; > x.xxx ; cat x.xxx > COMM.MIL~INDEX.xml ; <a target=_blank title='Linux rm command information from computerhope' href='http:\/\/www.computerhope.com\/unix\/urm.htm'>rm<\/a> -f x.xxx    &#8230; Replace All<\/li>\n<li>File -&gt; Save As&#8230; fix_csv.ksh (to \/Applications\/MAMP\/htdocs directory)<\/li>\n<\/ol>\n<li>Opened Terminal desktop application that has a default <a target=_blank title='bash' href='http:\/\/www.computerhope.com\/unix\/ubash.htm'>Bash<\/a> environment (a lot like Linux, but is (giving you access to) a Mac OS X BSD operating system, really)<\/li>\n<li>Typed in: <i>cd \/Applications\/MAMP\/htdocs<\/i>  # to get to data<\/li>\n<li>Typed in: <i>cp COMM.MIL~INDEX.xml COMM.MIL~INDEX_original.xml<\/i>  # to backup data ahead of processing, as well as to compare file sizes with later, as a sanity check<\/li>\n<li>Typed in: <i>ksh -x fix_csv.ksh<\/i>  # access Korn Shell interpreter and run the TextWrangler created Korn Shell Script (and the -x <i>switch<\/i> tells the interpreter to be verbose with output reporting)<\/li>\n<li>Typed in: <i>ls -l COMM.MIL~INDEX*.xml<\/i>  # first sanity check verified files different, and not disastrously so &#8230; good first sign<\/li>\n<li>Typed in: <i>fgrep -c &#8216;[S1.12.4.20]&#8217; COMM.MIL~INDEX*.xml ; fgrep -c &#8216;[CCR.28E.20]&#8217; COMM.MIL~INDEX*.xml<\/i>  # second sanity check to prove old\/new parts of first\/last relevant Spreadsheet CSV file records were correctly mapped &#8230; and they were &#8230; so<\/li>\n<li>Opened Email with <a target=_blank title='Gmail' href='http:\/\/gmail.com'>Gmail<\/a> web application in Safari web browser desktop application (and used &#8220;Forward&#8221; option, attaching that new XML file), on a MacBook Pro laptop &#8230; so that &#8230;<\/li>\n<\/ol>\n<li>Output: One XML file with the global substitutions expressed in the Excel Spreadsheet performed, returned to User via Email &#8220;Forward&#8221; option, attaching that new XML file<\/li>\n<\/ul>\n<p>We hope you can see the good use you can make with Email and a good Text Editor and Linux type shell scripting, influenced by <a target=_blank title='Linux RegEx' href='https:\/\/www.digitalocean.com\/community\/tutorials\/using-grep-regular-expressions-to-search-for-text-patterns-in-linux'>RegEx<\/a> pattern matching regarding &#8230;<\/p>\n<ul>\n<li><a target=_blank title='Linux grep command information from computerhope' href='http:\/\/www.computerhope.com\/unix\/ugrep.htm'>grep<\/a><\/li>\n<li><a target=_blank title='Linux sed command information from computerhope' href='http:\/\/www.computerhope.com\/unix\/used.htm'>sed<\/a><\/li>\n<\/ul>\n<p>Which leaves us with today&#8217;s PDF slideshow of snapshots of making this job work, <a target=_blank href=\"http:\/\/www.rjmprogramming.com.au\/XML\/spreadsheet_xml_job.pdf\">here<\/a>.<\/p>\n<p>If this was interesting you may be interested in <a title='Click here to see topics in which you might be interested' href='#d28426' onclick='var dv=document.getElementById(\"d28426\"); dv.innerHTML = \"&lt;iframe width=670 height=600 src=\" + \"https:\/\/www.rjmprogramming.com.au\/ITblog\/tag\/regex\" + \"&gt;&lt;\/iframe&gt;\"; dv.style.display = \"block\";'>this<\/a> too.<\/p>\n<div id='d28426' style='display: none; border-left: 2px solid green; border-top: 2px solid green;'><\/div>\n<hr>\n<p>If this was interesting you may be interested in <a title='Click here to see topics in which you might be interested' href='#d28733' onclick='var dv=document.getElementById(\"d28733\"); dv.innerHTML = \"&lt;iframe width=670 height=600 src=\" + \"https:\/\/www.rjmprogramming.com.au\/ITblog\/tag\/csv\" + \"&gt;&lt;\/iframe&gt;\"; dv.style.display = \"block\";'>this<\/a> too.<\/p>\n<div id='d28733' style='display: none; border-left: 2px solid green; border-top: 2px solid green;'><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Programmatically, we came in half way with the programming when we presented Spreadsheet and XML Global Substitution Primer Tutorial as shown below. The programming, then, had two inputs, namely &#8230; input spreadsheet&#8217;s CSV file manually created index XML file &#8230; &hellip; <a href=\"https:\/\/www.rjmprogramming.com.au\/ITblog\/spreadsheet-and-xml-global-substitution-csv-tutorial\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12,29,37],"tags":[85,111,234,283,319,320,1750,380,1615,2142,503,544,677,707,714,725,744,913,1040,1053,1105,1115,1139,1185,1867,1252,1255,1265,1480],"class_list":["post-28733","post","type-post","status-publish","format-standard","hentry","category-elearning","category-operating-system","category-tutorials","tag-apache","tag-attachment","tag-command-line","tag-csv","tag-desktop","tag-desktop-application","tag-document-root","tag-email","tag-glob","tag-global-substitution","tag-gmail","tag-grep","tag-korn-shell","tag-linux","tag-local-web-server","tag-mac-os-x","tag-mamp","tag-pdf","tag-regex","tag-replace","tag-script","tag-sed","tag-shell-script","tag-spreadsheet","tag-substitution","tag-terminal","tag-text-editor","tag-textwrangler","tag-xml"],"_links":{"self":[{"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/posts\/28733"}],"collection":[{"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/comments?post=28733"}],"version-history":[{"count":4,"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/posts\/28733\/revisions"}],"predecessor-version":[{"id":28737,"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/posts\/28733\/revisions\/28737"}],"wp:attachment":[{"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/media?parent=28733"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/categories?post=28733"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.rjmprogramming.com.au\/ITblog\/wp-json\/wp\/v2\/tags?post=28733"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}