Iframe Srcdoc Equals UTF-8 Issue Primer Tutorial

Do you remember how, in the recent Javascript document.querySelectorAll Client Pre-emptive Iframe Tutorial, we said …

Why can’t we manage this new functionality in the one pass through the “onload” event logic? Well, any self-respecting webpage content will contain both apostrophe and double quote characters (let alone line feeds and carriage returns) (but we can if we can get to a Javascript DOM statement like document.getElementById('ifsd').srcdoc=atob(('' + ioissrc).split(';base64,')[1]).replace('</bo' + 'dy>', ' <style> ' + selectorplusis + '</style> </bo' + 'dy>'); )

? Well, that is true: initializing an iframe’s srcdoc attribute at the same time as the iframe is created can be tricky for HTML data of any complexity. Recently, though, we realized that the …

document.getElementById('ifsd').srcdoc=atob(('' + ioissrc).split(';base64,')[1]).replace('</bo' + 'dy>', ' <style> ' + selectorplusis + '</style> </bo' + 'dy>');

… can be problematic, too, with UTF-8 (Unicode) data (perhaps to do with UTF-16 surrogate pairs, though we are not sure). We came across this while testing the recent “Testing out document.querySelectorAll” web application, from the blog posting thread owning the blog post above, where, thanks to Javascript document.querySelectorAll Textarea Placeholder Tutorial’s penchant for using as an absolute URL (thanks, Wikipedia) …

HTTP://www.wikipedia.org/wiki/Einstein

… we discovered it outputting strings like …

Kingdom of WÃ¼rttemberg

… rather than the better …

Kingdom of Württemberg

… leading us down an “irrelevant PHP file_get_contents encoding problem garden path”. Only after today’s “proof of concept” fgc_utf_fix.php live run, which simplifies (and thus pares down) the methodologies of that “Testing out document.querySelectorAll” project by decoupling it and putting it back together, plus a good hour of calm logical reasoning, did we deduce that file_get_contents was not the problem. Rather, it was the [iframe].srcdoc=[HTMLcontent] statement causing the issue when that [HTMLcontent] contains UTF-8 Unicode data. That makes sense. Not all UTF-8 data fits with an initialization statement that treats character data as one byte per character (which is what atob hands back), so multi-byte characters can be mis-mapped along the way.
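To see the mis-mapping in isolation, here is a minimal sketch, assuming the base64 payload carries UTF-8 bytes (which we have not confirmed for every page). The helper name utf8ToBase64 is ours, made up for illustration. It relies only on the documented behaviour that atob returns one character per decoded byte, so the two UTF-8 bytes of “ü” come back as two separate characters.

```javascript
// Hypothetical helper: build the base64 of a string's UTF-8 bytes,
// roughly what a server-side base64 encode of a UTF-8 page would produce.
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte); // one character per byte
  }
  return btoa(binary);
}

const original = 'Kingdom of W\u00FCrttemberg'; // "Kingdom of Württemberg"

// atob() yields one character per byte, so the two bytes of "ü"
// (0xC3 0xBC) become the two characters "Ã" and "¼".
const garbled = atob(utf8ToBase64(original));

// The character codes now sitting where the single "ü" used to be.
const pair = [garbled.charCodeAt(12), garbled.charCodeAt(13)];

console.log(garbled, pair.map((n) => n.toString(16)));
```

Under that assumption, the garbled string is one character longer than the original, which matches the kind of output shown above.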

But then we stumbled upon the excellent Function to fix ut8 special characters displayed as 2 characters (utf-8 interpreted as ISO-8859-1 or Windows-1252) and adapted its PHP code into a Javascript function equivalent that could help put “Humpty Dumpty back together again”. Cute, huh?!
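The adapted function itself is not reproduced here, so what follows is only a sketch of the same idea, not the function we actually used. The names fixUtf8Mojibake and buildSrcdoc are hypothetical. It assumes the garbling is the plain “one character per byte” kind (ISO-8859-1 style, as atob produces) and does not attempt the Windows-1252 cases that the PHP original covers.

```javascript
// Hypothetical helper: undo "UTF-8 bytes read as one character each".
// Returns the input unchanged if it does not look like that kind of garbling.
function fixUtf8Mojibake(text) {
  // A character above 0xFF cannot be a single byte, so leave such text alone.
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) > 0xff) return text;
  }
  // Turn each character back into the byte it came from.
  const bytes = Uint8Array.from(text, (ch) => ch.charCodeAt(0));
  try {
    // fatal: true makes invalid UTF-8 throw instead of being silently replaced.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (err) {
    return text; // not valid UTF-8, so it was probably fine already
  }
}

// Hypothetical helper mirroring the srcdoc statement from earlier:
// decode the data URI, repair the text, then inject the style before </body>.
function buildSrcdoc(dataUri, css) {
  const html = fixUtf8Mojibake(atob(('' + dataUri).split(';base64,')[1]));
  return html.replace('</body>', ' <style> ' + css + '</style> </body>');
}

// "Kingdom of WÃ¼rttemberg" goes in, "Kingdom of Württemberg" comes out.
const repaired = fixUtf8Mojibake('Kingdom of W\u00C3\u00BCrttemberg');
console.log(repaired);
```

In the page itself this might be used as document.getElementById('ifsd').srcdoc = buildSrcdoc(ioissrc, selectorplusis); with ioissrc and selectorplusis being the variables from the statement above. Because text that is already correct fails the strict UTF-8 check, running the repair twice should do no harm.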
