Escaping non-alphanumeric characters?


chartfieldconsultants
29-06-07, 01:47
Hoping someone can give me a 'for dummies' answer to this one:

I'm trying to use the product description from the feed in my product page metadata, as:

<meta name="description" content="{$sProductDesc}" />

Which is all fine and dandy, works lovely, until I come across a product that uses double quote marks in its description. Such as this output:

<meta name="description" content="1/4" Router. 900w Motor&nbsp;11500 - 28500Rpm. 1/4" Collect&nbsp;. Adjustable Plunge Depth. Electronic Variable Speed Control. Parallel Guide. Circle Guide Dust Extractor Facility. 3 Year Guarantee. Product Ref: 762170" />

As you can see, the 'content' attribute was closed almost immediately by the quote marks the merchant used to represent inches.

Is there anything I can do to escape the double quotes, or force them to be rendered as &quot; in those circumstances?

Thanks in advance,

Andy

authcode
29-06-07, 09:08
If you're using PHP you can try using:
$sProductDesc = htmlentities($sProductDesc, ENT_QUOTES);
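A minimal sketch of that suggestion applied to a shortened copy of the description from the first post. It assumes the page is UTF-8 and passes the charset explicitly, because PHP's defaults for `htmlentities()` have changed between versions. For an attribute value, `htmlspecialchars()` is usually enough, since only `&`, `<`, `>` and the quotes need escaping.

```php
<?php
// Shortened version of the feed description from the first post.
$sProductDesc = '1/4" Router. 900w Motor. 1/4" Collect. Product Ref: 762170';

// ENT_QUOTES converts both " and ' so neither can close the attribute early.
$escaped = htmlentities($sProductDesc, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() only touches & < > " ' which is all an attribute needs.
$escapedMinimal = htmlspecialchars($sProductDesc, ENT_QUOTES, 'UTF-8');

echo '<meta name="description" content="' . $escaped . '" />' . "\n";
// <meta name="description" content="1/4&quot; Router. 900w Motor. 1/4&quot; Collect. Product Ref: 762170" />
```

For plain ASCII text like this the two functions give the same result; they only differ once accented or other non-ASCII characters turn up.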

xlcus
29-06-07, 21:49

You also sometimes find that merchants have already escaped the characters and when you escape them again you end up with things like "&amp;" showing on your site.

For this reason I like to first decode the entities, and then re-encode them...

$cleanText = htmlentities(html_entity_decode($sProductDesc));
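A small sketch of why decoding first helps, using a made-up description where the merchant has already encoded the ampersand. The explicit `ENT_QUOTES` and `'UTF-8'` arguments are an addition to the line above, assumed here so the result doesn't depend on the PHP version's defaults.

```php
<?php
// Hypothetical feed text: the merchant has already encoded the ampersand.
$sProductDesc = 'Nuts &amp; Bolts, 3/8" thread';

// Encoding straight away double-encodes, so the page would show "&amp;".
$doubled = htmlentities($sProductDesc, ENT_QUOTES, 'UTF-8');

// Decoding first turns "&amp;" back into "&", so re-encoding gives one clean layer.
$cleanText = htmlentities(html_entity_decode($sProductDesc, ENT_QUOTES, 'UTF-8'), ENT_QUOTES, 'UTF-8');

echo $doubled . "\n";   // Nuts &amp;amp; Bolts, 3/8&quot; thread
echo $cleanText . "\n"; // Nuts &amp; Bolts, 3/8&quot; thread
```

`htmlentities()` also has a fourth `double_encode` parameter; setting it to `false` leaves existing entities alone instead of decoding them first.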

authcode
29-06-07, 23:06
That is true, good advice. The trouble is that some merchants encode, some don't, and some only do half the job, so there is no quick fix. Decoding and then encoding is a good solution, but the best solution would be for merchants to do one or the other consistently.
I also find that neither decoding nor encoding works 100% of the time, especially when characters are encoded as entity numbers rather than entity names; then you end up in a right mess. At the moment I just go with the raw data and hope that the merchants eventually conform.

xlcus
30-06-07, 09:05

Entity numbers can also be dealt with if they have become re-encoded...

$cleantext = preg_replace('/&amp;#(x[a-f0-9]+|[0-9]+);/i', '&#\\1;', htmlentities(html_entity_decode($sProductDesc)));
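The same one-liner wrapped in a function, as a sketch. The name `cleanFeedText` is hypothetical, and the explicit flags and charset are assumptions added for predictability. The sample inputs imitate a merchant who has double-encoded numeric entities.

```php
<?php
// Hypothetical helper wrapping the one-liner above: decode, re-encode, then
// turn any "&amp;#8217;" or "&amp;#x22;" back into a working numeric entity.
function cleanFeedText(string $text): string
{
    $encoded = htmlentities(html_entity_decode($text, ENT_QUOTES, 'UTF-8'), ENT_QUOTES, 'UTF-8');
    return preg_replace('/&amp;#(x[a-f0-9]+|[0-9]+);/i', '&#\\1;', $encoded);
}

echo cleanFeedText('Don&amp;#8217;t miss it') . "\n"; // Don&#8217;t miss it
echo cleanFeedText('1/4&amp;#x22; Router') . "\n";    // 1/4&#x22; Router
echo cleanFeedText('1/4" Router') . "\n";             // 1/4&quot; Router
```

In every case the output contains no raw double quote, so it is safe inside content="...".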

chartfieldconsultants
02-07-07, 15:44
Many thanks for the advice, guys - and if I've uncovered another issue that needs to be looked at in the feeds then hurrah for that too!

But I really am a beginner at the scripting stuff - where would I put a line like that? On the page? In a configuration file somewhere?

authcode
02-07-07, 16:13
If you were to use this piece of code:
$cleantext = preg_replace('/&amp;#(x[a-f0-9]+|[0-9]+);/i', '&#\\1;', htmlentities(html_entity_decode($sProductDesc)));
just put it before the line where you are currently using $sProductDesc, then replace $sProductDesc in that line with $cleantext.
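A sketch of how that looks on the page. It assumes the meta tag is written out by PHP in a double-quoted string, which is what the {$sProductDesc} syntax suggests; if it sits in a separate template file, the order of the steps is the same but the syntax may differ. The sample value of $sProductDesc is made up so the snippet runs on its own.

```php
<?php
// Assumption: on the real page $sProductDesc is already filled from the feed;
// a sample value is hard-coded here for illustration only.
$sProductDesc = '1/4" Router. Adjustable Plunge Depth.';

// Step 1: the cleaning line goes before the meta tag is output...
$cleantext = preg_replace('/&amp;#(x[a-f0-9]+|[0-9]+);/i', '&#\\1;', htmlentities(html_entity_decode($sProductDesc)));

// Step 2: ...and the meta tag uses $cleantext instead of $sProductDesc.
echo "<meta name=\"description\" content=\"{$cleantext}\" />\n";
// <meta name="description" content="1/4&quot; Router. Adjustable Plunge Depth." />
```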