PEAR::HTML_BBCodeParser Parser Issue
I’ve run into a situation where I need BBCode parsed, including both the standard tags supported by the PEAR package HTML_BBCodeParser and custom BBCode tags I’ve added myself.
My problem is this: when a value contains a space, it is truncated at the first space. This applies to URLs, image file names, and any additional attribute values (alt, style, etc.). The issue is present in both the stable release and the latest version in CVS for HTML_BBCodeParser. Here are some examples.
Before BBCode Parser
[url=http://www.somedomain.com/Foo World?str=1]Foo World[/url]
After BBCode Parser
<a href="http://www.somedomain.com/Foo">Foo World</a>

Before BBCode Parser
[img w=100 h=99 alt=Enthalpy Wheel]/images/Enthalpy Wheel.png[/img]
After BBCode Parser
<img src="/images/Enthalpy" width="100" height="99" alt="Enthalpy" />

Before BBCode Parser
[p style=foo bar]something here[/p]
After BBCode Parser
<p style="foo">something here</p>

Before BBCode Parser
[div style=color:blue; font-size: 1em;]something here[/div]
After BBCode Parser
<div style="color:blue;">something here</div>
This problem appears to exist across the board even without additional BBCode tags or additional attributes.
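To make the symptom concrete, here is a minimal sketch that reproduces the same kind of truncation. The pattern is hypothetical and not copied from HTML_BBCodeParser; it only assumes that the value match stops at whitespace, which is what the output above suggests.

```php
<?php
// Hypothetical pattern, NOT taken from HTML_BBCodeParser.
// The value class [^\s\]]+ stops at the first whitespace character.
$tag = '[url=http://www.somedomain.com/Foo World?str=1]';

preg_match('!\[url=([^\s\]]+)!i', $tag, $m);
echo $m[1], "\n"; // http://www.somedomain.com/Foo

// Reading everything up to the closing bracket keeps the space.
preg_match('!\[url=([^\]]+)\]!i', $tag, $n);
echo $n[1], "\n"; // http://www.somedomain.com/Foo World?str=1
```

The second pattern only works for a tag with a single value; with several attributes in one tag it cannot tell where one value ends and the next key begins.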
Any suggestions or directions on how I can resolve this problem would be much appreciated.


April 19th, 2007 at 12:57 am
what about quotes?
[url="http://www.somedomain.com/Foo World?str=1"]Foo World[/url]
April 19th, 2007 at 1:16 am
Last time I checked out the code of this PEAR module it was very basic and unusable. It should be written using a lexer-like parser. vBulletin 3 has got a very nice bbcode parser, but that’s not free software, unfortunately.
April 19th, 2007 at 4:51 am
Likely the package uses regular expressions for parsing the BBCode (nothing wrong with that), but unless you really know what you are doing it’s easy to miss corner cases. “Mastering Regular Expressions” is definitely something everyone using them should read.
April 24th, 2007 at 1:27 pm
Proggi, I’ve tried that, and quotes don’t seem to be allowed, whether double or single. Granted, for URLs the values should be encoded with urlencode().
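As a rough sketch of the encoding step, the example below builds the URL from the earlier example before it ever reaches the BBCode parser. The variable names are made up for illustration. Note that urlencode() turns a space into "+", which is the form-encoding convention, while rawurlencode() produces "%20", which is what a path segment normally needs.

```php
<?php
// Encode the individual parts rather than the whole URL; encoding the
// whole string would also escape ":" and "/".
$path  = 'Foo World';          // illustrative path segment
$query = array('str' => 1);    // illustrative query parameters

$url = 'http://www.somedomain.com/' . rawurlencode($path)
     . '?' . http_build_query($query);

echo $url, "\n";             // http://www.somedomain.com/Foo%20World?str=1
echo urlencode($path), "\n"; // Foo+World
```

With the space encoded, the value no longer contains whitespace, so the truncation never comes into play for URLs. It does not help for attributes such as alt or style, where the space has to survive.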
April 24th, 2007 at 1:29 pm
Jorrit, to be honest I’m not really sure how it should be done right. I’ve actually found the code extremely useful and easy to use, except for this one major issue of truncating values with spaces. And yeah, using vBulletin would not be much help.
April 24th, 2007 at 1:30 pm
Mashiara, yes I believe that is what is being used. I admit I’m weak in this area. When I have more time I will for sure read that book. Regular expressions are very handy and something one should know very well.
May 10th, 2007 at 6:10 pm
Hello:
On line 395 of BBCodeParser.php, try changing the regex to:
preg_match_all("=([^$ce]+)(?=[\\s$ce])!i", $str, $attributeArray, PREG_SET_ORDER);
I noticed they use almost all of their regular expressions incorrectly, so it may require even more massaging than that. That package needs some serious work, for sure.
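As a separate illustration of a different approach (not a patch for the package), here is a minimal sketch of an attribute pattern that accepts quoted values, so a value may contain spaces as long as it is wrapped in quotes. The function name parseBBCodeAttributes is hypothetical, and the sketch assumes "]" closes the tag.

```php
<?php
// Hypothetical helper, not part of HTML_BBCodeParser.
// Accepts key="double quoted", key='single quoted', or key=bare values.
function parseBBCodeAttributes($tag)
{
    // (?| ... ) is a PCRE branch-reset group: whichever alternative
    // matches, the captured value ends up in group 2.
    $pattern = '/([a-z0-9]+)=(?|"([^"]*)"|\'([^\']*)\'|([^\s\]]+))/i';
    preg_match_all($pattern, $tag, $matches, PREG_SET_ORDER);

    $attributes = array();
    foreach ($matches as $m) {
        $attributes[strtolower($m[1])] = $m[2];
    }
    return $attributes;
}

print_r(parseBBCodeAttributes('[img w=100 h=99 alt="Enthalpy Wheel"]'));
print_r(parseBBCodeAttributes('[div style="color:blue; font-size: 1em;"]'));
```

Unquoted values still end at the first space here; the design choice is to require quotes whenever a value contains whitespace, much as HTML attributes do.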