Binary Expressions

2007-4-18

PEAR::HTML_BBCodeParser Parser Issue

Filed under: — The Warden @ 4:01 pm

I’ve run into a situation where I need BBCode parsed, including both the standard tags supported by the PEAR package HTML_BBCodeParser and custom BBCode tags I’ve added myself.

My problem is this: I’ve discovered that when a value contains a space, the value is truncated at the first space. This applies to URLs, image file names, and any additional attribute values (alt, style, etc.). The issue is present in both the stable release and the latest code in CVS for HTML_BBCodeParser. Here are some examples.

Before BBCode Parser
   [url=http://www.somedomain.com/Foo World?str=1]Foo World[/url]
After BBCode Parser
   <a href="http://www.somedomain.com/Foo">Foo World</a>

Before BBCode Parser
   [img w=100 h=99 alt=Enthalpy Wheel]/images/Enthalpy Wheel.png[/img]
After BBCode Parser
   <img src="/images/Enthalpy" width="100" height="99" alt="Enthalpy" />

Before BBCode Parser
   [p style=foo bar]something here[/p]
After BBCode Parser
   <p style="foo">something here</p>

Before BBCode Parser
   [div style=color:blue; font-size: 1em;]something here[/div]
After BBCode Parser
   <div style="color:blue;">something here</div>

This problem appears to exist across the board even without additional BBCode tags or additional attributes.
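
For the URL and image path cases only, one possible stopgap is to percent-encode the value before it goes into the BBCode, so that no literal space ever reaches the parser. The sketch below assumes a hypothetical helper named encode_path_segments (it is not part of HTML_BBCodeParser) and uses PHP's built-in rawurlencode(). It does nothing for free-text attributes such as alt or style.

```php
<?php
// Hypothetical helper, not part of HTML_BBCodeParser: percent-encode each
// path segment separately so the "/" separators are preserved.
function encode_path_segments($path)
{
    $segments = explode('/', $path);
    return implode('/', array_map('rawurlencode', $segments));
}

// The space in the file name becomes %20 before the tag is built.
$src    = encode_path_segments('/images/Enthalpy Wheel.png');
$bbcode = '[img w=100 h=99]' . $src . '[/img]';
echo $bbcode, "\n";
```

rawurlencode() is used rather than urlencode() because urlencode() turns a space into "+", which is only meaningful in a query string, not in a path.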

Any suggestions or directions on how I can resolve this problem would be much appreciated.

7 Responses to “PEAR::HTML_BBCodeParser Parser Issue”

  1. proggi Says:

    what about quotes?

    [url="http://www.somedomain.com/Foo World?str=1"]Foo World[/url]

  2. Jorrit Says:

    Last time I checked out the code of this PEAR module it was very basic and unusable. It should be written using a lexer-like parser. vBulletin 3 has got a very nice bbcode parser, but that’s not free software, unfortunately.

  3. Mashiara Says:

    Likely the package uses regular expressions for parsing the BBCode (nothing wrong with that), but unless you really know what you are doing it’s easy to miss corner cases. “Mastering Regular Expressions” is definitely something everyone using them should read.

  4. The Warden Says:

    Proggi, I’ve tried that and quotes don’t seem to be allowed, whether double or single. Granted, for URLs the values should be encoded with urlencode().

  5. The Warden Says:

    Jorrit, to be honest I’m not really sure how it should be done right. I’ve actually found the code extremely useful and easy to use, except for this one major issue of truncating values with spaces. Yeah, vBulletin would not be much help here.

  6. The Warden Says:

    Mashiara, yes I believe that is what is being used. I admit I’m weak in this area. When I have more time I will for sure read that book. Regular expressions are very handy and something one should know very well.

  7. Rick Says:

    Hello:

    On line 395 of BBCodeParser.php, try changing the regex to:
    preg_match_all("![\\s$oe]([a-z]+)=([^$ce]+)(?=[\\s$ce])!i", $str, $attributeArray, PREG_SET_ORDER);

    I noticed they use almost all of their regular expressions incorrectly, so it may require even more massaging than that. That package needs some serious work, for sure.
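
To illustrate the kind of pattern the comments above are circling around, here is a minimal standalone sketch of attribute matching that tolerates spaces. It is not the package's code and not a drop-in patch: the function name parse_bbcode_attributes is made up for this example, and it assumes the input is the text between the opening and closing bracket of a tag. A value may be double-quoted, single-quoted, or unquoted. An unquoted value runs until the next name= pair or the end of the tag.

```php
<?php
// Hypothetical helper, not part of HTML_BBCodeParser.
// $tagBody is assumed to be the text between "[" and "]" of an opening tag.
function parse_bbcode_attributes($tagBody)
{
    // name = "double quoted" | 'single quoted' | unquoted (lazy).
    // (?| ... ) is a PCRE branch-reset group, so the value always lands in
    // group 2 whichever alternative matched. The lookahead stops an unquoted
    // value just before the next " name=" pair or at the end of the tag.
    $pattern = '/(?:^|\s)([a-z]+)=(?|"([^"]*)"|\'([^\']*)\'|(.*?))(?=\s+[a-z]+=|\s*$)/i';

    preg_match_all($pattern, $tagBody, $matches, PREG_SET_ORDER);

    $attributes = array();
    foreach ($matches as $m) {
        $attributes[strtolower($m[1])] = $m[2];
    }
    return $attributes;
}

print_r(parse_bbcode_attributes('img w=100 h=99 alt=Enthalpy Wheel'));
print_r(parse_bbcode_attributes('url="http://www.somedomain.com/Foo World?str=1"'));
print_r(parse_bbcode_attributes('div style=color:blue; font-size: 1em;'));
```

With the examples from the post, alt comes back as "Enthalpy Wheel" and style as "color:blue; font-size: 1em;" instead of being cut at the first space. One known limitation of this sketch: an unquoted value that itself contains a space followed by word= will be split there, so quoting is still the safer form. The values would also still need escaping (for example with htmlspecialchars()) before being written into HTML.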

Copyright © 2004 - 2005 by Adam Douglas