String representations

With revision 299 we (again) have a list representation for DoubleQuoted- and HereDoc-strings. So the string "hello \t world" is represented by:

ConstantEncapsedString(
   DoubleQuoted([Literal("hello ")
                ,Escape(116)
                ,Literal(" world")]
))
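To make the list form concrete, here is a minimal Python sketch that mirrors these constructors and turns such a list back into the escaped source text. The classes are hypothetical stand-ins for the ATerms. The sketch also assumes something the post does not state: that the integer in Escape is the character code of the character after the backslash, so Escape(116) is \t ('t' is 116) and Escape(92) is \\.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical Python stand-ins for the ATerm constructors shown above.
@dataclass(frozen=True)
class Literal:
    text: str


@dataclass(frozen=True)
class Escape:
    code: int  # assumed: character code of the char after the backslash (116 == 't')


@dataclass(frozen=True)
class OctaChar:
    digits: Tuple[int, ...]  # assumed: character codes of the octal digits


Part = Union[Literal, Escape, OctaChar]


def unparse(parts: List[Part]) -> str:
    """Rebuild the escaped source text (without the quotes) from a flat part list."""
    out = []
    for part in parts:
        if isinstance(part, Literal):
            out.append(part.text)
        elif isinstance(part, Escape):
            out.append("\\" + chr(part.code))
        elif isinstance(part, OctaChar):
            out.append("\\" + "".join(chr(d) for d in part.digits))
        else:
            raise TypeError(f"unknown part: {part!r}")
    return "".join(out)


hello = [Literal("hello "), Escape(116), Literal(" world")]
print(unparse(hello))  # prints: hello \t world
```

With a flat list, a walk over the string is a single loop. No tree traversal is needed.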


This is not completely new, because the first implementation already had this, but that representation had some problems. There was no way to model a hexadecimal escape without making things ambiguous. This problem can be solved by making the order of the internal parts explicit, but that gave us a terrible representation of the string:
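The post does not spell out the ambiguity. One plausible reading, sketched here under our own assumptions, starts from PHP's \x escape, which takes one or two hex digits. In a flat list, "\x41" can then be read as a single two-digit escape or as a one-digit escape followed by the literal "1". PHP itself reads the longer form, "A". The HexaChar name and this simplified splitter, which only knows \x escapes and plain text, are hypothetical.

```python
from typing import List, Tuple

HEX_DIGITS = set("0123456789abcdefABCDEF")

Part = Tuple[str, str]  # (constructor, text), e.g. ("HexaChar", "41")


def segmentations(s: str) -> List[List[Part]]:
    """Enumerate every way to split s into parts when a \\x escape may take one
    or two hex digits and a literal runs up to the next \\x (or the end)."""
    if not s:
        return [[]]
    results: List[List[Part]] = []
    if s.startswith("\\x"):
        for n in (1, 2):  # PHP allows \x followed by one or two hex digits
            digits = s[2:2 + n]
            if len(digits) == n and all(c in HEX_DIGITS for c in digits):
                for rest in segmentations(s[2 + n:]):
                    results.append([("HexaChar", digits)] + rest)
        return results  # empty if no hex digit follows \x (not handled here)
    end = s.find("\\x")
    if end == -1:
        end = len(s)
    for rest in segmentations(s[end:]):
        results.append([("Literal", s[:end])] + rest)
    return results


for parse in segmentations("\\x41"):
    print(parse)
# [('HexaChar', '4'), ('Literal', '1')]
# [('HexaChar', '41')]
```

Two results for one input is exactly what a grammar-level ambiguity looks like. Fixing the order and extent of the parts in the tree is one way to rule the first reading out.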

ConstantEncapsedString(
   DoubleQuoted(
     DQContent(Some("hello ")
              ,Escape(116)
              ,Some(" world"))
))

Note that this string needs only one of those DQContent-thingies, and a DQContent has at most three children. So every string with more than three parts has nested DQContents. Here is an example: the string "Hello \\\0123" looks like this:

ConstantEncapsedString(
   DoubleQuoted(
     DQContent(Some("Hello ")
              ,DQContent(Escape(92)
                        ,None
                        ,OctaChar(48,49,50))
              ,Some("3"))
))

Terrible, right?

So this had to be solved with a post-processing step: we walk over the tree bottom-up and rewrite these nasty DQContents into a nice list. Having such a post-processing step is not ideal, but one was already required because of HereDoc-strings.
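A minimal sketch of that bottom-up flattening, written in Python over tuples that mirror the ATerms above. The project's own pass is not shown in the post (it is presumably a Stratego rewrite), so the tuple encoding and the function name flatten are assumptions for illustration. Children are flattened before their parent concatenates them, so nested DQContents collapse into one list of parts.

```python
from typing import List

Term = tuple  # ATerm-like tuple: (constructor, *children)


def flatten(term: Term) -> List[Term]:
    """Flatten a (possibly nested) DQContent tree into a flat list of parts.

    Children are flattened before their parent concatenates them, so the
    rewrite works bottom-up. Some(text) becomes Literal(text) and None
    (an empty slot) disappears; every other node is kept as a leaf part.
    """
    constructor = term[0]
    if constructor == "None":
        return []
    if constructor == "Some":
        return [("Literal", term[1])]
    if constructor == "DQContent":
        return [part for child in term[1:] for part in flatten(child)]
    return [term]  # Escape(...), OctaChar(...), ...


# The tree for "Hello \\\0123" from the example above
nested = ("DQContent",
          ("Some", "Hello "),
          ("DQContent", ("Escape", 92), ("None",), ("OctaChar", 48, 49, 50)),
          ("Some", "3"))
print(flatten(nested))
# [('Literal', 'Hello '), ('Escape', 92), ('OctaChar', 48, 49, 50), ('Literal', '3')]
```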

The problem with HereDoc is analogous to the dangling-else problem. If you have multiple HereDoc-strings with the same label, you have to choose where the first HereDoc ends. PHP always takes the shortest HereDoc, so this piece of code assigns two variables:

<?php
   $foo = <<<BAR
     foo...
  
BAR;
  
   $bar = <<<BAR
     bar
  
BAR;
?>
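The shortest-match rule can be sketched as a simple line scanner. Each HereDoc ends at the first line that consists of just the label, optionally followed by a semicolon. This is a hypothetical Python helper, not part of the project. It assumes the classic (pre-PHP 7.3) rule that the closing label must start at column 0, and it ignores every other PHP construct.

```python
from typing import List


def heredoc_bodies(lines: List[str], label: str) -> List[str]:
    """Collect HereDoc bodies, ending each one at the FIRST closing line
    (shortest match). A closing line is exactly the label, optionally followed
    by ';' (the classic pre-PHP 7.3 rule)."""
    opener = "<<<" + label
    closers = (label, label + ";")
    bodies: List[str] = []
    body = None  # lines of the HereDoc currently open, or None outside one
    for line in lines:
        if body is None:
            if line.rstrip().endswith(opener):
                body = []
        elif line in closers:
            bodies.append("\n".join(body))
            body = None
        else:
            body.append(line)
    return bodies


# The example above, with whitespace-only lines left empty
source = [
    "<?php",
    "   $foo = <<<BAR",
    "     foo...",
    "",
    "BAR;",
    "",
    "   $bar = <<<BAR",
    "     bar",
    "",
    "BAR;",
    "?>",
]
print(heredoc_bodies(source, "BAR"))  # ['     foo...\n', '     bar\n']
```

A scanner gets this for free because it stops at the first match. A grammar has to express the same preference explicitly.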

As long as the HereDoc grammar is ambiguous, this is easily solved by choosing the right alternative of the amb-node.
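Choosing the right alternative can be pictured as a filter that runs after parsing. The sketch below is a hypothetical Python illustration, not the project's code. Terms are tuples like ("HereDoc", text), an ambiguity is ("amb", alt1, alt2, ...), and the constructor names Stmts and Assign are made up for the example. Each amb-node is replaced by the alternative whose first HereDoc is shortest.

```python
from typing import Optional


def first_heredoc(term) -> Optional[str]:
    """Return the text of the first ("HereDoc", text) node in pre-order, or None."""
    if not isinstance(term, tuple):
        return None
    if term[0] == "HereDoc":
        return term[1]
    for child in term[1:]:
        found = first_heredoc(child)
        if found is not None:
            return found
    return None


def heredoc_length(term) -> float:
    """Length of the first HereDoc in term; alternatives without one never win."""
    body = first_heredoc(term)
    return len(body) if body is not None else float("inf")


def resolve_amb(term):
    """Replace every ("amb", alt1, alt2, ...) node by the alternative whose
    first HereDoc is shortest (PHP's shortest-HereDoc rule)."""
    if not isinstance(term, tuple):
        return term
    if term[0] == "amb":
        alternatives = [resolve_amb(alt) for alt in term[1:]]
        return min(alternatives, key=heredoc_length)
    return (term[0],) + tuple(resolve_amb(child) for child in term[1:])


# Two readings of the program above (hypothetical constructor names)
longest = ("Stmts", ("Assign", "foo", ("HereDoc", "foo...\nBAR;\n$bar = <<<BAR\nbar")))
shortest = ("Stmts",
            ("Assign", "foo", ("HereDoc", "foo...")),
            ("Assign", "bar", ("HereDoc", "bar")))
print(resolve_amb(("amb", longest, shortest)) == shortest)  # True
```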

But after the rewrite to the new internal implementation, HereDoc became unambiguous! This is a bit frustrating, because the grammar now takes the longest HereDoc, which is wrong. I spent some time trying to get it right, but I could not make it work (yet). So a new puzzle has entered the project. Happy Christmas! :)

P.S. As I said in my last blog post, I can hope, but maybe I should just read.
