Thursday, July 27, 2006

Minutes PHP Developers Meeting




Paris, November 11th and 12th, 2005


Attendees:




  • Marcus Börger (SOMABO)

  • Wez Furlong (OmniTI)

  • Rasmus Lerdorf (Yahoo!)

  • Derick Rethans (eZ systems)

  • Dmitry Stogov (Zend)

  • Zeev Suraski (Zend)

  • Jani Taskinen

  • Andrei Zmievski (Yahoo!)









1. Unicode


The first part of the meeting was dedicated to issues related to the Unicode
support for PHP 6.



1.1 Unicode on/off modes


Issue:
Currently it is possible to have Unicode on or off on a per request basis,
requiring the storage of both non-Unicode and Unicode variants of class,
method and function names in all the symbol tables.



Discussion:
Having to allow both non-Unicode and Unicode versions of names in the
symbol table is deemed unnecessary and we agree on allowing only a server wide
configuration setting to enable or disable Unicode support. This makes
implementation easier for some parts of the engine, it causes fewer problems
for opcode caches and it is slightly faster as no runtime conversion of the
names is necessary. We also discussed whether we should even allow Unicode
mode to be turned off as current micro benchmarks show that the Unicode
implementations of some of the string functions are up to 300% slower, and
whole applications up to 25% slower. Disallowing Unicode mode to be turned off
is expected to slow down the adoption of PHP 6 too as many ISPs would be
reluctant to install a version that immediately slows down the applications
of their users. When we provide a switch they can start by turning it off and
users would have an easy way of asking their ISPs to turn Unicode mode on by
simply reconfiguring it in php.ini. This is also why we chose to pick a
runtime configuration setting as opposed to a configure-time configuration
switch. We do need some trickery in order to be able to parse the setting from
php.ini as we need to know whether to enable or disable Unicode mode before we
activate our extensions. Another reason for providing a runtime switch instead
of a compile-time switch is that distributions would only have to create one
binary.


Conclusion:



  1. We provide a run-time switch in php.ini to enable or disable Unicode
    semantics. This setting will default to "On". When Unicode semantics are
    off you will still have access to Unicode features.




1.2 Different String Types


Issue:
A number of people are unhappy with the current implementation where there are
either too many different string types (binary, string, unicode) or
multiple implementations of many internal engine functions and helper
functions.



Discussion:
After some discussion everybody attending seems to agree that only having two
string types (binary and string) makes sense. The Unicode semantics switch
will control what type the string literals are by default, of course.
Documentation will need to mention that with the switch on, all "strings" are
Unicode and with the switch off, all "strings" are Binary.


Conclusions:



  1. We use IS_STRING internally to represent binary data. In
    documentation and user land exposures we use "binary" as term. (For example
    as name for casts).


  2. We use IS_UNICODE internally to represent unicode string data. In
    documentation and user land exposures we use "string" as term.




1.3 Extension upgrading


Issue:
Extensions need to be upgraded and we need to be able to make sure that
non-upgraded extensions will not be activated when Unicode mode is selected.


Discussion:
We need to look at extensions and figure out which common things are there to
be solved for supporting Unicode. For these tasks we then create an API that
extensions can use to implement Unicode support.


For PDO we need to have some way for the drivers to send back UTF-16 if we are
in Unicode mode. If the driver does not support it, then PDO needs to
up-convert based on what the driver communicates.



Conclusions:



  1. We remove the old parameter parsing API so that extensions are forced to
    use the new one which helps with Unicode support.

  2. We add a flag to the extension's module structure stating whether it
    supports Unicode or not. If the extension does not support Unicode it will
    not be loaded during start up if the Unicode switch is "On".

  3. We create a PHP/Unicode API that extensions can use to support Unicode in
    an easier way.

  4. Wez looks into what should be done in PDO to support Unicode properly.





1.4 Bundling ICU


Issue:
As ICU will be required by PHP regardless of whether the Unicode switch is on
or off, we need to decide on bundling the ICU library in part or fully.


Discussion:
ICU is quite a large library and bundling it with PHP will increase the
download size, but as PHP requires a specific version of ICU (3.4) it is
worthwhile to bundle. Another option beside bundling is to push distributions
to include ICU 3.4 and make PHP rely on it. Reasoning for this is that now the
distributions only have the responsibility to pick an ICU version that works
together with PHP. People who compile PHP from source can also easily compile
ICU from source or are competent enough to install it with their
distribution's tools.


Conclusion:



  1. We will make our build system bail out if a supported ICU library could
    not be found, and in the bail out message we provide a small set of
    instructions on where to get ICU. Included in this message is also a link
    to our documentation that provides a more extensive coverage on installing
    ICU.

  2. We will write to maintainers of distributions to lobby for including ICU
    3.4 in their distribution.





1.5 Filename Encoding


Issue:
Files on a file system can have names encoded in different character sets.
For example Windows can make use of its UTF16 based filename API, while on
Linux it simply depends on the application.


Conclusions:



  1. We need to implement the already described "filename_encoding" setting to
    set the expected filename encoding.

  2. When functions such as readdir() encounter a filename that can not exist in
    the encoding that is set with the "filename_encoding" option (broken
    characters for example, or a latin1 name when the "filename_encoding" is
    set to UTF8) it returns a binary string.
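
The fallback described in conclusion 2 can be sketched in user-land code. This
is only an illustration under stated assumptions: the function name
classify_filename() is hypothetical, the configured encoding is assumed to be
UTF-8, and the validity check uses the /u modifier of PCRE rather than
whatever the engine itself would use.

```php
<?php
// Hypothetical helper: decide whether a raw directory entry can be exposed
// as a (Unicode) string or has to be handed back as binary data.
// Assumption: "filename_encoding" is set to UTF-8.
function classify_filename(string $raw): string
{
    // With the /u modifier, preg_match() returns false when the subject is
    // not valid UTF-8, so an empty pattern doubles as a validity check.
    return preg_match('//u', $raw) === 1 ? 'string' : 'binary';
}

$utf8Name   = "caf\xC3\xA9.txt"; // "café.txt" encoded as UTF-8
$latin1Name = "caf\xE9.txt";     // "café.txt" encoded as latin1

echo classify_filename($utf8Name), "\n";
echo classify_filename($latin1Name), "\n";
```

The latin1 name cannot be interpreted as UTF-8, so it falls into the "binary"
case, matching the example given in the conclusion.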





1.6 Collator Caching


Issue:
Collators are used for comparing strings and it is quite expensive to open
(and close) one each time. A method to cache them needs to be found.


Discussion:
In order to prevent a huge amount of memory from being used by this cache we
need to limit the number of objects we store in the cache. We deem the last 3
enough as in most cases you would only have one default collator, and perhaps
a secondary one. We also think that an application would often not use more
than one encoder between two character sets.


Conclusion:



  1. We will store the 3 last opened collators and encoding objects in a
    thread/process wide cache.
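
A cache that keeps only the last three opened objects could look roughly like
the sketch below. The class name RecentObjectCache, its methods and the
stand-in "collator" objects are all hypothetical; the real cache would live in
the engine and hold ICU objects.

```php
<?php
// Hypothetical cache that keeps the most recently used objects only.
final class RecentObjectCache
{
    private $capacity;
    private $items = array(); // key => object, most recently used last

    public function __construct(int $capacity = 3)
    {
        $this->capacity = $capacity;
    }

    // Return the cached object for $key, opening it with $open on a miss.
    public function get(string $key, callable $open)
    {
        if (array_key_exists($key, $this->items)) {
            $object = $this->items[$key];
            unset($this->items[$key]);         // re-appended below as newest
        } else {
            $object = $open($key);
            if (count($this->items) >= $this->capacity) {
                reset($this->items);           // oldest entry comes first
                unset($this->items[key($this->items)]);
            }
        }
        $this->items[$key] = $object;
        return $object;
    }

    public function keys(): array
    {
        return array_keys($this->items);
    }
}

$opened = array();
$open = function (string $locale) use (&$opened) {
    $opened[] = $locale;                        // record each expensive open
    return (object) array('locale' => $locale); // stand-in for a collator
};

$cache = new RecentObjectCache(3);
foreach (array('en_US', 'de_DE', 'fr_FR', 'en_US', 'ja_JP') as $locale) {
    $cache->get($locale, $open);
}
```

The second request for en_US is served from the cache, and opening ja_JP
evicts de_DE because it is the entry that was used least recently.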





1.7 Optimising []


Issue:
Currently using the [] operator to select an arbitrary character is very slow
as PHP needs to start scanning the string from the start. This is because it
is impossible to calculate the correct memory position as the UTF16 encoding
that we use allows either 2 or 4 bytes per character.
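
The need to scan from the start can be illustrated with a small user-land
sketch. The function name utf16be_char_at() is hypothetical and the string is
assumed to be well-formed UTF-16 in big-endian byte order; the engine's own
representation may differ.

```php
<?php
// Return the $index-th character (code point) of a UTF-16BE string as raw
// bytes, or null if the string is shorter than that.
function utf16be_char_at(string $utf16, int $index): ?string
{
    $offset = 0;
    $length = strlen($utf16);
    for ($current = 0; $offset + 1 < $length; $current++) {
        $unit = (ord($utf16[$offset]) << 8) | ord($utf16[$offset + 1]);
        // A high surrogate (0xD800-0xDBFF) starts a 4-byte character.
        $width = ($unit >= 0xD800 && $unit <= 0xDBFF) ? 4 : 2;
        if ($current === $index) {
            return substr($utf16, $offset, $width);
        }
        $offset += $width;
    }
    return null;
}

// "a", U+1D11E (MUSICAL SYMBOL G CLEF, stored as a surrogate pair), "b"
$text = "\x00\x61" . "\xD8\x34\xDD\x1E" . "\x00\x62";

$second = utf16be_char_at($text, 1); // the 4-byte surrogate pair
$third  = utf16be_char_at($text, 2); // "b", at byte offset 6 rather than 4
```

Because the second character occupies 4 bytes, the byte offset of the third
one cannot be computed as index times two; every preceding character has to be
inspected first.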


Discussion:
A suggestion is to optimise [] by storing the offset/char nr (in the zval).
This will perhaps increase the size of a zval and we're wondering whether
it's worthwhile to optimise already.


Conclusion:



  1. We postpone the decision on whether to implement the suggested
    optimisation until later.





1.8 Locale Sensitivity


Issue:
Some PHP functions currently make use of the system locales which have some
problems, such as different names on different platforms and the
non-availability of many locales on a specific installation. ICU's support for
locales is very extensive and offers a lot of settings.


Discussion:
Locale configuration is important for localised applications and by relying on
ICU's database of locale data it is possible to build a locale-aware
application that can be deployed in a reliable way on a multitude of platforms
and installations. In order to make full use of this functionality PHP's
functions that deal with locales should be converted to ICU locales. ICU's
offering of locale-aware functions is extensive, however in most cases most of
this functionality is neither needed nor wanted. This is why we should
pick a conservative default for the options, but also provide a more extensive
API so that the more advanced features can be used too.


As ICU's string comparison is locale aware we need to implement this in some
way. We chose not to implement locale-aware comparisons for == and strcmp() as
currently they are not. There is a separate function called strcoll() which is
now based on POSIX locales. This one will be rewritten for ICU locales, while
the == and strcmp() will stay the same as they are now. By keeping them as
they are now we are not breaking any current usage and offer (a bit) faster
and generic string comparison functionality.
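
The difference between the two kinds of comparison can be sketched with the
functions as they exist today. The "C" locale is used here only because it is
always available; with a language-specific locale strcoll() would typically
order "a" before "B".

```php
<?php
// strcmp() compares byte values and ignores the locale: "a" (0x61) sorts
// after "B" (0x42).
$bytewise = strcmp("a", "B");

// strcoll() follows the collation rules of the current LC_COLLATE locale.
// In the "C" locale those rules are plain byte order, so the result has the
// same sign as strcmp().
setlocale(LC_COLLATE, "C");
$collated = strcoll("a", "B");

// == on two non-numeric strings is a plain equality test without collation.
$equal = ("abc" == "ABC");

var_dump($bytewise > 0, $collated > 0, $equal);
```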


Conclusions:



  1. == should behave the same as strcmp() and not use collation. strcoll() does.


  2. We will use locale based functions where they make sense, and we pick a
    conservative default. Examples are strtoupper/strtolower, stristr etc.




1.10 Conversion Errors


Issue:
While converting between different encodings a conversion error might occur as
not all characters in the source string might be stored in the target
character set.


Discussion:
Currently PHP does not make use of exceptions for any of the internals and
implementing conversion errors with exceptions will be breaking this current
behaviour. By removing the need for automatic conversions between native
strings and unicode strings and vice versa this issue is also less important
as now the user will be largely responsible for character set conversions. By
introducing an exception mode to the error modes we already support for
character encoding failures we give the user full flexibility with handling
conversion failures.


Conclusions:




  1. We will not use exceptions for implicit conversion errors.

  2. We provide an additional error mode for character set conversion failures
    that throws exceptions on failures.
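
How an exception mode sits next to other error modes can be sketched as
follows. Everything in this example is hypothetical: the constant names, the
function latin1_to_ascii() and the choice of "skip" and "substitute" as the
other modes are assumptions made for illustration, not the API that was
discussed.

```php
<?php
// Hypothetical error modes; the names are illustrative only.
const CONV_ERROR_SKIP       = 'skip';
const CONV_ERROR_SUBSTITUTE = 'substitute';
const CONV_ERROR_EXCEPTION  = 'exception';

// Convert a latin1 string to 7-bit ASCII. Bytes above 0x7F have no ASCII
// equivalent and are handled according to $mode.
function latin1_to_ascii(string $input, string $mode): string
{
    $output = '';
    for ($i = 0, $n = strlen($input); $i < $n; $i++) {
        $byte = $input[$i];
        if (ord($byte) <= 0x7F) {
            $output .= $byte;
        } elseif ($mode === CONV_ERROR_SUBSTITUTE) {
            $output .= '?';
        } elseif ($mode === CONV_ERROR_EXCEPTION) {
            throw new UnexpectedValueException(
                sprintf('Byte 0x%02X at offset %d cannot be converted', ord($byte), $i)
            );
        }
        // CONV_ERROR_SKIP: the byte is dropped silently.
    }
    return $output;
}

$name = "Marcus B\xF6rger"; // latin1 for "Marcus Börger"

$skipped     = latin1_to_ascii($name, CONV_ERROR_SKIP);
$substituted = latin1_to_ascii($name, CONV_ERROR_SUBSTITUTE);

try {
    latin1_to_ascii($name, CONV_ERROR_EXCEPTION);
    $caught = false;
} catch (UnexpectedValueException $e) {
    $caught = true;
}
```

The exception is only thrown because the caller asked for that mode in an
explicit conversion, which is in line with conclusion 1.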





2. Cleanup of Functionality



2.1 register_globals


Issue:
Register globals are the source of many applications' security problems and
cause constant grief.


Discussion:
We briefly discussed how we want to alert users to the disappearance of this
functionality. We decided that if we find the setting during the startup of
PHP we raise an E_CORE_ERROR which will prevent the server from starting with
a message that points to the documentation. The documentation should explain
why this functionality was removed, and give some introduction to safe
programming.



Conclusions:



  1. We are going to remove the functionality.

  2. We throw an E_CORE_ERROR when we detect the register_globals setting
    while starting PHP.




2.2 magic_quotes


Issue:
Magic_quotes can be cumbersome for application developers as it is a setting
that can be set to on or off without any influence from within the script
itself as input parameters are escaped before the script starts.
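
The effect on a script can be sketched by emulating the escaping with
addslashes(), since the setting itself cannot be switched on from within a
script. The variable names are illustrative.

```php
<?php
// What the script receives when magic_quotes is on: quotes, backslashes and
// NUL bytes in the input were already escaped, as if by addslashes().
$raw      = "O'Reilly";
$received = addslashes($raw);

// A portable script has to undo the escaping first, but only when the
// setting is on, which is outside of its control.
$restored = stripslashes($received);

var_dump($received, $restored);
```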


Discussion:
In the same way as with the removal of the register_globals functionality, we
decided that if we find the setting during the startup of
PHP we raise an E_CORE_ERROR which will prevent the server from starting with
a message that points to the documentation. The documentation should explain
why this functionality was removed, and point the users at the input_filter
extension as replacement.



Conclusions:



  1. We remove the magic_quotes feature from PHP.

  2. We throw an E_CORE_ERROR when we detect the magic_quotes,
    magic_quotes_sybase or magic_quotes_gpc setting while starting PHP.




2.3 safe_mode


Issue:
safe_mode is a feature in PHP that checks whether files to be opened or
included have the same GID/UID as the starting script. This can
cause many problems, for example if an application generates a cache
file, it will do this with the user ID that belongs to the web server
(usually "nobody"). As an application is usually uploaded by the user
belonging to the web account (say "client") the scripts can no longer
open the files that the application generated. The same problems happen
when for example an application generates an image.



Discussion:
As the name safe_mode gives the wrong signal, suggesting that it makes PHP
safe, we all agreed that we should remove this feature. It can never be made
totally safe
as there will always be ways to circumvent safe_mode through libraries. This
kind of functionality also better belongs in the web server or other security
scheme. open_basedir is a feature that we will keep, and we will point users
to this functionality in the error message that is thrown when we detect this
setting on start-up.


Conclusions:



  1. We remove the safe_mode feature from PHP.

  2. We throw an E_CORE_ERROR when we detect the safe_mode setting while
    starting PHP.




2.4 Deprecated Behaviour


Issue:
There are some places in PHP where we keep deprecated behaviour from earlier
PHP versions. Some of those might finally be dropped in PHP 6.



Discussion:
We only discussed a few cases where we might want to drop the deprecated
behaviour as we didn't have a full list of all cases.


The first issue that we raised was changing the E_NOTICE error for
call-time-pass-by-reference to an E_ERROR, or simply throwing a parse error.
We argued over this case and we decided to change this E_NOTICE to an E_STRICT
instead as it was argued that there is nothing wrong with doing a call-time
pass by reference.


The second issue was removing support for "var" altogether in PHP 6. Currently
it is an alias for "public", but it raises an E_STRICT warning. As there is no
real reason why we should remove it, we agreed on simply making "var" an alias
to "public" and removing the warning.



The last issue that came up under this subject is the return-by-reference of
the result of "new <object-name>". First we thought that there might be some
reason to keep this in case you have a factory having a list of references to
already instantiated objects, but as the behaviour would be exactly the same
keeping those by value we came to the conclusion that there is no reason to
try to do either of the two examples:



<?php
$foo =& new StdClass();
?>

<?php
function &foo()
{
    return new StdClass();
}

$f = foo();
?>

Both these cases should raise an E_STRICT instead.



Conclusions:



  1. Make the call-time-pass-by-reference an E_STRICT error.

  2. We make "var" an alias for "public" and remove the warning for it.

  3. Assign "new" by reference will throw an E_STRICT error.





2.5 zend.ze1_compatibility_mode


Issue:
zend.ze1_compatibility_mode tries to keep the old PHP 4 behaviour where objects
will be copied on assignment unless they are assigned by reference (and the
same for passing to a function as that is assigning too). It also affects
casting objects to an integer.


Discussion:
This functionality does not work 100% and it was introduced to make migrating
PHP 4 users to PHP 5 easier. As this is not an issue anyway,
we intend to remove this setting.


Conclusions:



  1. We remove the zend.ze1_compatibility_mode feature from PHP.


  2. We throw an E_CORE_ERROR when we detect the zend.ze1_compatibility_mode
    setting while starting PHP.




2.6 Support for Freetype 1 and GD 1


Issue:
FreeType 1 and GD 1 are archaic versions of the true type font rendering
and graphics manipulation libraries.


Discussion:
As they are old versions that are no longer maintained, and the new versions
are much better, we see no problem with removing support for them.


Conclusions:




  1. We remove support for Freetype 1.

  2. We remove support for GD 1.




2.7 open_basedir


Issue:
open_basedir is a feature that restricts the opening of files by PHP to
certain directories.


Discussion:
This feature is relatively straightforward, and although it also suffers from
libraries being able to work around it, we decided to keep it as it serves a
useful purpose without causing any headaches for users (as safe_mode does).


Conclusions:




  1. We keep the open_basedir functionality.




2.8 dl()


Issue:
dl() causes many problems as we are never unloading modules. In threaded
environments we already disable dl().


Discussion:
The first impression was that we can remove this functionality, but there is
some use for it in for example the CLI version of PHP. Instead of registering
dl() in the core we will leave it up to each SAPI to register this function,
as necessary. For the current SAPIs that we have we will only keep it for
CLI and embed.


Conclusions:




  1. We do not remove it fully, but only enable it when a SAPI layer registers
    it explicitly.




2.9 CGI/FastCGI mode


Issue:
The CGI/FastCGI code is messy.


Discussion:
FastCGI is better than CGI but it can currently be disabled which results in
messy code. We will clean up the code and always enable FastCGI for CGI SAPI.


Conclusions:




  1. Clean up the code, so that FastCGI mode can not be disabled.




2.10 Dynamic class inheritance


Issue:
Dynamic class initialisation or inheritance makes things slow(er).


Discussion:
It is useful to be able to do "if (...) class {...} else class {...}" although
it makes classes slower as inheritance is then done at runtime and not at
compile time. It also causes problems for accelerators such as APC. As it is
useful and plenty of scripts use it, it would not be a good idea to remove. It
is also possible for compiler caches to detect this, and they can then throw
warnings/errors if required.
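
A minimal sketch of such a conditional declaration is shown below. The class
names Counter and Child and the condition on PHP_INT_SIZE are made up for the
example.

```php
<?php
// Which class body is used is only known at runtime, so the declaration
// cannot be resolved at compile time.
if (PHP_INT_SIZE >= 8) {
    class Counter
    {
        public function describe(): string
        {
            return 'native 64bit counter';
        }
    }
} else {
    class Counter
    {
        public function describe(): string
        {
            return '32bit fallback counter';
        }
    }
}

// The same applies to inheritance: Child is bound to whichever Counter was
// declared above, at the moment this statement is executed.
class Child extends Counter
{
}

$child = new Child();
echo $child->describe(), "\n";
```

A compiler cache only sees two competing declarations of Counter in the same
file, which is the situation it could detect and warn about.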



Conclusions:



  1. We keep it in the engine, and leave it up to the caches
    to spit out warnings/errors.




2.11 register_long_arrays, HTTP_*_VARS


Issue:
register_long_arrays and the long versions of the superglobals have been
deprecated for some time, and do not serve a real purpose.


Discussion:
The $_GET[], $_POST[], etc style superglobals are a better alternative since
they are shorter and have the same behavior. The register_long_arrays option
is also off by default making it less of a problem to remove this.



Conclusions:



  1. We remove the register_long_arrays setting and HTTP_*_VARS globals from
    PHP.

  2. We throw an E_CORE_ERROR when we detect the register_long_arrays setting
    while starting PHP.




2.12 Old-style constructors


Issue:
Currently PHP 5 also supports the "old style" constructors from PHP 4, which
have the same name as the class name. This makes it impossible to have a class
without constructor and a method of the same name as the class (as it would be
called as constructor).



Discussion:
We discussed this subject and it was brought up that having a constructor with
the same name as the class name could cause problems in the following code:



<?php
class A {
    function B() {
    }
}
class B {
}

$b = new B();
?>

It was thought that this would call A::B() as constructor when instantiating
class B. However, this is not the case so there are no problems other than
the one mentioned above in "Issue".


Conclusions:



  1. We keep the alternative old-style constructor.





2.13 Case sensitivity of identifiers


Issue:
Case insensitivity of functions and classes is something a lot of developers
have wanted to get rid of for quite some time, as it is inconsistent with
variable names, which are case sensitive. It also causes
interesting problems such as in bug #35050.


Discussion:
Making this change outright is not a good idea, as there are plenty of people
using a "wrong" case for the internal functions, such as "Header",
"ImageCreate" etc., which are officially all lower case. This would
create too much of a headache.



We are looking into how to make this change in a gradual way - perhaps one where
we create upper- and lowercase aliases for the functions and do a two-phase
lookup; the ideal case is to match the natural function name case on the first
lookup. If that fails, then lowercase and try again; if that succeeds emit a
warning about the case mismatch. This gives us a stepping stone for
implementing case-sensitivity in the future.
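
The two-phase lookup described above can be sketched in user-land code. The
table layout and the function names build_function_table() and
lookup_function() are hypothetical; the engine would do this on its own
function table.

```php
<?php
// Build a lookup table that keeps each function under its natural name and
// under an all-lowercase alias. (Illustrative structure, not the engine's.)
function build_function_table(array $names): array
{
    $table = array('exact' => array(), 'lower' => array());
    foreach ($names as $name) {
        $table['exact'][$name] = $name;
        $table['lower'][strtolower($name)] = $name;
    }
    return $table;
}

// Two-phase lookup: try the name as written first; if that fails, lowercase
// it and try again, recording a warning about the case mismatch.
function lookup_function(array $table, string $name, array &$warnings): ?string
{
    if (isset($table['exact'][$name])) {
        return $table['exact'][$name];
    }
    $lower = strtolower($name);
    if (isset($table['lower'][$lower])) {
        $natural = $table['lower'][$lower];
        $warnings[] = "Case mismatch: $name() should be written as $natural()";
        return $natural;
    }
    return null;
}

$table    = build_function_table(array('header', 'getUserName'));
$warnings = array();

$first  = lookup_function($table, 'getUserName', $warnings); // exact match
$second = lookup_function($table, 'Header', $warnings);      // found in phase two
$third  = lookup_function($table, 'missing', $warnings);     // not found
```

Code that already uses the natural case pays for a single lookup only, while
mismatching code keeps working and collects a warning.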


Conclusions:



  1. We're going to try to find a way to see how we can make this change
    gradually, but do not "fix" it for PHP 6.




2.14 break $var


Issue:

"break $var" doesn't really work and there is no real reason for this. All you
can do with it is assign a number to $var and use that to break out of that
many loops.


Discussion:
It doesn't work and we don't see any use for it.


Conclusions:



  1. We remove support for dynamic break levels.






3. PECL


This section deals with moving extensions in and out of PECL and other
extension related issues.



3.1 XMLReader / XMLWriter in the distribution, on by default


Discussion:
XML Reader provides a simple XML parser internally based on SAX parsing, and
XML Writer provides an easy API for writing XML files. Both extensions should
make working with XML files a lot easier.


Conclusions:



  1. XML Reader into the core distribution and on by default

  2. XML Writer into the core distribution and on by default





3.2 Move non-PDO DB extensions to PECL


Issue:
PHP 5.1 introduces PDO, an extension that unifies Database APIs. With this we
do not "need" older extensions to access databases anymore.


Discussion:
We can not remove the "old" extensions, as at least OCI8 and MySQLi provide a
very rich set of features, which are not all supported by PDO. Some "old"
extensions can probably be moved to PECL as they are either unmaintained, or
superseded by PDO.


Conclusions:



  1. We decide on moving DB extensions out of the core later.




3.3 Move ereg to PECL


Issue:
Currently we have two extensions dealing with regular expressions, and soon
there will be a third one based on ICU.


Discussion:
Currently we see some problems with the bundled ereg library in some places due to people
specifying --with-regex=system. We also see distributions linking against
another library than our bundled one to prevent conflicts with the apache
bundled regex library, or the system's one. As most people seem to prefer
linking against something else than our bundled version, it seems proper to
remove this bundled library. If we remove the bundled library, then we need to
make the ereg functions into an extension, otherwise we can not enable them in
all cases. Some functionality in the core of PHP also uses POSIX regular
expressions, those should be rewritten to use PCRE then.



Conclusions:



  1. We make ereg an extension

  2. The PCRE extension will not be allowed to be disabled.

  3. The core of PHP should be made to work with PCRE so that we can safely
    disable ereg

  4. We unbundle the regex library




3.4 Split ext/dba into a core extension and sub-extensions in PECL



Discussion:
Marcus wants to make ext/dba into a core extension, with all the drivers in
PECL. Splitting it up into separate extensions makes it much easier to change
handlers in php.ini.


Conclusions:



  1. ext/dba should be handled in the same way as PDO

  2. All the handlers stay in the distribution.




3.5 Fileinfo extension in the distribution


Issue:
PHP currently doesn't have any reliable mechanism for MIME-type detection.



Discussion:
The mime_magic extension doesn't work very well, and there is an extension in
PECL (Fileinfo). We suggest to include this extension into the core, and
enable it by default as MIME-type detection is something that most web
applications need. In the meantime we want to get rid of the "mime_magic"
extension in the core.


Currently the Fileinfo extension opens its database whenever you request it,
and this is not very efficient. We need to change the extension so that it
loads its database on MINIT, and possibly see if we can link in the database
into the binary, instead of relying on an external file.


Conclusions:



  1. We move mime_magic from the core to PECL

  2. We move the Fileinfo extension to the core, and enable it by default.

  3. The Fileinfo extension should be updated to only load its database once on
    MINIT.





3.6 Other extensions to PECL?


Issue:
There are some extensions in the distribution that are either unmaintained, or
just not generally useful.


Discussion:
We had a quick look at the current extensions in the core, but decided not to
go over this and just continue the current practice of evaluating them one by
one.


Conclusions:



  1. We decide on moving extensions one by one at a later time.





3.7 Fix ext/soap and add support for wsse/secext


Issue:
The SOAP extension is getting more and more used, but has some limitations
regarding the support for security extensions.


Discussion:
The remaining issues need to be fixed, and some (though not all) support for
the security extensions needs to be implemented. As the extension is useful for
many things, we also decided to turn it on by default.


Conclusions:



  1. ext/soap will be turned on by default


  2. We implement some of the security extensions.




3.8 Allow files with an open stream to be deleted


Issue:
Currently it is not possible to delete opened files, and this feature is
requested.


Discussion:
On Unix this is not a problem, you can simply unlink() the file. On Windows
however this is not possible as Windows simply prohibits a file from being
deleted when it is open.


Conclusions:




  1. Wez is going to check for a way on how to make this possible on Windows.




3.9 ext/bitset


Issue:
ext/bitset is a tiny extension that allows you to do operations on bitsets. It
is requested to be put in the core distribution.


Discussion:
The extension doesn't rely on any libraries, and is deemed useful enough to
put in the core. When we went over the code and tried to compile it we noticed
a lot of CS differences compared to our published standards, and there were
failed tests.


Conclusions:




  1. We add it to the core distribution only if the above mentioned problems
    are solved.





4. Engine Additions



4.1 Add a 64bit integer


Issue:
Being limited to a signed 32 bit integer is becoming more and more of a
nuisance, hence this suggestion for adding a 64 bit integer type.


Discussion:
The first idea was to make our current integer into a 64bit version, but that
can cause unwanted changes in behaviour that are very hard to detect. We can
also not restrict the current integer type to 32bits for the same reason. We
do need a new 64bit type, and we will be adding that as a new variable type.
The current integer we leave alone, so that it is a 32bit integer on 32bit
platforms, and a 64bit integer on 64bit platforms.
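
The platform dependence of the current integer type can be observed from a
script. This short sketch only uses the existing PHP_INT_SIZE and PHP_INT_MAX
constants; the proposed (int64) type is not involved.

```php
<?php
// The width of the integer type depends on the platform: typically 4 bytes
// on 32bit builds and 8 bytes on 64bit builds.
$bytes = PHP_INT_SIZE;
$max   = PHP_INT_MAX;

// Going past the largest integer does not wrap around; the result becomes a
// float, which cannot represent every large integer exactly.
$overflowed = $max + 1;

var_dump($bytes, is_int($max), is_float($overflowed));
```

The point at which this conversion happens differs between 32bit and 64bit
builds, which is why code that depends on a fixed width is hard to port.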



Conclusions:



  1. We leave the current integer type alone

  2. We add a new 64bit integer that is always 64bits regardless of platform

  3. The cast name for this new type is (int64) and internally we use IS_INT64
    and RETURN_INT64 etc.

  4. We do not add a specialised 32bit only integer type




4.2 Adding "goto"



Issue:
Goto is currently missing in PHP, and although there is a limited use for this
construct in some cases it can reduce the amount of code a lot.


Discussion:
There are some inherent problems with implementing goto, as jumping into a
foreach() loop will almost be impossible as at the start of the loop something
is initialised. The same is most likely true for other loop constructs.


As goto will most often be used to jump out of nested if statements, we think
that restricting the construct so that you can only jump out of a construct is
possible. Similarly restricting the construct so that you can only jump down
should satisfy people who do not want the ability to jump all over the place.


The name "goto" is misleading, and often associated with BAD THINGS(tm).
Because our proposed solution is not a real GOTO construct, we will instead
reuse the "break" keyword, and extend it with a static label.


An example of using a labeled break:




<?php
for ($i = 0; $i < 9; $i++)
{
    if (true) {
        break blah;
    }
    echo "not shown";
blah:
    echo "iteration $i\n";
}
?>

Conclusions:



  1. We extend "break" by allowing breaking to a label.


  2. We ask Sara to make a patch for this, and we see what it is going to look
    like. We decide after that.




4.3 ifsetor() as "replacement" for $foo = isset($foo) ? $foo : "something else"


Issue:
Many people requested the "ifsetor()" operator that can set a variable's
default value if it was not set before, akin to:




<?php
// If $_GET['foo'] is set, then its value will be assigned to $foo,
// otherwise 42 will be assigned to $foo.
$foo = ifsetor($_GET['foo'], 42);
?>

Discussion:


The name for this new operator is heavily disputed and we could not agree on a
decent name for it. As this operator is most often used for setting default
values for input variables we do need some kind of functionality here.


Instead of implementing ifsetor() we remove the requirement for the "middle"
parameter to the ?: operator. The middle parameter then defaults to the first
one. If the first parameter is not set, then we will still throw an E_NOTICE.
An example on how that might work:



<?php
// Evaluates to $_GET['foo'] if it evaluates to true. It evaluates to 42 if
// $_GET['foo'] is not set (with a notice) or evaluates to false.
$foo = $_GET['foo'] ?: 42;

// Evaluates to "true" if $blå equals 42 and it evaluates to 54 otherwise.
$blå = $blå == 42 ?: 54;

$bar = bar() ?: 9;
?>


In combination with the new input_filter extension you then reach the original
goal of setting a default value to a non-set input variable with:



<?php
$blahblah = input_filter_get(GET, 'foo', FL_INT) ?: 42;
?>

If the input filter's logical filters (prefixed with FL) do not detect the
correct type, the value will be false. If it's false, then the above
expression assigns 42 to $blahblah.
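
The semantics of the shortened operator can be sketched as follows, assuming
an interpreter in which the ?: form without middle operand is available. The
variable names are illustrative.

```php
<?php
// With the middle operand omitted, "$a ?: $b" yields $a when $a evaluates
// to true and $b otherwise.
$fromTruthy = 'abc' ?: 42; // the left operand is used
$fromZero   = 0 ?: 42;     // 0 evaluates to false, so the default is used
$fromEmpty  = '' ?: 42;    // the same goes for the empty string

// A value that is set but evaluates to false is replaced as well. When that
// is not wanted, an explicit isset() test is still required.
$input    = array('page' => 0);
$shortcut = $input['page'] ?: 1;
$explicit = isset($input['page']) ? $input['page'] : 1;

var_dump($fromTruthy, $fromZero, $fromEmpty, $shortcut, $explicit);
```

This is the difference with the requested ifsetor(): the shortened operator
tests for truth, not for being set.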


Conclusions:



  1. We make the middle value of the ?: operator optional.

  2. We did not agree on the implementation of ifsetor().





4.4 Allow foreach syntax for multi-dimensional arrays


Issue:
There was a suggestion to allow the following construct:



foreach( $array as $k => list($a, $b))

Discussion:
Currently this can be implemented with the following code:




<?php
$array = array(array(1, 2), array(3, 4));
foreach ($array as $k => $v) {
    // $a and $b receive the two elements of each inner array
    list($a, $b) = $v;
}
?>

So it is not really required to implement this functionality. But we deemed it
useful enough to include this new syntax in PHP 6. This means that the above
example can now be written as:



<?php
$array = array(array(1, 2), array(3, 4));
foreach ($array as $k => list($a, $b)) {
}
?>

Conclusions:




  1. We add this syntax, and Andrei prepares a patch.




4.5 Cleanup for {} vs. []


Issue:
Currently you can use both {} and [] to access both a certain character in a
string and array elements. The suggestion is to make {} only work on strings
and add substr() functionality to it, and to make [] only work on arrays.


Discussion:
Although we deprecated (through the manual) the use of [] for string indexes,
a lot of people still do not use {} instead. And internally there is absolutely no
difference between {} and []. Having two syntaxes for the same thing makes no
sense, and getting rid of [] would break all sorts of stuff. The original
reason for the {} was a technical one to simplify the parser, but the
landscape has changed and that reason no longer exists.


As far as code readability and obviousness go, I doubt anybody would guess
their way to the $str{5} syntax. If you were new to PHP and you were going to
try to guess how you would get a character offset in a string, your first
guess to reading characters from a string would be []. Removing the obvious
syntax just doesn't make any sense. The other place {} is used outside of
control blocks is in quoted strings where "{$foo{1}}" is much uglier than

"{$foo[1]}".


Because having two syntaxes doing exactly the same does not make any sense
either, we agreed on deprecating the {} syntax in 5.1 with an E_STRICT, and
removing it in PHP 6 altogether.


Conclusions:



  1. We will undeprecate [] for accessing characters in strings.



  2. {} will be deprecated in PHP 5.1.0 with an E_STRICT and removed in PHP 6.



  3. For both strings and arrays, the [] operator will support
    substr()/array_slice() functionality:





    • [2,3] is elements (or characters) 2, 3, 4

    • [2,] is elements (or characters) 2 to the end

    • [,2] is elements (or characters) 0, 1, 2

    • [,-2] is from the start until the last two elements in the
      array/string

    • [-3,2] this is the same as substr and array_slice()

    • [,] doesn't work on the left side of an equation.



    With these rules, the behaviour for strings will be:





    • $str = "foo"; $str[] = "d"; we modify to make a concatenation.

    • $str = "fo"; $str[] = "od"; will concatenate to "food"

    • $str = ""; $str[] = "d"; should become the string "d", this should
      become an E_STRICT in 5.1.1. We need to check how common this is first.
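
Several of the slice forms in conclusion 3 map directly onto the existing substr() and array_slice() functions. The sketch below shows those equivalents only; the bracket syntax itself is a proposal, the sample values are invented for illustration, and the [,2] form is left out because its listed result does not follow the same offset/length reading.

```php
<?php
$str = "abcdefgh";
$arr = array(0, 1, 2, 3, 4, 5, 6, 7);

// Proposed [2,3]: three elements starting at offset 2.
$a = substr($str, 2, 3);          // "cde"
$b = array_slice($arr, 2, 3);     // array(2, 3, 4)

// Proposed [2,]: from offset 2 to the end.
$c = substr($str, 2);             // "cdefgh"

// Proposed [,-2]: from the start, leaving off the last two.
$d = substr($str, 0, -2);         // "abcdef"
$e = array_slice($arr, 0, -2);    // array(0, 1, 2, 3, 4, 5)

// Proposed [-3,2]: two elements starting three from the end.
$f = substr($str, -3, 2);         // "fg"

echo "$a $c $d $f\n";
```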








4.6 Changes to the shut-up (@) operator that disallow (@ini_set(...))


Issue:
The @ operator is very slow.


Discussion:
If we do not require edge cases like @ini_set("error_reporting", E_ALL); to
work correctly, we can make it much faster. Ilia and Marcus already had a
patch for that.



Conclusions:



  1. We check with Andi if he has a valid reason to not accept that patch.




4.7 Allow foreach() without "as" part (I guess for iterators)


Issue:
In some cases with Iterators you might not need the "as $varname" part in the
foreach() statement.



Discussion:
This is an edge case, and it does not make sense to add this to the language.
It can be much better implemented with a function (such as splforeach()) which
allows this behaviour.
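
A minimal sketch of such a function follows. The name drain_iterator() is hypothetical (splforeach() above is only a suggested name, not an existing function); it simply walks an iterator to the end without binding the current element to a variable.

```php
<?php
// Hypothetical helper: walks an iterator to exhaustion and returns the
// number of steps taken, without binding the current element to a variable.
function drain_iterator(Iterator $it)
{
    $steps = 0;
    for ($it->rewind(); $it->valid(); $it->next()) {
        $steps++;
    }
    return $steps;
}

$steps = drain_iterator(new ArrayIterator(array('a', 'b', 'c')));
echo $steps, "\n"; // 3
```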


Conclusions:



  1. We do not want to add it.




4.8 Named Parameters


Issue:
The functionality of named parameters was suggested. Named parameters allow
you to "skip" certain parameters to functions. If it would be implemented,
then it might look like:




<?php
function foo($a = 42, $b = 43, $c = 44, $d = 45)
{
    // echoes "42 53 54 45"
    echo "$a $b $c $d\n";
}

foo(c => 54, b => 53);
?>

Discussion:
We don't see the real need for named parameters, as they seem to violate PHP's
KISS principle. It also makes for messier code.
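
For comparison, a similar effect can be had without new syntax by passing one keyed array. This is only a sketch: foo_named() is a hypothetical variant of the foo() example above, not something proposed at the meeting.

```php
<?php
// Hypothetical variant of foo() that takes its arguments as one keyed array.
function foo_named(array $args = array())
{
    // Defaults mirror the signature foo($a = 42, $b = 43, $c = 44, $d = 45).
    $defaults = array('a' => 42, 'b' => 43, 'c' => 44, 'd' => 45);
    // Keys supplied by the caller override the defaults.
    $args = array_merge($defaults, $args);
    return "{$args['a']} {$args['b']} {$args['c']} {$args['d']}";
}

echo foo_named(array('c' => 54, 'b' => 53)), "\n"; // 42 53 54 45
```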


Conclusions:




  1. We do not want to add it.




4.9 Make parameter order consistent over all functions


Issue:
One point that people find annoying in PHP is the inconsistent order of
function parameters. Because there is no consistent convention, they always
have to use the manual to see what the order is.


Discussion:
We went over the string functions and found that there are only two functions
that have "needle, haystack" instead of "haystack, needle", namely in_array()
and array_search(). For in_array() it makes sense in a logical way to work in
the same way as SQL, where you first specify the value, and then you check if
it fits "in the array". As array_search() was modelled on the in_array()
function, the parameter order is the same.



As there are not many inconsistencies, and changing them would cause quite
some problems for current applications we decided not to change the order.
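
The difference in ordering can be seen in a short example; the sample values are invented for illustration.

```php
<?php
$haystack = array('apple', 'banana', 'cherry');

// Array functions discussed above: needle first, haystack second.
$found = in_array('banana', $haystack);       // true
$key   = array_search('cherry', $haystack);   // 2

// String functions: haystack first, needle second.
$pos = strpos('banana split', 'split');       // 7

var_dump($found, $key, $pos);
```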


Conclusions:



  1. We do not change parameter ordering for internal functions.




4.10 Minor function changes: microtime()


Issue:
It was suggested that microtime(true) become the default behaviour. Currently
if you pass no parameters the microtime function returns the current time as
"microseconds <space> unix_timestamp".



Discussion:
As you usually would want to have the full floating point number back, many
people use the following snippet (and perhaps even wrap that in a function):



<?php
$m = microtime();
$e = explode(' ', $m);
echo $e[0] + $e[1], "\n";
?>

We want to change the behaviour to return a normal float straight away (which
you can now do by passing "true" as first parameter). The following snippet:



<?php
$m = microtime(true);
echo $m, "\n";
$e = explode(' ', $m);
echo $e[0] + $e[1], "\n";
?>


throws only a notice, while the result is still correct. As it's only a
notice, we feel safe enough to change the default behaviour to return a float.
We do need to investigate what happens if any of the following values are
passed though: none, null, false and true.
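
The two call styles can be compared side by side. This sketch passes the boolean argument explicitly, so it does not rely on what the default behaviour is; the variable names are made up for the example.

```php
<?php
// String form: "fraction seconds", which has to be split and added up.
$parts = explode(' ', microtime(false));
$fromString = (float) $parts[0] + (float) $parts[1];

// Float form, requested explicitly with the boolean argument.
$fromFloat = microtime(true);

// Both describe almost the same instant; the difference is tiny.
printf("%.4f %.4f\n", $fromString, $fromFloat);
```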


Conclusions:



  1. We will change the default behaviour of microtime() to return a float.





5. Changes to OO functionality



5.1 "function require __construct(" to force calling the parent's constructor



Issue:
Some extensions such as PDO allow their classes to be inherited. The
constructors of those inherited classes are required to call the extension
class' constructor though as that one needs to initialise the internal
structures. Currently there is no way in the engine to require this.


Discussion:
In order to address this issue we need to add a flag internally that tells the
engine that it should bail out if methods are called but the extension's
constructor has not been called yet. For this to work, we need to add a flag to
the bottom-most class in the hierarchy that is still an internal class, and add
an additional pointer to the class structure that points to the constructor
that should be called.
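
The engine-level flag cannot be shown from a script, but the idea can be sketched in userland. Everything below is an assumption made for illustration: BaseResource stands in for an internal extension class, and the check inside query() plays the role of the engine bailing out.

```php
<?php
// Hypothetical stand-in for an internal (extension) class.
class BaseResource
{
    private $initialised = false;

    public function __construct()
    {
        // Stands in for the extension setting up its internal structures.
        $this->initialised = true;
    }

    public function query()
    {
        // Bail out if the base constructor never ran.
        if (!$this->initialised) {
            throw new LogicException('parent constructor was not called');
        }
        return 'ok';
    }
}

class GoodChild extends BaseResource
{
    public function __construct()
    {
        parent::__construct();
    }
}

class BadChild extends BaseResource
{
    public function __construct()
    {
        // Forgets to call parent::__construct().
    }
}

$good = new GoodChild();
echo $good->query(), "\n"; // ok

try {
    $bad = new BadChild();
    $bad->query();
} catch (LogicException $e) {
    echo $e->getMessage(), "\n"; // parent constructor was not called
}
```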


Conclusions:



  1. We add a flag to the class structure to record this

  2. We do not add new syntax for this to userland





5.2 Allow interfaces to specify the __construct() signature


Issue:
Currently it is not possible to define a __construct() signature in an
interface.


Discussion:
We didn't see a reason why this shouldn't be allowed, but Andi seems to have a
reason for it.


Conclusions:



  1. Zeev asks Andi why he doesn't want constructors in the interface. If there is no
    sound reason we add this possibility.





5.3 Implement inheritance rules for type hints


Issue:
Currently we don't check inheritance rules for type-hinted parameters.


Discussion:
Marcus explains with an example how inheritance rules for type-hinted
parameters should work, and also mentions that most probably no language
currently implements this correctly. This is not a very important check, and
therefore we see no reason why we should implement this either.


Conclusions:



  1. We are not going to add the checks.





5.4 Late static binding using "this" without "$" (or perhaps with a different name)


Issue:
Currently, the following script will print "A::static2":



<?php
class A {
static function staticA() {
self::static2();
}

static function static2() {
echo "A::static2\n";
}
}

class B extends A {
static function static2() {
echo "B::static2\n";
}
}

B::staticA();
?>


Discussion:
Currently there is no way to do "runtime evaluation" of static members so that
we can call B::static2() from A::staticA(), and this is a useful feature. In
order to implement this we need a new keyword. As we do not want to introduce
yet another reserved word, the re-use of "static" was suggested for this.


The same example, but now with the call to "self::static2()" replaced with
"static::static2()", will then print "B::static2".



Conclusions:



  1. We re-use the "static::" keyword to do runtime evaluation of statics.

  2. Marcus prepares an implementation suggestion.
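
The difference between the two bindings can be sketched as follows. The method names are made up for the example, and it assumes a PHP version in which static:: is available (PHP 5.3 and later).

```php
<?php
class A
{
    public static function viaSelf()
    {
        // Bound at compile time: always A::name().
        return self::name();
    }

    public static function viaStatic()
    {
        // Bound at run time: the class the call was made on.
        return static::name();
    }

    public static function name()
    {
        return 'A';
    }
}

class B extends A
{
    public static function name()
    {
        return 'B';
    }
}

echo B::viaSelf(), "\n";   // A
echo B::viaStatic(), "\n"; // B
```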




5.5 Object casting to primitive types


Issue:
PHP does not support a call-back when an object is cast to another (scalar)
type.



Discussion:
As PHP is a weakly typed language this kind of functionality does not make
sense in PHP. We only leave the __toString() method which is called on a
(string) cast. In PHP 5.1 the following already gives notices on the (int) and
(double) casts, where the __toString() method is also correctly called:



<?php
class a {
function __toString() {
return "string";
}
}

$a = new a;
echo (int) $a, "\n";
echo (bool) $a, "\n";
echo (string) $a, "\n";
echo (float) $a, "\n";
?>


Conclusions:



  1. We will not add magic call-back functions for other casts.




5.6 Name spaces


Issue:
PHP currently has no name spaces, which some people find inconvenient as they
are required to prefix all their classes with a unique prefix.


Discussion:
First we briefly discussed the current name space patch, but as we were not all
familiar with its workings we did not go into deep detail for this. Then we
saw an alternative implementation of name spaces with "Modules". This is an
example of how this should work:




<?php
import M1 as M2;
echo M2::$var,"\n";
echo M2::c,"\n";
echo M2::func(),"\n";
echo M2::C::func(),"\n";
var_dump(new M2::C);
?>

M1.php:



<?php
module M1 {
var $var = "ok";
const c = "ok";
function func() { }

class C {
static function func() { return "ok"; }
static private function bug() { echo "bug\n"; }
}

private class FOO {
public class BAR {
static function bug() { echo "bug\n"; }
}
}

function bar() { return new M1::FOO(); }
}
?>


This approach suffers from a few problems:



  • When calling you still have to prefix all your classes.

  • You are forced into a specific naming scheme for your modules.


After the modules, we came up with some implementation guidelines on how we
would like to see support for name spaces and decided we would only introduce
them if the following rules could be implemented:



  • Implement a "namespace" keyword that you can wrap around a class
    definition with {}.




  • Internally this adds <namespace-name> to the class names
    defined inside it separated by a separator. The following example would
    create the class "spl<separator>file":



    <?php
    namespace spl {
    class file {
    }
    }
    ?>


  • The suggested separator is "\" as this is the only free choice.




  • import will be request-wide and the import keyword copies class entries to
    their new names.



  • If we encounter a conflict due to importing we abort execution



  • "import spl\*" will copy all classes in the spl name space to the "normal"
    name space, which doesn't have a prefix.



  • Functions in name spaces are allowed.




  • Constants in name spaces are allowed unless we find problems with the
    implementation.



  • No variables are allowed in name spaces.




Conclusions:



  1. If we're going to do this, the name spaces look like above.

  2. Marcus is going to provide a patch.





5.7 Using an undefined property in a class with defined properties should throw a warning


Issue:
Currently PHP will not throw any warning with the following code, and will just
create a new property:



<?php
class foobar {
public $supercalifragilisticexpialidoceous;

function rød() {
$this->supercalifragilistcexpialidoceous = 42;
}
}

$foo = new foobar;
$foo->rød();
?>

This makes debugging of code harder.


Discussion:
Just like with normal variables, you don't have to initialise properties. This
is a feature of the language, and is used a lot in projects. A solution would
be to mark a class as "strict" but that would introduce a new keyword and is
against the KISS approach of PHP.



Conclusions:



  1. We will not start throwing any notice for this.




5.8 Type-hinted properties and return values


Issue:
PHP only supports type hints for arguments, not for return values or
properties.


Discussion:



We quickly agreed that we don't need type-hinted properties, as it would cause
problems when they are assigned to other variables and it's just generally
not-PHP style.


For return values it does make some sense, but definitely not as much as
type-hinted arguments to functions. One discussion point was how to tell the
parser the return type of a function. We came up with the following
suggestions for syntax (where ObjectName is the type-hint):



  1. function ObjectName &funcname();

  2. function &ObjectName funcname();

  3. function &funcname ObjectName();

  4. ObjectName function &funcname();


  5. function &funcname() returns ObjectName;


Conclusions:



  1. We do not allow type-hinted properties as it's not the PHP way.

  2. We will add support for type-hinted return values.

  3. We need to pick a syntax for type-hinted return values.





5.10 Method calls


Issue:
Currently you can call methods both statically and dynamically, whether they
are marked as static or not:



<?php
class gren {
static function grenStatic($a) { echo "$a - static function\n"; }
function grenDynamic($a) { echo "$a - dynamic function\n"; }
}

gren::grenStatic("static call");
gren::grenDynamic("static call");

$gren = new gren;
$gren->grenStatic("dynamic call");
$gren->grenDynamic("dynamic call");
?>


Discussion:


The second call will now throw an E_STRICT warning, and as it is dangerous we
decided to make this an E_ERROR instead.


Conclusions:



  1. We will make calling a dynamic function with the static call syntax
    a fatal error (E_ERROR).

  2. We will not disallow calling a static member with dynamic syntax.





5.11 ReflectionClass cache in zend_class_entry* and support "$this::class"


Issue:
Reflection is quite slow


Discussion:
We don't really care if this is cached, as it will only be done when
reflection is used. In this case things are sped up a bit.


Conclusions:



  1. We move the reflection code to its own extension.

  2. Marcus implements the ReflectionClass cache in struct zend_class_entry*.





5.12 Delegates


Issue:
PHP does not support delegates, but requires you to implement "delegation"
yourself.


Discussion:
For some interfaces it is useful to have "delegators" so that you don't have
to implement the functions to call delegators yourself. We did not see any
real-world code example, but basically this is what the "delegate" keyword
would do:




<?php
interface IF {
function f();
function g();
}

class whatever implements IF {
// Generate default delegator functions
delegate IF $if;

function __construct(IF $x) {
$this->if = $x;
}

/* Generated automatically internally:
function f() {
$this->if->f();
}
function g() {
$this->if->g();
}
*/
}
?>

Conclusions:



  1. We are not going to implement this.
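
Without a "delegate" keyword, forwarding can already be approximated with the __call() magic method. The sketch below is only an approximation under stated assumptions: Greeter, EnglishGreeter and Wrapper are hypothetical names, and Wrapper deliberately does not declare "implements Greeter", because methods provided through __call() do not satisfy an interface.

```php
<?php
// Hypothetical interface and implementation used only for this sketch.
interface Greeter
{
    public function hello();
    public function bye();
}

class EnglishGreeter implements Greeter
{
    public function hello() { return 'hello'; }
    public function bye()   { return 'bye'; }
}

class Wrapper
{
    private $inner;

    public function __construct(Greeter $inner)
    {
        $this->inner = $inner;
    }

    // Forward any method not defined on Wrapper to the wrapped object.
    public function __call($name, $arguments)
    {
        return call_user_func_array(array($this->inner, $name), $arguments);
    }
}

$w = new Wrapper(new EnglishGreeter());
echo $w->hello(), ' ', $w->bye(), "\n"; // hello bye
```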






6 Additions



6.1 Add an opcode cache to the distribution (APC)


Issue:
Many people are requesting an opcode cache in the default distribution of PHP,
as it boosts performance quite a lot.


Discussion:
Rasmus suggested putting an opcode cache into PHP, and after a quick discussion
we found that the only alternative license-wise is APC. A few concerns were
raised on whether it should be enabled by default and whether other opcode
caches could still be used. Enabling by default is not possible because some
configuration needs to be done for the cache.


Conclusions:




  1. We include APC in the core distributions

  2. APC will not be turned on by default.

  3. APC will switch to mmap as default shared memory storage.




6.2 Merge Hardened PHP patch into PHP


Issue:
The Hardened PHP patch adds a number of extra checks to PHP to make
things more secure.



Discussion:
We went over the features that the patch offers, and discussed whether we
might want to include them in stock PHP. One of the points that came up was
the allow_url_fopen setting we currently have in PHP. Many ISPs disable it
for sound security reasons concerning remote paths with include(), but
unfortunately by turning this setting off they also turn off the possibility
to use fopen("http:..."), for example. This is why we want to split this
option into two settings.


Conclusions:



  1. We want to include the patch's real-path fix.

  2. We want to include the protection against HTTP Response Splitting attacks
    (header() shouldn't accept multiple headers in one call).

  3. We split allow_url_fopen into two distinct settings: allow_url_fopen and
    allow_url_include. If allow_url_fopen is off, then allow_url_include will
    be off too.

  4. We enable allow_url_fopen by default


  5. We disable allow_url_include by default




6.3 Sand boxing or taint mode


Issue:
PHP does not have support for a sand boxed environment.


Discussion:
We discussed a taint mode where input variables have to be untainted before
use, but this is a moot point as we would need different contexts (SQL,
output...) and this cannot be checked without knowing the application. Taint
mode is therefore not overly useful in PHP.


Sand boxing might be an option, but we need a good plan and a very solid patch
if we even want to consider including it into PHP.


Conclusions:




  1. No taint mode

  2. Only sand boxing if we have a rock solid implementation




6.4 All non-fatal errors should be marked in extensions as E_RECOVERABLE_ERROR


Issue:
Currently many extensions use E_ERROR if something goes wrong, which stops the
execution of the script immediately, even when this is not really required.


Discussion:
PHP 6 (Head) already includes a new error level "E_RECOVERABLE_ERROR" that can
be used instead of E_ERROR to signal a severe error that should be handled by
a user-defined error handler; if it is not handled, the script is aborted. This
should be used by the engine when it can still recover from the error, while
E_ERROR should be reserved for cases where the engine is in a definitely
unstable state.



Extensions should be using E_WARNING for cases where something goes wrong,
unless they can really not continue after an error. In this case an
E_RECOVERABLE_ERROR error should be used. We need to go over all extensions
and engine and fix the error levels according to the policy.


Conclusions:



  1. We go over the engine and extensions and make sure E_ERROR is only used
    where the engine is in an unrecoverable state.




6.5 All non-fatal errors should become exceptions


Issue:
PHP currently does not throw exceptions for notices and warnings.


Discussion:
Nothing internally throws an exception and it is hard to figure out which
error level should throw an exception or not. Besides this, turning your
favourite error level into an exception can already easily be done with the
following snippet:




<?php
function error_handler($errorType, $message)
{
if ($errorType == E_NOTICE) {
throw new Exception( $message, $errorType);
}
}

set_error_handler('error_handler');

// Throws a notice
echo $new;
?>

Conclusions:



  1. We are not going to make exceptions out of any error level.
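
A slightly fuller sketch of the same userland approach is shown below, this time with the exception actually being caught. It uses the built-in ErrorException class and raises a user-level notice with trigger_error() so the result does not depend on engine diagnostics; the handler name and message are made up for the example.

```php
<?php
// Convert every reported error into an ErrorException.
function throwing_handler($severity, $message, $file, $line)
{
    throw new ErrorException($message, 0, $severity, $file, $line);
}

set_error_handler('throwing_handler');

$caught = null;
try {
    // Raise a user-level notice so the example does not depend on
    // engine-specific diagnostics.
    trigger_error('something looks off', E_USER_NOTICE);
} catch (ErrorException $e) {
    $caught = $e->getMessage();
}

// Put the previous handler back so later code is unaffected.
restore_error_handler();
echo $caught, "\n"; // something looks off
```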




6.6 E_STRICT on by default


Issue:
PHP's E_STRICT error level is meant to point users to language level
warnings/errors. E_STRICT is currently not part of E_ALL and thus often those
E_STRICT messages will be hidden from users.



Discussion:
As we want to expose the language level warnings a bit more, and because
having all error levels in E_ALL except E_STRICT is confusing, we will be
adding E_STRICT to E_ALL. As the current default is E_ALL & ~E_NOTICE we will
effectively turn on E_STRICT by default.
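
The reasoning rests on error levels being bits in a mask: a level is reported when its bit is set, so any level added to E_ALL is switched on wherever the default is derived from E_ALL. A small sketch of that arithmetic, using only E_NOTICE and E_WARNING so that it does not depend on how a given PHP version defines E_STRICT:

```php
<?php
// The default level mentioned above: everything in E_ALL except notices.
$level = E_ALL & ~E_NOTICE;

// A level is reported when its bit is set in the mask.
$reportsNotice  = ($level & E_NOTICE) !== 0;   // false
$reportsWarning = ($level & E_WARNING) !== 0;  // true

var_dump($reportsNotice, $reportsWarning);
```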


Conclusions:



  1. We add E_STRICT to E_ALL




6.7 Remove support for <?, <% and <script language="PHP"> and add "<?php =$var?>"



Issue:


Discussion:


Conclusions:



  1. We kill "<%" but keep "<?".


  2. Jani will prepare a patch that disallows mixing different open/close tags.

  3. We will not add "<?php =".




6.8 Rewrite build system


Issue:
The current build system is fine, but has some annoyances such as the
requirement to use config.m4 files for configuring parts of PHP.


Discussion:
The current stuff works well, except for some annoyances like still requiring
autoconf-2.13 and m4. We see no reason to actively start working on a new
build system, but if there is a good new idea and somebody who wants to
implement it we might have a look at it.



Conclusions:



  1. No active changes.

  2. We might look at it when there is a solid plan and a volunteer to
    implement it.




6.9 Added persistent flag to zval struct


Issue:
It is impossible to allocate persistent zvals in PHP.


Discussion:
We had support for this before, but it was removed because nothing was using
it. The idea is to add this functionality back after figuring out the best way
to do this. There are two possible implementations:




  1. Use a specific memory block list for persistent zvals.

  2. Use a different memory allocator if the flag is set.


Conclusions:



  1. We need to find a good implementation suggestion.




6.10 Read-only properties



Issue:
It is impossible for extensions to provide read-only properties to user-land.


Discussion:
Some extensions provide read-only data in properties, but the engine API does
not support this. Therefore all extensions that do this are slower than
necessary. If we add support for this some extensions can be improved as they
can directly map a property to a memory element as returned by an extension's
internals.


We also discussed whether we should expose this to user land, but if we do
then we need to find a way to set the read-only properties' values in the
first place.


Conclusions:



  1. Marcus prepares a patch to add ZEND_ACC_READONLY






Comments:

At 4:51 PM , Anonymous Anonymous said...

I love PHP, but I believe that PHP is coming to its end...
PHP5 was not adopted... even with some great changes. PHP6 was supposed to be the savior, but the way it is, it won't be...
People will always think of PHP as a 'website language'... The KISS principle is becoming KIFS (keep it for sites)...
Too bad.. :(

 
At 5:04 PM , Anonymous Anonymous said...

I agree with Anonymous. Except for one of them :)
This is a great site indeed! Or at least the contents. It's a great blog!
And PHP IS a website language, don't try to make it something more :) It's nothing like real programming and it shouldn't be. PHP is getting better still, all these years.

 
At 6:23 AM , Anonymous Anonymous said...

Real anonymous functions would be nice (not a function created by sending the function text to another function as in create_function).

 
At 7:58 PM , Anonymous Anonymous said...

Please make function names that make sense and not like bin2hex and strtoupper ... either 2 or to... !! PLEASE

 
At 7:00 PM , Anonymous Anonymous said...

It would be nice to see object casting of arrays and other objects, so that I can do something like:

$results = array( "key" => "value" );

class ResultsObj {
public $key;

public __construct() { }

public method() { /** do something... **/ }
}

$resultsObj = (ResultsObj) $results;

 
At 3:50 AM , Anonymous Anonymous said...

Your article is very informative and helped me further.

Thanks, David

 
At 9:40 AM , Anonymous Anonymous said...

Thanks for sharing your work and thoughts about the so-criticized but so-used PHP language.
I haven't gone through all of it yet, though it sounds more like a PHP 5.5 version to me so far.
Anyway I think it's a good thing. Some points are so better defined considering the proper nature of PHP itself (talking about the 'reference vs. copy' debates on a loop for instance, like there could have been any about such a point...)

 
At 3:23 PM , Anonymous Anonymous said...

Well, there's one thing I have to gripe about:

> 4.3 ifsetor() <
> If the first parameter is not
> set, then we will still throw an
> E_NOTICE.

Are you kidding?! This is NOT a replacement for isset($var) ? $var : 'default'! I'd like to see:


1. The first parameter should evaluate depending on the var is SET OR NOT if the middle parameter is empty. That means no logical operators. Radio-inputs with "boolean" values (string("0") for example) will be skipped with this crappy syntax.

2. Removing the notice. I mean, if anyone's using this, he will be AWARE that he is dealing with a variable that is MAYBE undefined. Notices (should) point to possible errors depending on typos. BUT - what about isset()? If you make a typo in isset, PHP will never gripe about this! So why don't you start throwing errors on isset() too?!?
A few changes in PHP 6 really suck. Duh.

 
At 1:43 PM , Anonymous Anonymous said...

I don't agree with rudie please try to make a good programming language out of PHP or it will come to an end as anonymous says. What I would like to see in php6 are the following things.

- Type hinting in methods for primitive types.

For example:

function foo(string $p1, bool $p2) {}


- Allow a custom array class.

class Array {}

in order to make an OO version of the array yourself.

$a = array(); //would instantiate the baked in array.
$b = new Array(); //would instantiate your custom array class.

- Magic operator methods.

For example:

class Number {
public function __construct($value) {
$this->value = $value;
}

public function __equals($otherValue) {
if($this->value == $otherValue) {
return true;
} else {
return false;
}
}

public function __increment() {
$this->value++;
}
}

to allow the following

$a = new Number(10);
$a++;
$b = new Number(11);
if($a==$b) {
//do something...
}

 
At 6:56 PM , Anonymous Anonymous said...

Hi all,

Wouldn't it be nice if you can specify the type in the method to support multiple methods with the same name from now on. Been hoping for this since php3.

class foo {
public function __construct(string $p1) {}
public function __construct(int $p1) {}
}

Of course primitive types should also be supported inside methods.

Each method has its own signature and, depending on the given type, the matching method will be called. A much nicer way instead of investigating all of your method params, and it would be easier to document this.

 
At 2:42 PM , Blogger Vladimir said...

It is interesting. Why nobody wishes to introduce SIGNAL SLOTS.
http://doc.trolltech.com/4.3/signalsandslots.html
I think, that it would be useful.

 
At 11:26 AM , Anonymous Anonymous said...

I love PHP4 and I want PHP6 to get it right. The only thing I think would be useful is something like the triple argument function ?: but for 'switch', which could be inserted inline into a string, something like
$i = 3;
$txt = 'something '. ($i ? 1 => 'one' : 2 => 'two' : 3 => 'three' : default : null) .' like that';

 
