PHP Internals News
103 Episodes
PHP Internals News: Episode 103: Disjunctive Normal Form (DNF) Types
Friday, June 24th 2022, 09:07 BST
London, UK
In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about the "Disjunctive Normal Form Types" RFC that he has proposed with Larry Garfield.
The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news
Transcript
Derick Rethans 0:15
Hi, I'm Derick. Welcome to PHP Internals News, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 103. Today I'm talking with George Peter Banyard again, this time about the Disjunctive Normal Form Types RFC, or DNF for short, which he's proposing together with Larry Garfield. George Peter, would you please introduce yourself?
George Peter Banyard 0:39
Hello, my name is George Peter Banyard. I work on PHP part time, paid by the PHP Foundation.
Derick Rethans 0:44
Just like last time, we are still colleagues.
George Peter Banyard 0:46
Yes, we are indeed still colleagues.
Derick Rethans 0:48
What is this RFC about? What is it trying to solve?
George Peter Banyard 0:52
The purpose of this RFC is to be able to mix intersection and union types together. Last year, when intersection types were added to PHP, they were explicitly disallowed from being used with union types, because of: a) the mental framework, and b) implementation complexity. Intersection types were already complicated on their own, and getting them to work with union types was kind of a big step. So it was done in chunks. This is the second chunk: being able to use them with union types in a specific way.
Derick Rethans 1:25
What is the specific way?
George Peter Banyard 1:27
The specific way is where disjunctive normal form comes into play. Disjunctive normal form just means a normalized form of the type, where it is a union of intersections. The reason for that is that it helps the engine handle all of the various parts it needs to do, because at some point it would need to normalize the type anyway. Currently that is just forced onto the developer, because it makes the implementation easier. And probably the source code is also easier to read.
Derick Rethans 1:54
When you say it is forced upon the developer, you basically mean that PHP won't try to normalize any types, but instead throws a compilation error?
George Peter Banyard 2:05
Exactly. It's the job of the developer to do the normalization step. The normalization step is pretty easy, because I don't expect people to do too much with intersection types. Adding a normalization step can always be done as future scope, but then you get into issues like maybe not having deterministic compilation, because normalization can take very, very long, and you can't necessarily prove that it will terminate, which is not a great situation to be in. Imagine PHP not running at all because it's stuck in an infinite loop trying to normalize the type. It's just like, oh, I can't compile.
Derick Rethans 2:39
Would a potential type alias kind of syntax help with that?
George Peter Banyard 2:44
Maybe, I'm not really sure. I've actually been reading research about it from computer scientists working on functional programming languages, where everything is compiled ahead of time. They have the same issue: they need to normalize types, and especially with type aliases, they haven't really figured out a way yet. So I'm not sure how we are going to figure out a way if experts, PhD students and researchers haven't really figured one out.
Derick Rethans 3:08
And is the reason for that mostly that PHP sometimes resolves types while it is running code, because it has to autoload classes, and then it might find out it is an inherited class, for example?
George Peter Banyard 3:19
Yes, I think it's this weird thing where maybe PHP has kind of an advantage, because it doesn't need to resolve all of the types at once. If you have a type alias, you only need to resolve it when it's used, and then try to figure it out. There's also the added complexity of variance checks, because most functional programming languages have variance to some degree, but they don't have the whole inheritance that typical OOP languages have. It's kind of a very strange field. PHP is just like, well, we kind of do stuff at runtime, you don't necessarily need everything, and it just works. That's mainly the reason why the developer needs to do the normalization step. This form is also, I think, the easiest to understand: you have this group of stuff, or this group of stuff, or this single simple type. The other normalized form would be conjunctive normal form, which is a list of ANDs of ORs, like (A or B or C) and X and (Y or Z), which I think is harder to understand.
Derick Rethans 4:26
What is the exact syntax then?
George Peter Banyard 4:28
So the exact syntax is: if you want to have an intersection type within a union type, you need to wrap it in parentheses. Then you have the normal pipe union operator, and you can mix it with single types. You can mix it with true, you can mix it with false, which are literal types that now exist, or just the normal bool type.
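As a rough illustration of the syntax described here, the following sketch assumes PHP 8.2 or later, the version in which DNF types shipped. The function name `describe` is invented for this example and does not come from the RFC.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: accepts an object that is both Traversable and
// Countable, or null, or the literal value false. The intersection must be
// wrapped in parentheses to be combined with the union operator.
function describe((Traversable&Countable)|null|false $input): string
{
    if ($input === null) {
        return 'null';
    }
    if ($input === false) {
        return 'false';
    }
    return 'collection of ' . count($input);
}

echo describe(new ArrayObject([1, 2, 3])), "\n"; // collection of 3
echo describe(null), "\n";                        // null
echo describe(false), "\n";                       // false
```

ArrayObject is used only because it is a built-in class that implements both Traversable (through IteratorAggregate) and Countable.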
Derick Rethans 4:48
The parentheses are actually required. You don't rely on operator precedence to make things work?
George Peter Banyard 4:53
Yes. Relying on operator precedence is terrible.
Derick Rethans 4:57
Yep, I agree.
George Peter Banyard 4:58
I think I've heard this argument on the list a couple of times: oh, but in maths, "and" has priority over "or". I mean, I did three years of a maths degree and, not gonna lie, maths notation is terrible for most of us. People don't even agree on terminology. I'm just gonna say, let's just do better.
Derick Rethans 5:19
I agree. I mean, most coding standards will already require parentheses around multiple complex clauses in conditions anyway, right? It's a sensible thing to do, just for readability, in my opinion. The RFC also talks about a few syntaxes that you aren't allowed to use, and that you have to normalize or deconstruct yourself. What kinds of things are these?
George Peter Banyard 5:41
If you want a type which is an intersection of a class A with at least one other class, let's say X or Y, you can always convert it into DNF: the type would be (A and X) or (A and Y). This seems to be the more unusual case, I would imagine. One of the motivating cases of DNF types is to do something like array or (Traversable and Countable). I don't really see mixing and matching various different object interfaces as the most useful userland case. The most useful is being able to do array or (Traversable and Countable), so that you can just use count() or treat something as an array, or you have Traversable and Countable and ArrayAccess, and it's just like: here's an object which kind of behaves like an array.
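The manual normalization step described here can be sketched as follows, assuming PHP 8.2 or later. The interfaces A, X and Y mirror the names used in the conversation. The classes AX and AY and the functions `handle` and `size` are invented for illustration.

```php
<?php
declare(strict_types=1);

interface A {}
interface X {}
interface Y {}

// Not accepted, per the discussion: an intersection that contains a union.
// function handle(A&(X|Y) $value): string {}

// Accepted: the same constraint rewritten by hand as a union of intersections.
function handle((A&X)|(A&Y) $value): string
{
    return $value instanceof X ? 'A and X' : 'A and Y';
}

// The motivating case: a real array, or an object that can be both
// iterated and counted.
function size(array|(Traversable&Countable) $items): int
{
    return count($items);
}

final class AX implements A, X {}
final class AY implements A, Y {}

echo handle(new AX()), "\n";              // A and X
echo handle(new AY()), "\n";              // A and Y
echo size([1, 2]), "\n";                  // 2
echo size(new ArrayObject([1, 2, 3])), "\n"; // 3
```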
Derick Rethans 6:32
I think there's currently another RFC being proposed that extends iterator_to_array to accept more types as well. So that sort of fits into this category of things to do with iterables and traversables, I suppose.
George Peter Banyard 6:49
Yeah.
Derick Rethans 6:50
I'm hoping to talk to the author of that RFC as well. At the moment we're two and a half weeks or so before feature freeze, and you now see a whole flurry of RFCs, while it was a bit quiet in the last few months. Because you're adding to the type system, that usually has consequences for variance rules, or rather, for how inheritance works with return types and argument types, as well as property types. What do DNF types mean for these variance checks?
George Peter Banyard 7:19
The variance is checks
PHP Internals News: Episode 102: Add True Type
Thursday, June 2nd 2022, 09:06 BST
London, UK
In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about the "Add True Type" RFC that he has proposed.
Transcript
Derick Rethans 0:00
Hi, I'm Derick. Welcome to PHP Internals News, the podcast dedicated to explaining the latest developments in the PHP language. This is episode 102. Today I'm talking with George Peter Banyard about the Add True Type RFC that he's proposing. Hello George Peter, would you please introduce yourself?
George Peter Banyard 0:33
Hello, my name is George Peter Banyard, I work part time for the PHP Foundation. And I work on the documentation.
Derick Rethans 0:40
Very well. We're co-workers really, aren't we?
George Peter Banyard 0:43
Yes, indeed, we are co-workers.
Derick Rethans 0:45
Excellent. We spoke in the past about related RFCs. I remember, which one was that again?
George Peter Banyard 0:51
Making null and false standalone types.
Derick Rethans 0:53
That's the one I was thinking of. But what is this RFC about?
George Peter Banyard 0:56
So this RFC is about adding true as a single type. We have false, which is one part of the boolean type, but we don't have true. The reasons for that are a bit historical in some sense, although it only dates from PHP 8.0, so calling something historical when it's only a year old is a bit weird. The main reason was that PHP has many internal functions which return false on failure. That was a reason to include false in the Union Types RFC, so that we could properly document these types. Otherwise a signature would say string and boolean when the function could only return false and never true, which is a bit pointless and misleading. So that was the point of adding false. And this statement didn't apply to true for the most part. With PHP 8, we did a lot of promotions from warnings to ValueError or TypeError, and in a lot of cases functions which used to return false stopped returning false and throw an exception instead. These functions now always return true, but we can't type them as true because we don't have it, so they are typed as bool. That is misleading in the same sense as with the union type: there the function only returns false, so there is no point using boolean, and here the functions always return true. But if you look at the type signature, you think: I need to cater to the case where it returns true and the case where it returns false.
Derick Rethans 2:19
Do they return true or throw an exception?
George Peter Banyard 2:22
Yeah, so they either return true or throw an exception. If you designed these functions from scratch, you would make them void, but legacy... And we did, I think it was in PHP 8.0, change a couple of functions from true to void. But then you get into these weird shenanigans where, if you use the return value of the function in an if statement, null gets coerced to false, because in PHP any function returns a value, even a void function, which returns null. So you now basically get a BC break, which you can't really... Yeah, we did a bit of that and then decided it's probably a bad idea. That's also the point of this: static analysers can be more informative and say, okay, your if statement is kind of pointless here.
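To make the distinction concrete, here is a minimal sketch assuming PHP 8.2 or later, where true is available as a standalone type. The functions `storeValue` and `storeValueVoid` are invented for this example and are not PHP internal functions.

```php
<?php
declare(strict_types=1);

// Hypothetical function in the style described above: it returns true on
// success and throws on failure, so bool would overstate what it can return.
function storeValue(array &$store, string $key, int $value): true
{
    if ($key === '') {
        throw new ValueError('Key must not be empty');
    }
    $store[$key] = $value;
    return true;
}

// The same operation declared as void. Callers that test the result in an
// if statement would now see null, which is falsy.
function storeValueVoid(array &$store, string $key, int $value): void
{
    storeValue($store, $key, $value);
}

$store = [];

if (storeValue($store, 'a', 1)) {
    echo "stored a\n"; // this branch is always taken when no exception is thrown
}

$result = storeValueVoid($store, 'b', 2);
var_dump($result); // NULL
```

The second call shows why changing an existing function from true to void is a behaviour change for code that checks the return value.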
Derick Rethans 3:06
Yeah, you don't want to end up breaking BC. Now, we already had false and bool, you're adding true to this. How does that work with Union types? Can you make a union type of true or false?
George Peter Banyard 3:18
No. There are two reasons, mainly. First, true or false is the same as boolean, so just use boolean in this case. You could say, well, it's more specific, so just allow it, and that would be reasonable. But the problem is that false has different semantics from boolean. False does not coerce values, so it only accepts false as a literal value. Whereas boolean, if you're not in strict types mode, which is a lot of code, will coerce values: zero to false, and every other integer to true. The true type follows the behaviour of false in being a value type, so it only accepts true. You would get into this weird distinction: does true or false mean exactly true or false, or do you get the same behaviour as using the boolean type?
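The difference in coercion can be sketched like this, assuming PHP 8.2 or later, where false can be used as a standalone type. The function names are invented, and the file deliberately does not declare strict_types so that the coercive typing mode applies.

```php
<?php
// Deliberately no declare(strict_types=1): coercive typing mode.

// bool accepts integers in coercive mode and converts them.
function takesBool(bool $flag): bool
{
    return $flag;
}

// false is a value type: only the literal false is accepted.
function takesFalse(false $flag): bool
{
    return $flag;
}

var_dump(takesBool(0));      // bool(false), 0 is coerced
var_dump(takesBool(5));      // bool(true), any non-zero integer is coerced
var_dump(takesFalse(false)); // bool(false)

try {
    takesFalse(0);
} catch (TypeError $e) {
    echo "TypeError: 0 is not the literal false\n";
}
```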
Derick Rethans 4:07
So I would say that true or false would then be more restrictive than bool.
George Peter Banyard 4:12
Exactly, which is a bit of a problem, because PHP internally has true and false as separate types, which also makes the implementation of this RFC extremely easy, because PHP already makes the distinction between them. But at the same time, the boolean type is just a union of the bitmask of true and false. You can't really distinguish between the type true or false and the boolean type within the type system. Currently it just checks whether only one of them is set, and then it can do the specific check. To distinguish them, you would need to add an extra flag. I mean, it's doable, but it's just like, well, who knows which semantics we want? Therefore, just leave it for future discussion, because I'm not very keen on it, to be fair.
Derick Rethans 4:55
True or false are really only useful for return values and not so much for argument types, because if you have an argument that always must be true, then it's kind of pointless to have it, of course.
George Peter Banyard 5:05
Same as it was with the null type RFC. Although there might be a case where PHP internal functions might change the type to true for an argument. I can maybe think of two. One would be the define function, with this thing being case insensitive or case sensitive, I don't remember what the parameter actually is. It could actually be either false or true, because at the moment I think it emits a notice saying this thing is not supported, therefore the value was ignored. But we could conceivably see that in PHP 9 we would implement this properly: okay, this only accepts true. Yes, this argument is pointless, but it's in the middle of the function signature, so you can't really remove it. The spl_autoload_register function has as its second argument whether to throw on error or not, which since PHP 8 only accepts true, but it's in the middle of the signature. The last argument is still very useful: it's prepend instead of append the autoloader, I think, or it might be the other way around, check the docs. So if you pass in false, it will emit a notice saying that this argument has just been ignored. But we can't really remove the argument, because if you use the third argument with positional arguments, you would change the signature and break it. Now, we don't have a way in PHP to enforce the use of named arguments, because that would be a solution: if you want to set this argument, you need to use named arguments. But we can't do that. The alternative is creating a new function which is an alias, which is also kind of terrible. That would be one of the maybe only cases where you would actually get true as an argument type.
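The positional versus named argument point can be sketched as follows, assuming PHP 8.0 or later for named arguments. The empty closure stands in for a real autoloader and is purely illustrative. The parameter names `throw` and `prepend` are those of spl_autoload_register in the PHP manual.

```php
<?php
declare(strict_types=1);

// Placeholder autoloader: a real one would map $class to a file and load it.
$loader = static function (string $class): void {
};

// Positional call: the second argument ($throw) has to be supplied just to
// reach the third one ($prepend).
spl_autoload_register($loader, true, true);
spl_autoload_unregister($loader);

// Named argument: $throw is skipped and keeps its default value.
spl_autoload_register($loader, prepend: true);

var_dump(in_array($loader, spl_autoload_functions(), true)); // bool(true)
```

Named arguments let a caller avoid the middle parameter, but as noted in the conversation, nothing in the language forces callers to use them.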
Derick Rethans 6:39
Is that currently bool? And is there a specific check for it?
George Peter Banyard 6:42
It's currently bool, and if you pass in false it will emit a warning or notice.
Derick Rethans 6:47
How would inheritance work? As return types, you can always make them smaller, right? More restrictive.
George Peter Banyard 6:53
Yes, that's also the thing. But that problem already exists in some sense. If you go from boolean to false, you're already restricting the type. And that problem existed even before allowing false as a standalone type, if you had it as part of a union. That problem already existed with union types, because you could have something that returns an array or bool, and then you change it to either an array or false. And then if you try to return zero, you will get a coercion problem. So the same problem applies with true, because it only affects return values. And like you control the
PHP Internals News: Episode 101: More Partially Supported Callable Deprecations
Thursday, May 19th 2022, 09:05 BST
London, UK
In this episode of "PHP Internals News" I talk with Juliette Reinders Folmer (Website, Twitter, GitHub) about the "More Partially Supported Callable Deprecations" RFC that she has proposed.
Transcript
Derick Rethans 0:14
Hi, I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is episode 101. Today I'm talking with Juliette Reinders Folmer, about the Expand Deprecation Notice Scope for Partially supported Callables RFC that she's proposing. That's quite a mouthful. I think you should shorten the title. Juliette, would you please introduce yourself?
Juliette Reinders Folmer 0:37
You're starting with the hardest questions, because introducing myself is something I never know how to do. So let's just say I'm a PHP developer and I work in open source, nearly all the time.
Derick Rethans 0:50
Mostly related to WordPress as far as I understand?
Juliette Reinders Folmer 0:52
Nope, mostly related to CLI tools, actually. Things like PHPUnit Polyfills, PHP_CodeSniffer, PHP Parallel Lint. I spend the majority of my time on CLI tools, and only a small portion of my time consulting on things for WordPress, like keeping it cross-version compatible.
Derick Rethans 1:12
All right, very well. I actually did not know that. So I learned something new already.
Juliette Reinders Folmer 1:16
Yeah, but it's nice. You give me the chance now to correct that image. Because I notice a lot of people within the PHP world see me as the voice of WordPress and vice versa, by the way: in the WordPress world they see me as the voice of PHP. And in reality, I do completely different things. There is a perception bias there somewhere which has slipped in.
Derick Rethans 1:38
It's good to clear that up then.
Juliette Reinders Folmer 1:39
Yeah, thank you.
Derick Rethans 1:40
Let's have a chat about the RFC itself then. What is the problem that this RFC is trying to solve?
Juliette Reinders Folmer 1:46
There was an RFC for 8.2, already approved in October, which deprecates partially supported callables. For those people listening who do not know enough about that RFC: partially supported callables are callables which you can call via a function like call_user_func, but which you can't assign to a variable and then call as a variable. Sometimes you can call them just by using the syntax which you used for defining the callable, so not as a variable but as the actual literal.
Derick Rethans 2:20
An example here is static colon colon function name.
Juliette Reinders Folmer 2:26
Absolutely, yeah.
Derick Rethans 2:27
Which you can use with call_user_func by having two array elements. You can call it with literal syntax, but you can't assign it to a variable and then call it. Did I get that right?
Juliette Reinders Folmer 2:36
Absolutely. That's it. There are eight of those. Basically, the original RFC from Nikita proposed to deprecate support for them in 8.2, add deprecation notices, and remove support for them altogether in PHP 9. The original RFC explicitly excluded two particular things from those deprecation notices: the callable type, and using the syntaxes in combination with the is_callable function, where you're checking whether the syntax is callable. The argument used in the original RFC was to keep those side-effect free. The problem with this is that with the callable type, you go from absolutely no notice to a fatal error in PHP 9. Everything works, and you're not getting any notification. But in PHP 9, it's a fatal error at the moment that callable is passed to a function.
Derick Rethans 3:31
This is the callable type in function declarations.
Juliette Reinders Folmer 3:33
Yeah, absolutely. And with is_callable, I discovered a pattern in my wanderings across the world where people use the syntax in is_callable, but then use it in a literal call. So not using call_user_func, not using a variable to call it, but checking is_callable with static double colon method name, and then calling static double colon method name as a literal. For valid calls, that pattern would basically mean that the function would no longer be called in PHP 9, without any notification whatsoever.
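The pattern described here looks roughly like the sketch below. The class `Task` and its methods are invented for illustration. On PHP 8.x the guarded call still runs, and on versions that include the deprecation for is_callable a deprecation notice would be expected alongside the output.

```php
<?php
declare(strict_types=1);

class Task
{
    public static function run(): string
    {
        // 'static::handle' is one of the partially supported callable forms:
        // it is checked as a string here, then called with literal syntax.
        if (is_callable('static::handle')) {
            return static::handle();
        }

        // If the string form stops being recognised as callable, execution
        // silently falls through to this branch instead.
        return 'skipped';
    }

    public static function handle(): string
    {
        return 'handled';
    }
}

echo Task::run(), "\n";
```

The literal call `static::handle()` itself is ordinary syntax. Only the string passed to is_callable is affected, which is why the change in behaviour would be silent without a deprecation notice.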
Derick Rethans 4:13
So it's a silent change that you can't detect at all.
Juliette Reinders Folmer 4:17
Yeah, which to me sounded dangerous. I started asking some questions about that about six weeks ago, and the conclusion was: well, maybe this should be changed. But as this was explicit in the original RFC, we can't just change it. We need a new RFC to amend the original RFC, remove the exception for these two situations, and allow them to throw deprecation notices.
Derick Rethans 4:44
What are you proposing to change with this RFC then?
Juliette Reinders Folmer 4:47
What this RFC is proposing is simply to remove the exception under which the callable type and is_callable do not throw a deprecation notice. This RFC proposes that they should throw a deprecation notice, so that more of these types of situations can be discovered in time for PHP 9, to prevent users getting fatal errors.
Derick Rethans 5:08
Now, of course, we have no idea when PHP 9 is actually showing up, but I don't think it will be this year. Well, I know it won't be this year, and it certainly won't be next year either, I think.
Juliette Reinders Folmer 5:17
That's all the same. I mean, it may be two or three years ahead, but it doesn't really make sense to have the main deprecation in 8.2 and then have the additional deprecation in 8.4 or something.
Derick Rethans 5:29
Absolutely.
Juliette Reinders Folmer 5:30
It's a lot more logical to have it all in the same version, because it's all related. It's basically the same thing, without the exception for the callable type and is_callable.
Derick Rethans 5:42
Although there is currently no deprecation notice, would this be possible to find if you had a comprehensive test suite?
Juliette Reinders Folmer 5:48
Yes and no. Yes, you can find this with a test suite. But one, you're presuming that there are tests. Two, that the tests cover the affected code with enough path coverage. Three, imagine a test you've written yourself at some point in the past which involves callables. You might have a data provider where you say: first, a valid callable, a function which you've mocked or a closure which you've put in, and second, a function which does not exist. So now you're testing this function, which at some point in its logic takes a callable and expects to receive that type. But are you actually testing with the specific deprecated partially supported callables? Even if you have a test, and the test covers the affected code, if you do not test with one of these eight syntaxes which have been deprecated, you still cannot detect it. And then, four, you still need to make sure that the tests are routinely run. In open source, that's generally not a problem. Most open source projects use GitHub Actions by now to run the tests automatically on every pull request, etc. But have the tests been turned on to actually run against PHP 8.2? Are the tests run against pull requests? I mean, there are still plenty of projects which don't do that kind of thing. Yes, you can detect it with a good test suite. But there are a lot of caveats under which you will not detect it. And more importantly, you will not be able to detect it until PHP 9.
Derick Rethans 7:23
Yes, when your code stops behaving as you were expecting it to.
Juliette Reinders Folmer 7:28
Yeah, because in 8.2, you're gonna get deprecation notices for everything else, but these two situations. But not in 8.2, not in 8.3, not in 8.4, and then whatever eights we're gon
PHP Internals News: Episode 100: Sealed Classes
Thursday, March 24th 2022, 09:04 GMT
London, UK
In this episode of "PHP Internals News" I talk with Saif Eddin Gmati (Website, Twitter, GitHub) about the "Sealed Classes" RFC that he has proposed.
Transcript
Derick Rethans 0:14
Hi, I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is episode 100. Today I'm talking with Saif Eddin Gmati about the sealed classes RFC that they're proposing. Saif, would you please introduce yourself?
Saif Eddin Gmati 0:31
Hello, my name is Saif Eddin Gmati. I work as a Senior programmer at Les-Tilleuls.coop. I'm an open source enthusiast and contributor.
Derick Rethans 0:39
Let's dive straight into this RFC. What is the problem that you're trying to solve with it?
Saif Eddin Gmati 0:43
Sealed classes, just like enums and tagged unions, allow developers to define their data models in a way where invalid state becomes less likely. It also eliminates the need to handle unknown subtypes for a specific model, as using sealed classes to define models gives us an idea of what child types would be available at run time. Sealing also provides us with a way of restricting inheritance or the use of a specific trait. For example, if we look at the logger trait from the PSR log package, that could be sealed to the logger interface. This way, we ensure that every use of this trait is coming from a logger, not from any other class.
Derick Rethans 1:24
I was just reading through this RFC again, and noticed something I didn't pick up on when reading it last time. It states that PHP already has, sort of, two sealed classes.
Saif Eddin Gmati 1:35
Yes, the Throwable interface in PHP can only be implemented by extending either Error or Exception. The same applies to DateTimeInterface, which can only be implemented by extending the DateTime class or the DateTimeImmutable class.
Derick Rethans 1:52
Because PHP itself doesn't allow you to implement either Throwable or DateTimeInterface. I hadn't quite realized that these are also sealed classes, really. What is the motivation behind wanting to introduce sealed classes?
Saif Eddin Gmati 2:06
The main motivation for this feature comes from Hack, the programming language. Hack contains a lot of interesting type concepts that I personally think PHP could benefit from, and sealed classes is one of those concepts.
Derick Rethans 2:18
What kind of syntax are you proposing?
Saif Eddin Gmati 2:21
There are actually three syntax options for the RFC currently, but the main syntax is inspired by both Hack and Java. It's more similar to the syntax used in Java, as Hack uses attributes. Personally, I have been against using attributes from the start, as I see sealing and finalizing as similar: both affect how inheritance works for a specific class. Having sealed implemented as an attribute while final uses a keyword brings more inconsistency into the language, which is why I have decided not to include attributes as a syntax option.
Derick Rethans 2:56
In my opinion, attributes shouldn't be used for any kind of syntax things. What they should be used for is attaching information to already existing things. By using attributes to extend syntax, you're sort of putting the syntax parsing in two different places, right? You're putting it both in the syntax as well as in attributes. I asked what the syntax is, but I don't think you actually mentioned what the syntax is.
Saif Eddin Gmati 3:20
The main syntax proposed for the RFC uses sealed and permit as keywords. We first have the sealed modifier, which is added in front of the class, similar to how the final or abstract modifiers are used. We also have the permit clause, which is basically a list that allows you to name the specific classes that are able to inherit from this specific type.
Derick Rethans 3:43
So when you say type here, is that just interfaces and classes or something else as well?
Saif Eddin Gmati 3:48
It's classes, interfaces and traits. Traits are allowed to be sealed, but they are not allowed to be permitted. For example, an interface is not allowed to permit a trait, because a trait cannot implement an interface.
Derick Rethans 4:03
In the language itself, when does this get enforced?
Saif Eddin Gmati 4:06
This inheritance restriction gets enforced when loading a class. Let's say we are loading class A. If this class extends B, we check whether B is sealed, and if it is, we check whether B allows A to extend it. But when loading a sealed class itself, nothing actually gets checked. We just take the classes from the permit clause, store them, and move on.
Derick Rethans 4:32
It only gets checked if you're trying to implement an interface.
Saif Eddin Gmati 4:36
This gets enforced when trying to implement an interface, extend a class, or use a trait.
Derick Rethans 4:41
Okay. What are general use cases for this feature?
Saif Eddin Gmati 4:45
General use cases for this feature are, for example, implementing programming concepts such as Option, which is a type that can only have two subtypes: one is Some, the other is None. Another concept is Result, where only two subtypes are possible, either success or failure. Another use case is to restrict inheritance. As I mentioned before, the logger trait from the PSR log package is a trait that implements some of the methods in the logger interface, and expects whoever is using that trait to implement the rest. However, there is no restriction by the language regarding this. We can seal this trait to the logger interface, ensuring that only loggers are allowed to use this trait.
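The proposed sealed syntax is not something current PHP can parse, so the sketch below does not use it. Instead it emulates the Option example with a runtime check in the base class constructor, assuming PHP 8.1 or later for readonly properties. This is a substitute technique for illustration only, not the RFC's mechanism, and all class names besides the Option, Some and None mentioned in the conversation are invented.

```php
<?php
declare(strict_types=1);

abstract class Option
{
    // Hand-maintained list standing in for a permit clause.
    private const PERMITTED = [Some::class, None::class];

    public function __construct()
    {
        // Checked at instantiation, not at class load time as the RFC describes.
        if (!in_array(static::class, self::PERMITTED, true)) {
            throw new LogicException(static::class . ' may not extend ' . self::class);
        }
    }
}

final class Some extends Option
{
    public function __construct(public readonly mixed $value)
    {
        parent::__construct();
    }
}

final class None extends Option
{
}

// A subtype that is not on the permitted list.
final class Rogue extends Option
{
}

$some = new Some(42);
echo $some->value, "\n"; // 42
var_dump(new None() instanceof Option); // bool(true)

try {
    new Rogue();
} catch (LogicException $e) {
    echo $e->getMessage(), "\n"; // Rogue may not extend Option
}
```

A subclass that overrides the constructor without calling the parent would bypass this check, which is one reason a language-level restriction enforced at class load time is a stronger guarantee than this emulation.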
Derick Rethans 5:34
When you say that Option has the value Some or None, that just sounds like an enum to me. How should I think differently about enums and sealed classes here?
Saif Eddin Gmati 5:43
Enums cannot hold a dynamic value. You can have a value, but you cannot have a dynamic value. Tagged unions, however, would allow you to implement Option the same way. Tagged unions are useful only for this specific case, though. There are other cases, such as the one I mentioned for traits, that cannot actually be implemented using tagged unions. There is also the, I don't know how to say this. Let's say we have a type A that is sealed and permits only B and C. In this case A itself, as long as it's not an abstract class, is a type. It can be used as a normal class: you can create an instance and use it normally. However, with tagged unions, Option itself would not be a type. You either have Some or None. That's the main difference between tagged unions and sealed classes.
Derick Rethans 6:37
Tagged unions, PHP doesn't have them. So how does a tagged union relate to enums?
Saif Eddin Gmati 6:43
With tagged unions, there is an RFC that's still in draft, I suppose, and it is actually built on top of enums. That's why.
Derick Rethans 6:55
I reckon once that gets closer to completion, I'll end up talking to the author of that RFC. So something I'm wondering, can a sealed type permit only one other type? Or does it have to be more than one?
Saif Eddin Gmati 7:10
No, it can permit only one type. Let's say we have class A that only permits B. However, another thing is that class B does not actually have to extend A; if A is permitting B, B does not actually have to implement A. It's still useful, because another class called C can extend B and implement A, so an instance of both A and B can still exist.
Derick Rethans 7:36
I'm not quite sure whether I understood that. If you have an interface that says A permits B, then B is not required to implement A, mostly because the moment you load class B, you don't even know that A exists, right? Because B doesn't refer to it.
Saif Eddin Gmati 7:54
Yes.
Derick Rethans 7:55
Is this going to break anything?
PHP Internals News: Episode 99: Allow Null and False as Standalone Types
Thursday, March 10th 2022, 09:04 GMT
London, UK
In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about the "Allow Null and False as Standalone Types" RFC that he has proposed.
Transcript
Derick Rethans 0:15
Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 99. Today I'm talking with George Peter Banyard about the "Allow null and false as standalone types" RFC that he's proposing. Hello, George Peter, would you please introduce yourself?
George Peter Banyard 0:36
Hello, my name is George Peter Banyard. I work on the PHP language, and I'm an Imperial student in maths in my free time.
Derick Rethans 0:44
Are you trying to say you're a student in your free time, or that you contribute to PHP in your free time?
George Peter Banyard 0:49
I feel like at this time, it's like, both are true at the same time.
Derick Rethans 0:53
Let's hop into this RFC. It is titled allow null and false as standalone types. What is the problem that it is trying to solve?
George Peter Banyard 1:02
This is the second iteration of this RFC. So the first one was to just allow null initially, and null is the unit type of PHP in type theory parlance, i.e. the type which only has one value. So null is a type and a value. And the main issue has more to do with inheritance and the Liskov substitution principle. If you have a method, like the parent method, which can return either null or an object, and your implementation in a child class always returns null, for various reasons, maybe because it doesn't support this feature, or whatever... If your child method only returns null, currently you can't type this properly; you can document it in a doc comment or something like that. But due to how PHP type handling works, you need to specify at least another type with null in the union. You basically resort to always mimicking the parent signature, when you could be more specific. This was the main use case I initially went into.
Derick Rethans 2:08
If I understand correctly, you can't just have an inherited method that is hinted to just return null?
George Peter Banyard 2:14
Exactly. If you always return null, maybe because you always work or something like that, then you must still declare the return type as, like, null or exception, which is not concrete, because what you're saying is, like, "I never fail". And static analysers, even if they can figure out that you're using a child class, can't make some assumptions or work out further down that what you're doing is redundant, or things like that. So that's one of the main reasons I initially went with it. And I didn't add false initially, because it was like, well, false is not really a proper type. It's what's called a value type. False is one value from the Boolean type. And I was like, well, okay, being the type theory purist, we're just going to limit it to proper types, where null is a proper type, although it's sometimes a bit misunderstood, I feel, in the PHP community at large. And then people were like, well, if we add null, then the only type-ish thing which you can use in a type declaration, but which can't be used in a return type on its own, is false. And it's just weird. So why not add it in full. So that was the second thing, as to why I added it. Some of PHP's internal functions, being terribly designed because they were designed back in the early noughties, return null on success and false on failure, which you can't properly type at the moment. Currently, we need to type them as Boolean or null, but true can never be returned in this case. And some other people have reached out to me saying: well, yeah, but I always return false in this case, or I always return true in this case. Although with true, we have this weird asymmetry that we have false as a value type and not true.
Derick Rethans 3:49
What was the reason for having false but not true?
George Peter Banyard 3:53
When the union types RFC got discussed and passed for PHP 8.0, false was added, because a lot of the traditional behaviour of PHP internal functions was to return false on failure, whereas the technically more correct thing would be to return null. Because loads of functions return false on failure, and saying that they return these types or a Boolean would basically be lying because you could never have true, false was included in it. With the restriction that you can only use false as a complement to other types. So you need to do, for example, array or false; you couldn't just use false.
Derick Rethans 4:37
Would it also mean that you can define the return type of a method as false, when it overrides a method that returns bool?
George Peter Banyard 4:48
Yes, that would now be possible with the amended proposal. Which goes back to this weird asymmetry. Adding true to make it complete would probably be a future RFC.
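The return-type narrowing discussed here can be sketched as follows. This assumes PHP 8.2 or later, where null and false are accepted as standalone types; the Cache and NullCache classes are hypothetical and exist only to show a child method declaring a narrower return type than its parent.

```php
<?php
declare(strict_types=1);

// Requires PHP 8.2+. All class and method names here are hypothetical.
class Cache
{
    public function get(string $key): ?string
    {
        return $key === 'greeting' ? 'hello' : null;
    }

    public function has(string $key): bool
    {
        return $key === 'greeting';
    }
}

// A cache that never stores anything can say so in its signatures:
// null is narrower than ?string, and false is narrower than bool.
class NullCache extends Cache
{
    public function get(string $key): null
    {
        return null;
    }

    public function has(string $key): false
    {
        return false;
    }
}

$cache = new Cache();
$nullCache = new NullCache();
```

Before 8.2 the child class would have had to repeat ?string and bool, even though it can never return a string or true.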
Derick Rethans 5:00
Now, we've talked about return types. But I guess the reverse applies to arguments?
George Peter Banyard 5:06
Arguments and property types would also be allowed to declare themselves as null or false. The usefulness here is way more limited. Because if you declare an argument to be of type null, then basically you can only ever pass null to it. And therefore, the type doesn't do anything.
Derick Rethans 5:26
But in an inherited method, you could then widen it.
George Peter Banyard 5:31
Yes, exactly. You could always say: well, this argument exists, it's always null. If you extend your class or method, then you can add other types. But in theory, you can already do that by adding an argument at the end of the method, because that's LSP compliant. The case for properties, though, because their types are invariant, is kind of debatable as to why you would do that. But it's just that, well, if you accept types at one point, restricting them somewhere else gets very weird. At this point it's more like: look at it in human review, or use static analysis for the analyser to tell you that this argument is redundant and to just remove it, or that this property doesn't make any sense. Because if it can only ever be null, why does it even exist in the first place?
Derick Rethans 6:13
Right now, you can already use false in union types, but why not just null or false on their own?
George Peter Banyard 6:19
That goes back to when the union types RFC got introduced. Null got added as a keyword. Before, you could only use the question mark before a type to make the type nullable. If you have a more complex union type, you can't use the question mark in front of it. Therefore, the null keyword got added as a proper type. And the logic was: well, you shouldn't ever be able to return just null, because then that function is kind of equivalent to void. Because of that, it was said that, okay, null and false basically have kind of the same status, in that if you just want to use null on its own, you're doing something kind of weird. And if you're returning nothing more than false, that signature is very strange. I think when that was discussed, nobody knew initially that an actual PHP function within one of the extensions, like in core, had such a weird signature. We mainly just started discovering that after this got accepted and we could actually start properly typing the internal functions, and then you discover these weird edge cases where it's like: that's a bit strange, we can't properly document it. We just need to make a note on the PHP documentation side, and the type signature kind of lies to you. PHP's type hierarchy is a bit strange; void kind of lives on its own. So if the function is marked as void, any child implementation, or whatever, always needs to be void too. And when you type return in the function body, you always need to use return with just a semicolon afterwards; you can't even return null. Although, under the hood, PHP will always return a value when you call a function, even if the function is void, and that value will be null.
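The last point, that a void function still produces null for the caller, is easy to check. A minimal sketch, using a hypothetical logMessage function:

```php
<?php
declare(strict_types=1);

// Hypothetical function, used only to observe what a void call evaluates to.
function logMessage(string $message): void
{
    // A bare `return;` is allowed here. Writing `return null;` instead
    // would be rejected at compile time for a void function.
    return;
}

// Using the result of a void call is unusual, but it is permitted,
// and the call expression evaluates to null.
$result = logMessage('hello');
var_dump($result);
```

So void and null differ in what the signature promises and what the function body may write, not in what the caller observes at runtime.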
Derick Rethans 7:58
The RFC also talks about question mark null, what is that supposed to be?
PHP Internals News: Episode 98: Deprecating utf8_encode and utf8_decode
Thursday, March 3rd 2022, 09:02 GMT
London, UK
In this episode of "PHP Internals News" I chat with Rowan Tommins (GitHub, Website, Twitter) about the "Deprecate and Remove utf8_encode and utf8_decode" RFC.
Transcript
Derick Rethans 0:14
Hi, I'm Derick. Welcome to PHP Internals News, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 98. Today I'm talking with Rowan Tommins about the "Deprecate and remove utf8_encode and utf8_decode" RFC that he's proposing. Hi, Rowan, would you please introduce yourself?
Rowan Tommins 0:38
Hi, I'm Rowan Tommins. I'm a PHP software architect by day and try and contribute back to the community and have been hanging around in the internals mailing list for about 10 years and contributed to make the language better, where I can.
Derick Rethans 0:57
Excellent. Yeah, that's how I started out as well, many, many more years before that, to be honest. This RFC, what problem is this trying to solve?
Rowan Tommins 1:08
PHP has these two functions, utf8_encode and utf8_decode, which, in themselves, are not broken. They do what they are designed to do. But they are very frequently misunderstood, mostly because of their names, and because character encodings in general are not very well understood. People use them wrong, and end up getting in all sorts of pickles that are worse than if the functions weren't there in the first place.
Derick Rethans 1:37
What are you proposing with the RFC then?
Rowan Tommins 1:39
Fundamentally, I'm proposing to remove the functions. As of PHP 8.2, there will be a deprecation notice whenever you use them, and then in 9.0, they would be gone forever, and you wouldn't be able to use them by mistake, because they just wouldn't be there.
Derick Rethans 1:56
I reckon there's going to be a way to actually do what people originally intended to do with it at some point, right?
Rowan Tommins 2:02
So yeah, there are alternatives to these functions, which are much clearer in what you're doing, and much more flexible in what you can do with them so that they cover the cases that these functions sound like they're going to do, but don't actually do when you understand what they're really doing.
Derick Rethans 2:20
I think we'll get back to that a little bit later on. You're wanting to deprecate these functions. But what do these functions actually do?
Rowan Tommins 2:27
What they actually do is convert between a character encoding called Latin-1, ISO 8859-1, and UTF-8. So utf8_encode converts from Latin-1 into UTF-8, utf8_decode does the opposite. And that's all they do. Their names make it sound like they're some kind of "fix all the UTF-8 things in my text". But they are actually just this one very specific conversion, which is occasionally useful, but not clear from their names.
Derick Rethans 3:01
It's certainly how I have seen it used in the past, where people just throw everything and the kitchen sink at it, expecting it to be valid UTF 8, and then at the end, decode. I mean, the decoding was not even much a part of this, right? It's just throw everything at it, and then magically it will all be UTF 8. But I reckon that's not really quite the case. When and how does that go wrong?
Rowan Tommins 3:26
So what actually ends up happening is, because text doesn't know what encoding it's in... Something that people misunderstand about character encoding is they think it's like the text is a certain colour, and the computer knows what colour it is. And if you tell the computer to make it a different colour, then it will work. But it's not like that. In the computer, there's just the sequence of binary. And the encoding is how to read that binary as text. And if you tell the computer to read it as Latin 1, it will read it as Latin 1. If you tell it to convert from Latin 1 to UTF 8, it will assume the input is Latin 1, and it will convert to UTF 8 on that basis. If your text actually wasn't Latin 1 in the first place, you're just going to end up with garbage. And some of the worst cases of that are when you already have UTF 8, and then you run utf8_encode on it, because the language doesn't know that you've already got UTF 8, so it tries to read it as Latin 1, write it out as UTF 8, and you get this weird Mojibake. I don't know if I'm pronouncing that right.
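The double-encoding case can be reproduced with a few lines of byte arithmetic. The latin1ToUtf8 function below is a hypothetical re-implementation of the Latin-1 to UTF-8 conversion, written out by hand so the sketch needs no extensions; it is not PHP's own source. Like the real function, it simply assumes every input byte is one Latin-1 character.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: converts bytes to UTF-8 on the assumption that
// they are Latin-1. Nothing checks whether that assumption is true.
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    foreach (str_split($bytes) as $char) {
        $code = ord($char);
        if ($code < 0x80) {
            $out .= $char; // the ASCII range is identical in both encodings
        } else {
            // Code points 0x80-0xFF become a two-byte UTF-8 sequence.
            $out .= chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));
        }
    }
    return $out;
}

$latin1 = "caf\xE9";            // "café" stored as Latin-1
$utf8 = latin1ToUtf8($latin1);  // "caf\xC3\xA9", which is valid UTF-8
$twice = latin1ToUtf8($utf8);   // "caf\xC3\x83\xC2\xA9", displayed as "cafÃ©"

echo bin2hex($utf8), "\n";
echo bin2hex($twice), "\n";
```

The second call has no way of knowing its input was already UTF-8, so each of the two bytes of "é" is treated as a separate Latin-1 character and encoded again.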
Derick Rethans 4:27
I think it's pronounced Mojibake.
Rowan Tommins 4:30
Mojibake.
Derick Rethans 4:31
It's a Japanese term, because clearly these issues happened with Japanese text quite a lot, because they have a lot more different and difficult characters and encodings as well. Which things often go wrong, though?
Rowan Tommins 4:44
Using utf8_encode on text that's already UTF 8 is obviously a big one. Usually obvious, but occasionally people just get in a muddle with that. The other thing that often happens is confusion with similar encodings. Latin 1 is often mistaken for a different encoding, Windows 1252. To the extent that for web pages labelled as Latin 1, web browsers will assume that they're actually in Windows 1252. These PHP functions don't make that assumption. If your text is actually in Windows 1252, and it's been mislabelled Latin 1, you might still think you're doing the right thing: "I've got Latin 1 text". But you haven't. And then the characters that are different are going to get mangled again. And there are a few other related encodings that look the same at a glance, which again will go wrong on any character that's different between the different encodings.
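One byte where Latin 1 and Windows 1252 disagree is 0x80: Windows 1252 places the euro sign there, while Latin 1 treats it as a control character. The sketch below reads that single byte both ways and prints the resulting UTF-8 bytes. The variable names are illustrative, and the Latin 1 reading is done with plain byte arithmetic rather than any library call.

```php
<?php
declare(strict_types=1);

// The same single byte, read under two different assumptions.
$byte = "\x80";

// Read as Latin 1: code point U+0080, a control character, which UTF-8
// encodes as the two bytes C2 80.
$asLatin1 = chr(0xC0 | (ord($byte) >> 6)) . chr(0x80 | (ord($byte) & 0x3F));

// Read as Windows 1252: the euro sign, U+20AC, which UTF-8 encodes as
// the three bytes E2 82 AC.
$asWindows1252 = "\u{20AC}";

echo bin2hex($asLatin1), "\n";
echo bin2hex($asWindows1252), "\n";
```

A Latin 1 conversion applied to mislabelled Windows 1252 text therefore succeeds without any error, but yields an invisible control character where the euro sign should be.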
Derick Rethans 5:43
How could a function tell which encoding a certain text was in?
Rowan Tommins 5:49
It's tricky. There are libraries out there that try to do it. In some encodings there are sequences of bits that aren't a valid character. So if any of those appear, it's definitely not in that encoding. Unfortunately, in a lot of encodings, every pattern of bits has a meaning. It just isn't necessarily the meaning that was intended. So you can't look at the string and just tell at a glance. The only way I've seen that works effectively is trying to guess based on what language the text might be in. If your text suddenly has a load of symbols in the middle of sentences, you're probably using the wrong encoding. If it's suddenly got a load of capital letters in the middle of words, you're probably using the wrong encoding. So you can make guesses like that, but ultimately, they are only ever guesses.
Derick Rethans 6:38
It's only always going to be a guess, right? You can't really tell for certain what it is, although I've seen people assume that you can just tell. We have concluded that utf8_encode and decode don't actually do what they say; they don't magically encode everything to UTF 8. What if things go wrong? How are errors handled?
Rowan Tommins 6:58
If you're converting from Latin 1 into UTF 8, Latin 1 covers all 256 possible eight bit binary strings. Those will correspond directly to a single mapping in Unicode and therefore in UTF 8. So there are no errors as such when that happens, but it might not be what you want. One of the most notable ones that's different between these encodings: Latin 1 was standardized in 1985, and the Euro didn't exist then. The euro symbol doesn't have an encoding in Latin 1. If you've got a euro sign, you haven't got Latin 1 text, but you might think you've got Latin 1 text, and it will just encode it to a control character, because where the Windows 1252 code page puts the euro symbol, it replaces some control characters in Latin 1. One of the reasons why these character encodings are so easily confused is that they're all nicely built to be compatible on top of each other. Latin 1 is deliberately an extension of ASCII. Windows 1252 is deliberately an extension of Latin 1, replacing some control characters. UTF 8 is also based on Latin 1: the first section of Unicode is actually the Latin 1 characters; UTF 8 will encode them slightly differently so that it can carry on above 256. So in that direction, you can't actually get an error, you could just get a string that doesn't make sense. Going back the other way: Unicode has, I think, potentially 11 million or something, and actually, at least a million assigned code points. Latin 1 only has 256. So you can't map all those back. And this function, the utf8_decode just replaces any th
PHP Internals News: Episode 97: Redacting Parameters
Thursday, January 27th 2022, 09:09 GMT
London, UK
In this episode of "PHP Internals News" I chat with Tim Düsterhus (GitHub) about the "Redacting Parameters in Back Traces" RFC.
Transcript
Derick Rethans 0:00
Before we start with this episode, I want to apologize for the bad audio quality. Instead of using my nice mic I managed to use the one built into my computer. I hope you'll still enjoy the episode.
Derick Rethans 0:30
Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 97. Today I'm talking with Tim Düsterhus about the "Redacting Parameters in Back Traces" RFC that he's proposing. Tim, would you please introduce yourself?
Tim Düsterhus 0:50
Hi, Derick, thank you for inviting me. I am Tim Düsterhus, and I'm a developer at WoltLab. We are building a web application suite for you to build online communities.
Derick Rethans 0:59
Thanks for coming on this morning. What is the problem that you're trying to solve with this RFC?
Tim Düsterhus 1:05
If everything is going well, we don't need this RFC. But errors can and will happen, and our application might encounter some exceptional situation; maybe some request to an external service fails. And so the application throws an exception. This exception will bubble up the stack and either be caught, or go into a global exception handler. And then basically, in both cases, the exception will be logged into the error log. If it can be handled, we want to make the admin side aware of the issue so they can maybe fix their networking. If it cannot be handled because of a programming error, we need to log it as well to fix the bug. So we have the exception in the error log. And what happens next? In our case, we have many, many lay person administrators that run a community for their hobby; they're not really programmers and have no technical expertise. And we also have a strong customers help customers environment. What do those customers do? They grab their error log and post it within our forums in public. Now in our forum, we have the error log with the full stack trace, including all sensitive values, maybe user passwords, if the authentication service failed, or something else. That should not really happen. In our case, it's lay person administrators. But I'm also seeing that experienced developers can make this mistake. I am triaging issues with an open source software written in C. And I sometimes see system administrators posting their full core dump, including their TLS certificates, and they don't really realize what they have just done. That's really an issue that affects laypersons and professional administrators alike. In our case, our application attempts to strip that sensitive information from the backtrace. We have a custom exception handler that scans the full stack trace and tries to match up class names and method names, e.g. the PDO constructor, to scrub the database password.
And now recently, we have extended this stripping to also strip anything from parameters that are called password, secret, or something like that. That mostly works well. But in any case, this exception handler will miss sensitive information, because it needs to basically guess which parameters hold sensitive values and which don't. And also our exception handler grew very complex, because to match up those parameters, it needs to use reflection. And any failures within the exception handler cannot really be recovered from; if the exception handler fails, you're out of luck.
Derick Rethans 3:51
Quite a few things to think of to make sure that you're not sharing any secrets. And I have certainly almost done this myself. We now know what the problem is. How is this RFC proposing to fix this?
Tim Düsterhus 4:03
Primarily, we want to propose a standardized way for applications or libraries to indicate which parameters hold sensitive values. Our custom exception handler uses reflection, as we said before, and it only matches up the parameters' names, but we also have this attribute I am proposing, SensitiveParameter, within our application itself. Any parameters whose names are not obviously sensitive can be marked with this attribute. But this only works within our software, and not with any third party libraries we are using, e.g. for encryption or whatever there is. So primarily we want to propose a standardized way: an attribute that is in PHP core, so anyone can use it and everyone knows what this attribute means. Secondarily, the RFC is proposing a default implementation to keep the exception handler simple. As I said before, we are using reflection. This is very complex; it does not work with the require_once or include_once family, because those are not functions. We need to handle this case to not attempt to reflect on those non functions when redacting any parameters. This is complex. And we want to simplify that.
Derick Rethans 5:20
From what I understand this is then a way to make sure that there's a standardized method for marking arguments as being sensitive. And because this is now standardized, only one solution to the problem has to be found, right?
Tim Düsterhus 5:34
Basically. Not every library would be using its own attribute, and we can cover parameters whose names are not like password or secret. It can be documented: hey, if you have sensitive parameters, you should put this attribute on them, and then exception handlers will be aware that this parameter is sensitive and can strip it; or, in the case of the RFC, PHP itself will already strip those parameters from the stack trace.
Derick Rethans 6:04
You're suggesting that PHP's standard way of showing stack traces also takes care of the sensitive parameter here?
Tim Düsterhus 6:11
Yes, exactly.
Derick Rethans 6:13
Which internal PHP functions are likely to get this attribute?
Tim Düsterhus 6:16
Basically anything with a parameter called password or secret, as I said before, examples include PDO's constructor, the database password will be in there and possibly also the user name or host name, which might be considered sensitive. But the password is the most important thing I have on my list. ldap_bind, which possibly includes user passwords; the password_hash function; possibly various OpenSSL functions. One will need to look and this list can be extended in the future as well, if someone realizes we missed anything.
Derick Rethans 6:55
Now, I know sometimes that there's a problem where an application connects to the wrong server with PDO. And as you say, the host name was also in this PDO constructor, would it not then make debugging that specific case harder because the hostname would also be redacted from the stack traces?
Tim Düsterhus 7:14
The attribute I am proposing is a parameter attribute; each parameter can be sensitive or non sensitive. We would need to decide whether we consider the hostname sensitive or not. It usually is not. So I would not put the attribute on the host name, or on the DSN string in the first parameter. The password definitely is sensitive. And the username possibly is a grey area. By default, I probably would not put the attribute there. But this is something that needs to be discussed in the greater community, possibly.
Derick Rethans 7:47
I saw in the RFC that when you request a stack trace in PHP with get back trace, or whatever the name of this function is, the sensitive parameters are being replaced by an object of the class SensitiveParameter. Why did you pick that instead of just a string, saying something like "redacted"?
Tim Düsterhus 8:06
We cannot force users to put the attribute only on parameters that take strings. If we use a redacted string we might violate the type hint. If a function takes some key pair class, or an option of a key pair class, this usually is a sensitive parameter, and we cannot simply put a string there. We can, but then we would violate the typing. And as we violate the typing in at least some of the cases, we can also violate it in all of the cases, and then make it very clear that this parameter was redacted and is not a real value that just looks like a string "redacted". Exception handlers would be able to use an instanceof SensitiveParameter check to possibly make it more user friendly when they render the stack trace. When you are using a GUI to handle your exceptions, such as Sentry, it can show some placeholder instead of pretending there's a real string in there.
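A minimal sketch of the mechanism described here, assuming PHP 8.2 or later, where the attribute is available in core. Note that in the released version the replacement object is an instance of SensitiveParameterValue, which differs from the class name mentioned in this conversation. The authenticate function and its arguments are hypothetical.

```php
<?php
declare(strict_types=1);

// Requires PHP 8.2+. Make sure arguments are collected in traces at all;
// production php.ini files often switch this setting on.
ini_set('zend.exception_ignore_args', '0');

// Hypothetical function; only the attribute itself comes from PHP.
function authenticate(string $user, #[\SensitiveParameter] string $password): void
{
    throw new RuntimeException('Authentication service unavailable');
}

$args = [];
try {
    authenticate('alice', 'hunter2');
} catch (RuntimeException $e) {
    // Frame 0 is the call to authenticate(); 'args' holds its arguments.
    $args = $e->getTrace()[0]['args'];
}

// The user name is kept as-is, the password is replaced by a wrapper object.
$redacted = $args[1] instanceof \SensitiveParameterValue;
var_dump($args[0], $redacted);
```

An exception handler can use the same instanceof check to render a placeholder for redacted arguments instead of printing them.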
Derick Rethans 8:07
PHP Internals News: Episode 96: User Defined Operator Overloads
Thursday, December 16th 2021, 09:24 GMT
London, UK
In this episode of "PHP Internals News" I chat with Jordan LeDoux (GitHub) about the "User Defined Operator Overloads" RFC.
Transcript
Derick Rethans 0:14
Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 96. Today I'm talking with Jordan about the "User Defined Operator Overloads" RFC that he's proposing. Jordan, would you please introduce yourself?
Jordan LeDoux 0:33
My name is Jordan LeDoux. I've been working in PHP for quite a while now. This is the second time I have ventured to propose an RFC.
Derick Rethans 0:44
What was the first one?
Jordan LeDoux 0:45
The first one was the "never for parameter types", which was much more exploratory. And we talked about it a little bit. And it generated a lot of good discussion that contributed to kind of the idea formation, which was what I hope to get out of it.
Derick Rethans 1:01
Okay, but that didn't end up making it into a PHP release. As far as I understand, right?
Jordan LeDoux 1:07
No, I withdrew it actually. It was clear that the better way to approach the problem it was trying to solve was with a much more comprehensive solution. That particular solution was something that only required a seven line change to the engine. So I wanted to see if it was something people were okay with, or thought was a decent idea for that particular problem. Something much more comprehensive, like template classes or something like that, is probably the better route to go.
Derick Rethans 1:35
Well, I think the RFC that we're talking about today, is going to require quite a bit more than seven lines of code?
Jordan LeDoux 1:41
Quite a bit more. Yeah.
Derick Rethans 1:42
So what is this RFC that we're talking about today?
Jordan LeDoux 1:45
Well, user defined operator overloads is a way for PHP developers to define the ways in which objects interact with specific operators. So for instance, the plus operator, the plus sign. It's a way for those objects to kind of define their own logic as far as how that's handled. Right now, as of PHP 8.0, those were all switched to type errors. So it's not possible currently to write any code where objects are used with operators that doesn't result in a fatal error.
Derick Rethans 2:25
Usually, I ask about every RFC: what problem are you trying to solve with this? So what problem are you trying to solve with this RFC?
Jordan LeDoux 2:31
The biggest problem that this solves is that objects contain, so objects in most programs represent a value or multiple values that have a program context. That's the most powerful thing about objects: they're contextual, and they understand the state, they understand what state the object is in, and sometimes even what state the whole program is in. And that's necessary for a lot of things. Like for instance, if you're tracking a distance, you know, you might measure that in meters, and that would have a number, you might have 30 meters of distance, but it also has a unit of meters. You could just represent that as an int. And then the program just knows internally, hey, this is always in meters. But if you need to convert that to a different unit, then that becomes: okay, well, now I need to special-case some things, or I need a function just for converting, and I need to remember which unit my number is in. In a lot of cases, you handle that with objects, because objects understand state, and they understand state transitions, which is what a lot of methods are about: transitioning the state of the object from one state to another. Operators are also about state transitions. And they're about very specific kinds of state transitions. It's natural in a lot of ways to think that you should be able to define how those two things interact. But currently, it's just not possible within PHP.
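The distance example can be sketched in current PHP to show what is and is not possible today. The Distance class below is hypothetical and assumes PHP 8.1+ for readonly properties. Addition has to be spelled as an ordinary method, and applying the + operator to two objects throws a TypeError.

```php
<?php
declare(strict_types=1);

// Hypothetical value object: a length that remembers its unit.
final class Distance
{
    private const METERS_PER_UNIT = ['m' => 1.0, 'km' => 1000.0];

    public function __construct(
        public readonly float $amount,
        public readonly string $unit,
    ) {
    }

    public function toMeters(): float
    {
        return $this->amount * self::METERS_PER_UNIT[$this->unit];
    }

    // Without operator overloads, addition is an ordinary method.
    public function add(Distance $other): Distance
    {
        return new Distance($this->toMeters() + $other->toMeters(), 'm');
    }
}

$a = new Distance(30.0, 'm');
$b = new Distance(1.5, 'km');
$sum = $a->add($b); // 1530 meters

try {
    $ignored = $a + $b; // objects are not valid operands for +
    $plusWorked = true;
} catch (TypeError $e) {
    $plusWorked = false;
}
```

The object already knows how to reconcile units inside add(); the proposal is about letting that same logic be reached through the operator.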
Derick Rethans 4:00
Well, what does this magic operator overloading do then?
Jordan LeDoux 4:04
It allows PHP developers to define an implementation logic, which is much like you define a function body that describes how does this object interact with this operator. That's essentially it. There's a lot of other details as to how it does that and what are the restrictions, but that's really the core of the idea.
Derick Rethans 4:26
And in what kind of situations would you use that?
Jordan LeDoux 4:28
A lot of them are situations where you're doing very complicated mathematics, or scientific computing or machine learning or things of that nature, where you are going to routinely encounter numbers that have state to them or that have multiple dimensions to them. So for instance, vector mathematics is one where the way that vectors interact with a lot of the operators that we're familiar with, like the multiplication sign, is very different from how the number five interacts with the multiplication sign. Complex numbers are another one: you know, to multiply two complex numbers together, you have to treat it like a polynomial where you're multiplying it with the FOIL method: first, outside, inside, last. You know, there's a lot of those sorts of circumstances. But it also could potentially be very useful for some things that are not really mathematical but more quality of life for PHP developers. For instance, scalar objects are something that a lot of developers in PHP have, you know, wanted for a while. It's a thing that's a little more difficult to pin down, how exactly you would go about doing this within the engine, and it's a thing that the engine would kind of have to be very opinionated about by its nature. PHP developers can't provide their own scalar objects. And the main reason for this is that scalars interact with operators and objects can't. So simply allowing PHP developers to define a way for objects to interact with operators would allow user land to develop their own scalar object replacements. It wouldn't make every scalar an object; scalar objects within the engine are still a separate feature. And it's still a thing that would be desirable, probably to a lot of people. But it gets quite a bit of the way there.
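The FOIL multiplication mentioned above can be sketched in today's PHP with an ordinary method standing in for the operator. This is only an illustration: the `Complex` class and its `mul()` method are hypothetical names, not part of the RFC, and the sketch assumes PHP 8.1 for readonly properties.

```php
<?php
declare(strict_types=1);

// Hypothetical value object; class and method names are illustrative only.
final class Complex
{
    public function __construct(
        public readonly float $re,
        public readonly float $im,
    ) {}

    // (a + bi)(c + di) expanded with FOIL, using i^2 = -1.
    public function mul(Complex $other): Complex
    {
        $first   = $this->re * $other->re; // a*c
        $outside = $this->re * $other->im; // a*d, a multiple of i
        $inside  = $this->im * $other->re; // b*c, a multiple of i
        $last    = $this->im * $other->im; // b*d, a multiple of i^2 = -1

        return new Complex($first - $last, $outside + $inside);
    }
}

// (1 + 2i)(3 + 4i) = 3 + 4i + 6i + 8i^2 = -5 + 10i
$product = (new Complex(1.0, 2.0))->mul(new Complex(3.0, 4.0));
```

With operator overloading, the call `$a->mul($b)` is the part that could be written as `$a * $b`; the body of the method would stay the same.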
Derick Rethans 6:20
It is always interesting that people come up with the example of complex numbers, because I'm not sure how useful that is in a PHP user land context. And then beyond the scalars, I sometimes struggle to see where this could be used. The only exception is probably doing calculations with money-related issues. The moment you bring up operator overloading, you'll also get people saying that this is going to get abused. An example of that, in my opinion at least, is where in C++ you have the << operator to put things into a stream and stuff like that. What answer would you have to these kinds of comments?
Jordan LeDoux 6:58
Abuse of operator overloads can create unmaintainable code, because that's really the concern for developers: does a language feature promote code that's difficult to maintain, that's difficult to understand, that's difficult to follow, and develop, and you know, work with. The RFC, the way that I've gone about this implementation, has had that in mind, because I also have experienced that. This is not a thing where I am coming down from the academic high tower with, you know, whatever my concept of this is, and no real world experience with these things. I share a lot of those concerns. Actually, I think this is a very useful feature that has a lot of applications I've encountered. I have had to work with matrix maths, I have had to work with complex numbers, I've had to work with arbitrary precision numbers, and all of those situations would have been served so much better by having operator overloads. I was fighting with the language the entire time I was trying to do those. But I understand, you know, in a lot of web applications, those are not common problems to encounter. My experience of that isn't typical. The thing about the way that it's done is it tries to head off a lot of the ways that it could be misused. An example of that is that the RFC requires typing of the parameters. You can't define an operator method and leave the types blank. If you do, then you get a fatal error during compile. It tells you you must explicitly define a type. And the reason for this is that blank types are assumed to be mixed. So it's the same as putting mixed for the type within the engine. And a mixed type says I can take anything, it doesn't matter w
PHP Internals News: Episode 95: PHP 8.1 Celebrations
Thursday, November 25th 2021, 09:23 GMT
London, UK
In this episode of "PHP Internals News" we're looking back at all the RFCs that we discussed on this podcast for PHP 8.1. In their own words, the RFC authors explain what these features are, with your host interjecting his own comments on the state of affairs.
Transcript
Derick Rethans 0:14
Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language.
Derick Rethans 0:23
This is episode 95. I've been absent on the podcast for the last few months due to other commitments. It takes approximately four hours to make each episode. And I can now unfortunately not really justify spending the time to work on it. I have yet to decide whether I will continue with it next year to bring you all the exciting development news for PHP 8.2.
Derick Rethans 0:44
However, back to today: PHP 8.1 is going to be released today, November 25. In this episode, I'll look back at the previous episodes this year to highlight the new features that are being introduced in PHP 8.1. I am not revisiting the proposals that did not end up making it into PHP 8.1. For each of the features, I will let my original interviews speak. I think you will hear Nikita Popov a lot, as he's been so prolific in proposing and implementing many of the features of this new release. However, in the first episode of the year, I spoke with Larry about enumerations, which he was proposing together with Ilija Tovilo. I asked him what enumerations are.
Larry Garfield 1:26
Enumerations, or enums, are a feature of a lot of programming languages. What they look like varies a lot depending on the language, but the basic concept is creating a type that has a fixed finite set of possible values. The classic example is booleans. Boolean is a type that has two and only two possible values: true and false. Enumerations are a way to let you define your own types like that, to say this type has two values, Sort Ascending or Sort Descending. This type has four values for the four different card suits in a standard card deck. Or a user can be in one of four states: pending, approved, cancelled or active. And so those are the four possible values that this variable type can have. What that looks like varies widely depending on the language. In a language like C or C++, it's just a thin layer on top of integer constants, which means they get compiled away to integers at compile time, and they don't actually do all that much; they're a little bit of a help for reading. On the other end of the spectrum, you have languages like Rust or Swift, where enumerations are a robust, advanced data type and data construct of their own, that also supports algebraic data types (we'll get into that a bit more later) and is a core part of how a lot of the system actually works in practice. And a lot of other languages are somewhere in the middle. Our goal with this RFC is to give PHP something more towards the advanced end of enumerations. Because there are perfectly good use cases for it, so let's not cheap out on it.
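As a rough illustration of the card suit and user state examples, here is how they might look with the enum syntax that shipped in PHP 8.1. The case names follow the examples in the interview; the `canLogIn()` method is a hypothetical addition to show that enums can carry behaviour.

```php
<?php
declare(strict_types=1);

// Pure enum: a closed set of values with no underlying scalar.
enum Suit
{
    case Hearts;
    case Diamonds;
    case Clubs;
    case Spades;
}

// Backed enum: each case maps to a string, for example for storage.
enum UserStatus: string
{
    case Pending   = 'pending';
    case Approved  = 'approved';
    case Cancelled = 'cancelled';
    case Active    = 'active';

    // Hypothetical helper, not from the RFC.
    public function canLogIn(): bool
    {
        return $this === self::Active;
    }
}

// from() turns a stored string back into a case; tryFrom() returns null
// instead of throwing when the value is unknown.
$status  = UserStatus::from('approved');
$unknown = UserStatus::tryFrom('deleted');

// cases() lists every case of an enum in declaration order.
$suitNames = array_map(fn (Suit $s): string => $s->name, Suit::cases());
```

Because `Suit` and `UserStatus` are real types, a parameter declared as `UserStatus $status` cannot receive an arbitrary string, which is the main gain over plain constants.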
Derick Rethans 3:14
In the next episode, I spoke with Aaron Piotrowski about another big new feature: Fibers.
Aaron Piotrowski 3:20
A few other languages already have Fibers like Ruby. And they're sort of similar to threads in that they contain a separate call stack and a separate memory stack. But they differ from threads in that they exist only within a single process and that they have to be switched to cooperatively by that process rather than pre-emptively by the OS like threads. And so the main motivation behind wanting to add this feature is to make asynchronous programming in PHP much easier and eliminate the distinction that usually exists between async code that has these promises and synchronous code that we're all used to.
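The cooperative switching described here can be sketched with the `Fiber` class that PHP 8.1 provides. This is a minimal sketch of the low-level API only; the `$log` array and the string values are made up for the illustration, and real asynchronous code would normally sit behind a library's event loop.

```php
<?php
declare(strict_types=1);

$log = [];

$fiber = new Fiber(function (string $name) use (&$log): string {
    $log[] = "fiber started for $name";
    // Hand control back to the caller. The value later passed to
    // resume() becomes the return value of Fiber::suspend().
    $reply = Fiber::suspend('waiting');
    $log[] = "fiber resumed with $reply";

    return 'done';
});

// start() runs the fiber until its first suspend() and returns the
// value that was passed to suspend().
$suspendedWith = $fiber->start('demo');
$log[] = "main got $suspendedWith";

// resume() continues the fiber on its own call stack until it finishes.
$fiber->resume('data');
$result = $fiber->getReturn();
```

Nothing here is pre-empted: the fiber only stops where it calls `Fiber::suspend()`, and it only continues when the main code calls `resume()`.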
Derick Rethans 4:03
I also asked Aaron about Swoole PHP: I actually have a slightly related question that pops into my head. There's also something called Swoole PHP, which does something similar, but from what I understand actually allows things to run in threads. How would you compare these two frameworks, or approaches is probably the better word?
Aaron Piotrowski 4:25
Swoole tries to be the Swiss Army knife in a lot of ways, where they provide tools to do just about everything. And they provide a lot of opinionated APIs for things, whereas in this case I'm trying to provide just the lowest level, only the very necessary tools that would be required in core to implement Fibers.
Derick Rethans 4:48
Although I discussed several deprecations with Nikita in the last year, I only want to focus on the new features. In episode 76, I spoke with him about array unpacking, after talking about changes to null in internal functions.
Nikita Popov 5:01
The background is that we have unpacking in calls. If you have the arguments for the call in an array, then you write the three dots and the array is unpacked into actual arguments. Now what this RFC is about is to do the same change for array unpacking, so to allow you to also use string keys.
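A short sketch of both kinds of unpacking may help. The `connect()` function and the configuration keys are hypothetical; the behaviour shown assumes PHP 8.1, where string keys are accepted when unpacking into an array.

```php
<?php
declare(strict_types=1);

// Hypothetical function used only to demonstrate call unpacking.
function connect(string $host, int $port = 3306): string
{
    return "$host:$port";
}

// Unpacking into a call: since PHP 8.0, string keys are matched to
// parameter names, so the order inside the array does not matter.
$dsn = connect(...['port' => 5432, 'host' => 'db.example']);

// Unpacking into an array: string keys are allowed as of PHP 8.1,
// and a later key overwrites an earlier one with the same name.
$defaults = ['host' => 'localhost', 'port' => 3306];
$override = ['port' => 5432];
$config   = [...$defaults, ...$override];

// Integer keys are still renumbered, as before.
$list = [...[1, 2], ...[3]];
```

For string keys the result matches what `array_merge()` would produce, which is why the second array wins for `port`.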
Derick Rethans 5:24
In another episode, I spoke with David Gebler on a more specific addition of a new function fsync. David explains the reason why he wants to add this to PHP.
David Gebler 5:34
It's an interesting question. I suppose in one sense, I've always felt that the absence of fsync, when some interface to fsync is provided by most other high level languages, has always been something of an oversight in PHP. But the other reason was that it was an exercise for me in familiarizing myself with PHP core, getting to learn the source code. And it's a very small contribution, but it's one that I feel is potentially useful. And it was easy for me to do as a learning exercise.
Derick Rethans 5:58
And that is how things are added to PHP sometimes: to learn something new and add something useful at the same time. After discussing the move of the PHP documentation to Git in episode 78, in episode 79 I spoke with Nikita about his new in initializers RFC. He says:
Nikita Popov 6:15
So my addition is a very small one, actually. I'm only allowing a single new thing, and that's using new. So you can use new whatever as a parameter default, property default, and so on.
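A minimal sketch of `new` as a parameter default, assuming PHP 8.1. The `Logger`, `NullLogger` and `Service` names are hypothetical. The sketch sticks to a parameter default because, as far as I know, the version that shipped accepts `new` in parameter defaults, attribute arguments, static variable initializers and global constants, but not in plain property defaults.

```php
<?php
declare(strict_types=1);

// Hypothetical types for the illustration.
interface Logger
{
    public function log(string $message): void;
}

final class NullLogger implements Logger
{
    public function log(string $message): void
    {
        // Intentionally does nothing.
    }
}

final class Service
{
    private Logger $logger;

    // PHP 8.1: an object can be created directly in the default value.
    public function __construct(Logger $logger = new NullLogger())
    {
        $this->logger = $logger;
    }

    public function logger(): Logger
    {
        return $this->logger;
    }
}

$service = new Service();
```

Before this change the usual pattern was a nullable parameter defaulting to `null`, followed by `$this->logger = $logger ?? new NullLogger();` in the constructor body.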
Derick Rethans 6:29
The addition of this change also makes it possible to use nested attributes. Nikita explains:
Nikita Popov 6:34
I have to be honest, I didn't think about attributes at all, when writing this proposal. What I had in mind is mainly parameter defaults and property defaults. But yeah, attribute arguments also use the same mechanism and are under the same limitations. So now you can use new as an attribute argument. And this can be used to effectively nest attributes.
Derick Rethans 6:59
Static analysis tools are used more and more with PHP, and I spoke to the authors of the two main tools, Matt Brown of Psalm and Ondrej Mirtes of PHPStan. They proposed together to add a new return type called noreturn. I asked them what it does and what it is used for.
Ondrej Mirtes 7:14
Right now the PHP community most likely waits for someone to implement generics and intersection types, which are also widely adopted in PHPDocs. But there's also noreturn, a little bit more subtle concept that would also benefit from being in the language. It marks functions and methods that always throw an exception, always exit, or enter an infinite loop. Calling such a function or method guarantees that nothing will be executed after it. This is useful for static analysis, because we can use it for type inference.
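As a sketch of the idea: the type was released in PHP 8.1 under the name `never` rather than `noreturn`. The `fail()` and `parsePort()` functions below are hypothetical helpers written for this illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: it always throws, so control never returns
// to the caller. Returning normally would be an error.
function fail(string $message): never
{
    throw new RuntimeException($message);
}

function parsePort(string $raw): int
{
    if (preg_match('/^\d+$/', $raw) !== 1) {
        fail("not a port: $raw");
    }

    // A static analyser can rely on the line above never falling
    // through, so $raw is known to consist of digits here.
    return (int) $raw;
}

$port = parsePort('8080');

$caught = null;
try {
    parsePort('http');
} catch (RuntimeException $e) {
    $caught = $e->getMessage();
}
```

Without the `never` declaration a tool would have to inspect the body of `fail()`, or rely on a docblock annotation, to learn that the code after the call is unreachable in the failing branch.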
Derick Rethans 7:49
Beyond syntax, each new version of PHP also adds new functions and classes. We already touched on the new fsync function, but Mel Dafert proposed to add the IntlDatePatternGenerator class to help with formatting dates according to specific locales in a more specific way. She explains:
Mel Dafert 8:07
Currently, PHP exposes the ability for locale dependent date formatting with the IntlDateFormatter class. It has basically only three options for the format: long, medium and short. These options are not flexible enough in some cases, however. For example, the most common German format is day dot numerical month dot long version of the year. However, neither the medium nor the short version provide that, and they use either the long version of the month or a short versi
PHP Internals News: Episode 94: Unwrap Reference After Foreach
Thursday, August 26th 2021, 09:22 BST
London, UK
In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Unwrap Reference After Foreach" RFC.
Transcript
Derick Rethans 0:14
Hi, I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is Episode 94. Today I'm talking with Nikita Popov about the unwrap reference after foreach RFC that he's proposing. Nikita, would you please introduce yourself?
Nikita Popov 0:33
Hi, Derick. I'm Nikita and I work at JetBrains on PHP core development.
Derick Rethans 0:38
So no changes compared to the last time.
Nikita Popov 0:41
Yes, and the time before that.
Derick Rethans 0:43
So what is the problem that this RFC is going to solve?
Nikita Popov 0:46
Well, it's really a very minor thing. I think it's a relatively well known problem for the more experienced PHP programmers. It's like a classic example: you have a foreach loop by reference, so foreach array as value by reference, and then you do a second loop after that, foreach array as value, and this time it's by value, so without the reference sign. The result of that is that your last two array elements are going to be the same, which is kind of unexpected if you're not familiar with how references in PHP work and how scoping in PHP works. So I think it's worth explaining what's going on there.
Derick Rethans 1:27
Can you quickly explain the scoping or rather the lack of it, I suppose?
Nikita Popov 1:31
Yeah, it's really the lack of it. PHP really only has function scoping. So if you have a foreach array as value, then the value variable is going to stay alive, even after the foreach loop. And usually, that won't make much of a difference. So you will just have like the last element of the array, which might even be useful for some cases, you know, before we added the, I think, array_key_last function. If the last element now is a reference, so if you have a reference to the last element, then any write into that variable is also going to modify the last element of the array. So if you now have a second foreach loop, using the same variable, that's actually not just modifying that variable, but it's also always modifying the last element of the array.
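The behaviour described above can be reproduced in a few lines. The array contents are arbitrary; the sketch shows current PHP behaviour, not the behaviour the RFC proposes.

```php
<?php
declare(strict_types=1);

$array = [1, 2, 3];

// First loop, by reference: $value is bound to each element in turn.
foreach ($array as &$value) {
    // No changes needed to trigger the problem.
}
// The loop is over, but $value is still a reference to $array[2].

// Second loop, by value: every iteration assigns to $value, and
// therefore writes into $array[2] as well.
foreach ($array as $value) {
    // Iteration 1 writes 1 into $array[2], iteration 2 writes 2,
    // iteration 3 reads $array[2] (now 2) and writes it back.
}

$afterLoops = $array;
```

The array ends up as 1, 2, 2: the last element has been overwritten with a copy of the second one, which is the "last two elements are the same" effect.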
Derick Rethans 2:15
Okay, just to clarify, it isn't necessarily the last element in the foreach loop. It's the last one that's been assigned to?
Nikita Popov 2:22
Yeah, that's, that's true.
Derick Rethans 2:24
Is this not something that people actually use for some useful reasons?
Nikita Popov 2:28
As mentioned before, technically, you could use it to get a reference to the last element and then modify the last element outside the foreach loop. I don't think this is a particularly common use case. But I'm sure people have used it here and there. This is a use case we would break with the proposed RFC.
Derick Rethans 2:47
I think it is one I have used in the past; it's probably not how I would do it now. But I'm pretty sure I have at some point in the past. What are you proposing to change with this RFC?
Nikita Popov 2:57
The change is pretty simple. And that's to unwrap or to break the reference after the loop. After the loop, the variable will still contain the value of the last element, or of the last visited element, but it will no longer be a reference to it. If you write into the variable, it will not modify the original array. And if you have a second loop that writes into the variable, that also doesn't modify the original array any more.
Derick Rethans 3:25
At which point and how is this reference broken?
Nikita Popov 3:29
It's at the end of the foreach loop, or as you say, if you break out too early, then of course, it would also get broken. So it's referenced inside the foreach loop and stops being referenced outside the loop.
Derick Rethans 3:41
And that would happen also, if I would use a goto for example?
Nikita Popov 3:45
Oh, that that's a trick question, actually, yes, it should happen. But now that you have mentioned it, I think my current implementation does not handle that particular case, I will have to double check it. But that should happen, yes.
Derick Rethans 4:00
It's good to know that you've thought about it then.
Nikita Popov 4:02
Well, I didn't think about it. Because I mean, I guess I can mention it here, the way this works is that well, at the end of the foreach loop, we have like an instruction that frees the loop variable. And I can just add an additional one that breaks reference. But if you use things like goto or multi level breaks, or something like that, then we insert these clean-up instructions before the jump. We have to make sure to actually insert the reference breaking instruction there as well. So it's like not automatically handled.
Derick Rethans 4:38
Is this going to be a separate instruction or as we tend to call them opcodes?
Nikita Popov 4:43
I'm using a separate one, but one could add it as a flag to the instruction that frees the loop variable. I think it's cleaner to have a separate instruction for it, though. Like technically one could optimize it away in some cases, I wouldn't bother, but it's like semantically a different thing.
Derick Rethans 5:01
I think it'd be a nicer result, because it makes it easier to visualize what's happening, right?
Nikita Popov 5:06
Yeah, it is.
Derick Rethans 5:07
Did you actually check whether some code uses this construct?
Nikita Popov 5:10
I have to admit, I tried checking it using a very basic approach: just look at foreach loops by reference, and then whether the variable is used after that. But that kind of primitive approach has way too many false positives. For example, you have a foreach loop inside an if, and then the variable is reused inside an else. So it wouldn't flow from the if into the else. So you would have to do some kind of more sophisticated control flow analysis. It's something that can be done, but I didn't bother doing it for a one off backwards compatibility check. So I don't have any hard data on how much code is actually using something like this.
Derick Rethans 5:51
So this is where I'm a little bit on the fence about this change, because it is changing behaviour, that's going to be pretty hard to figure out what is actually going to affect your codebase.
Nikita Popov 6:01
It should be possible to very reliably detect that. It's just something you have to actually implement. But you're right, right now there is no easy way to check that.
Derick Rethans 6:13
It's something that static analysers could probably have a look at.
Nikita Popov 6:16
Yeah, I expect that for maybe Psalm or PHPStan, something like that will be easier to implement, because they already have control flow information.
Derick Rethans 6:23
You don't really know how impactful this is, which is, in my opinion, a bit of the scary bit. How important do you find it to have this RFC going through and implemented?
Nikita Popov 6:33
I don't think it's super important. It's mostly like a small quality of life fix for newer developers. People who have already encountered this issue once won't forget about it again. In fact, it's a somewhat common recommendation that you should always unset the loop variable after a foreach by reference loop. So I've seen that as like a policy some people use, and that could be avoided. So yeah, I don't think it's a critical feature, just a small improvement.
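The recommendation to unset the loop variable looks like this in practice. It is a sketch of the existing manual workaround, which has the same effect that the RFC would apply automatically at the end of the loop.

```php
<?php
declare(strict_types=1);

$array = [1, 2, 3];

foreach ($array as &$value) {
    $value *= 2; // modify each element in place
}
unset($value); // break the reference to the last element

// Reusing the same variable name is now harmless: $value is a fresh
// variable and no longer aliases the last array element.
foreach ($array as $value) {
    // read-only use of $value
}

$afterLoops = $array;
```

Here the array keeps the doubled values 2, 4, 6, because the second loop only assigns to an ordinary local variable.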
Derick Rethans 7:08
Would it be an alternative idea to instead deprecate the foreach by reference?
PHP Internals News: Episode 93: Never For Parameter Types
Thursday, August 19th 2021, 09:21 BST
London, UK
In this episode of "PHP Internals News" I chat with Jordan LeDoux (GitHub) about the "Never For Parameter Types" RFC.
Transcript
Derick Rethans 0:14
Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 93. It's been quiet over the last month, so I didn't really have a chance to talk about upcoming RFCs, mostly because there were none. However, PHP 8.1's feature freeze has happened now, and new RFCs are being targeted for the next version of PHP, 8.2. Today I'm talking with Jordan LeDoux about the Never For Parameter Types RFC, the first one targeting this upcoming PHP version. Jordan, would you please introduce yourself?
Jordan LeDoux 0:50
Certainly. And thanks for having me. My name is Jordan. I've worked as a developer for about 15 years now. Most of my career has been spent working in PHP. Although professionally, I've had experience working in C#, Python, TypeScript, mostly in the form of JavaScript, but a little bit of Node and, you know, a variety of other languages that I haven't spent enough time in to really be proficient in any real way. But recently, I decided to do something that I have thought about doing for many years, but never actually jumped into which is exploring the PHP engine itself and how I could possibly contribute to it.
Derick Rethans 1:32
And here we are, with your first RFC.
Jordan LeDoux 1:35
Yeah, it's exciting.
Derick Rethans 1:36
What is this RFC about, what does it propose?
Jordan LeDoux 1:39
Well, this RFC proposes allowing the never type, which was added in 8.1 as a return type, to be used for parameters of functions and methods on objects. The main idea behind that is that when never was proposed as a return type, it was meant to signal that the function would never return. Not that it returns void, which of course signifies returning no value, or returning without any specified information. A never return type signifies that the function will never return, which is a concept that exists in many other languages. And for that purpose in other languages, what's usually used is something called a bottom type. And that's what never ended up being. And I'm proposing that we extend the use of that bottom type to other areas where the type may be helpful.
Derick Rethans 2:38
So a bottom type, that might be a new term for many people; it certainly was for me when I looked at the never RFC for return types. Can you sort of explain what a bottom type is, especially thinking about object oriented theory with something that we'd like to call the Liskov Substitution Principle? And also, how does it apply to argument types?
Jordan LeDoux 2:59
Let's start with the Liskov Substitution. The general idea behind Liskov Substitution is that if A is a subtype of B, then anywhere that B is expected, you should be able to substitute A. It has to do with when you have a class hierarchy in an object oriented language: that class hierarchy guarantees certain things about substitutability, like whether or not something can be substituted for something else. That affects language design in ways that a lot of programmers are kind of intuitively familiar with, but maybe not familiar with the theory and the ideas behind it more concretely. But LSP is a principle in SOLID, that's the L in SOLID. And it represents a portion of the whole idea of object oriented programming in PHP. Part of being able to substitute one object for another, based on their class hierarchy, and what they implement, and what they provide, is whether or not they can fulfil the same kind of contractual requirements of typing. And with Liskov, that means that preconditions can never be strengthened, and post conditions can never be weakened. So a precondition would be a parameter type requirement. If you require that a parameter accepts an object, for instance, in PHP, you can't strengthen that requirement beyond just any object to a particular object. But you can weaken it from an object to an object or an integer with unions. That's an example of the precondition side of it. The post condition side of it is that you can't weaken it. So if you have, you know, a return type of int, you can't have an inherited implementation return int or float, because that broadens the possible return types. They go in opposite directions. And one of them is covariance and one of them is contravariance.
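The two directions can be seen in a small inheritance example. The `Base` and `Child` classes and the `scale()` method are hypothetical, and the sketch assumes PHP 8.0 or later for union types.

```php
<?php
declare(strict_types=1);

class Base
{
    public function scale(int $factor): int|float
    {
        return $factor * 1.5;
    }
}

class Child extends Base
{
    // Precondition weakened: the parameter is widened from int to
    // int|float, so every argument valid for Base is still accepted.
    // Postcondition strengthened: the return type is narrowed from
    // int|float to int, so callers of Base still get what they expect.
    public function scale(int|float $factor): int
    {
        return (int) round($factor * 2);
    }
}

// Code written against Base keeps working when handed a Child.
function useBase(Base $object): int|float
{
    return $object->scale(2);
}

$fromBase  = useBase(new Base());
$fromChild = useBase(new Child());
```

Going the other way, for example declaring `scale(int $factor): int|float|string` in `Child`, is rejected by PHP with a fatal error when the class is declared, because the return type would be wider than the parent's.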
Derick Rethans 5:18
I can never remember which one is which.
Jordan LeDoux 5:21
Yeah, basically contravariance goes up the tree and covariance goes down the tree. If you're thinking about widening, or sorry, narrowing. If you're thinking about narrowing, then covariance goes down the implementation tree and contravariance goes up the implementation tree.
Derick Rethans 5:40
Okay, so how does the bottom type fit in here?
Jordan LeDoux 5:43
The bottom type in any type system represents like the base type that all types originate from. And the best way in my mind to think about it is kind of just integer math. It's a thing that every programmer is going to be familiar with, and it fits all right. So you can think of the bottom type as zero. A lot of people would think of null as zero if they're thinking about a type system, but null is more like negative one. It's like the entire negative side of the integer system. We could say that the string type is one, and the integer type is two. And the float type is three, and maybe int or float, the union of them, is five, which would be the two numbers added together. And if you describe type systems this way, then you can say, hey, if I take any type and add the value that represents the other type, then I get my result type. So if I take zero, the bottom type, and I add one, the string type, what I end up with is one, still the string type. So the bottom type is whatever type that, when you add any other type to it, you get the type you added to it and nothing else; you just get your original thing. That's why it's called the union identity for the type system. The top type in PHP is mixed. And that's the opposite side of it. It's just like zero is the additive identity and one is the multiplicative identity. If you multiply anything by one, you're going to get what you originally had. And if you add anything to zero, you'll get what you originally had. So mixed ends up being, or the top type in general ends up being, the intersection identity, and the bottom type, or never in PHP's case, ends up being the union identity. And all this is like deep type theory, but most programmers don't have to interact with it. It's more something that affects language design usually.
Derick Rethans 7:48
Could you think of mixed as being infinity?
Jordan LeDoux 7:50
Actually, with my crude integer analogy, yeah, it would be like all possible types added together.
Derick Rethans 7:59
That makes sense then. Okay, so we have explained what the bottom type is, and that never is the bottom type. So why is it useful to use the bottom type, or never, as this RFC proposes, as a method argument type for parameters?
Jordan LeDoux 8:14
The largest benefit has to do with what we were talking about when it comes to covariance versus contravariance. You know, can you strengthen the requirements? Or can you weaken the requirements? When you inherit a method in a system that preserves Liskov Substitution, the parameters can be widened, they can accept more things. If I had an interface that said, it has one parameter, and that parameter is typed as int, then in any implementation, I could say, okay, but actually, the parameter type is int or float, I'm going to accept both. And I could do that in the implementation. But I would have to accept int, because that's part of the contract. That's part of the interface. So I have to accept whatever type is in the interface. I can just additionally add things on top of that. If I make my original definition, my root definition, the bottom type, then I can add any type to it. And I will just get that type. From never, if I had an interface with, you know, a method foo, and it has one argument, and that argument is typed never, I can re-type that argument as int, or I could re-type that argument as string, and all of them would be valid inheritances.
Derick Rethans 9:38
So i
PHP Internals News: Episode 92: First Class Callable Syntax
Thursday, July 22nd 2021, 09:20 BST
London, UK
In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "First Class Callable Syntax" RFC.
Transcript
Derick Rethans 0:14
Hi, I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is Episode 92. Today I'm talking with Nikita Popov about a first class callable syntax RFC that he's proposing together with Joe Watkins. Nikita, would you please introduce yourself?
Nikita Popov 0:36
Hi, Derick. I'm Nikita and I am still working at JetBrains. And still working on PHP core development.
Derick Rethans 0:43
Just like about half an hour ago when we recorded an earlier episode.
Nikita Popov 0:47
Exactly.
Derick Rethans 0:48
This RFC has no relation to read only properties. What is the first class callable syntax RFC about?
Nikita Popov 0:55
The context here is that PHP has the callable syntax based on literals, which is that if you just use a plain string, it's interpreted as a function name, and an array where the first element is an object, and the second one is a method name, that's methods. Or the first element is the class name, and the second one is method name, that's a static method.
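The three literal forms described here look as follows. The `Greeter` class is hypothetical; the callable forms themselves are long-standing PHP behaviour.

```php
<?php
declare(strict_types=1);

// Hypothetical class for the illustration.
final class Greeter
{
    public function hello(string $name): string
    {
        return "Hello, $name";
    }

    public static function shout(string $name): string
    {
        return strtoupper($name) . '!';
    }
}

$function = 'strlen';                  // plain string: a function name
$method   = [new Greeter(), 'hello'];  // [object, method name]
$static   = [Greeter::class, 'shout']; // [class name, static method name]

$length   = $function('abc');
$greeting = $method('Ada');
$shouted  = call_user_func($static, 'ada');

// An array that was only ever meant as data looks exactly the same
// as a callable to PHP and to tools reading the code.
$plainData   = ['Greeter', 'shout'];
$looksCallable = is_callable($plainData);
```

The last two lines show why this form is hard to analyse: nothing in `['Greeter', 'shout']` itself says whether it is a list of two strings or a reference to a static method.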
Derick Rethans 1:17
I would consider this concept a bit of a hack, especially the one with the arrays, and I reckon you feel similar, and hence this RFC?
Nikita Popov 1:27
Yes, I do. So the current callable syntax has a couple of issues. I think the core issue is that it's not really analysable. So if you see this kind of like array with two strings inside it, it could just be an array with two strings; you don't know if that's supposed to actually be a static method reference. If you look at the context of where it is used, you might be able to figure out that actually, this is a callable. And like in your IDE, if you rename this method, then this array element should also be renamed. But there's like a lot of complex reasoning that the static analyser has to perform. That's one side of the issue. The second one is that callables are not scope independent. For example, if you have a private method, then like at the point where you create your callable, like as an array, it might be callable there, but then you pass it to some other function. And that's in a different scope. And suddenly that method is not callable there. So this is a general issue with both like this callable syntax based on arrays, and also the callable type. It's a callable at exactly this point, not callable at a later point. This is what the new syntax essentially addresses. So it provides a syntax that like clearly indicates that yes, this really is a callable, and it performs the callability check at the point where it's created, and also binds the scope at that time. So if you pass it to a different function in a different scope, it still remains callable.
Derick Rethans 3:01
And it's guaranteed to always be callable.
Nikita Popov 3:03
Yeah, exactly.
Derick Rethans 3:04
What does the syntax look like?
Nikita Popov 3:06
The syntax is the funny bit. As a bit of context. This proposal was created as an alternative or as a subset of the partial function application RFC.
Derick Rethans 3:17
That is just as hard to pronounce as first class callable syntax RFC.
Nikita Popov 3:21
Yes, that's why we say PFA. The PFA RFC has a more general feature. It also allows you to create a reference to a callable as a side effect. But more generally, it allows you to also bind some of the arguments to a fixed value. And has like finer control over for example, you can create a callable that has three required parameters, by passing three question mark arguments. While the new syntax only allows you to use the signature of the original function. But the syntax between both of those is compatible. So the new RFC is a subset of PFA. And that's why it uses the syntax where you do a normal function call, but then pass three dots or an ellipsis as arguments.
Derick Rethans 4:08
Instead of passing the function's or method's normal arguments, you use the three dots.
Nikita Popov 4:14
I think like the way to think about the syntax is that this is similar to like a variadic argument, or to the argument unpacking syntax, just that the arguments haven't yet been provided, they will be provided during the actual call. But I think the syntax was definitely the most contentious bit in the discussion of the RFC. I think this is mainly related to the fact that if you see this code snippet, it looks a bit like example code where the arguments haven't been filled in. While now this is like actual syntax.
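As a rough illustration of the "function call with three dots" form discussed above, here is how it looks for a plain function, an instance method and a static method. It assumes PHP 8.1 or later; the `Temperature` class is made up for the example.

```php
<?php
// Hypothetical value object, for illustration only.
final class Temperature
{
    public function __construct(private float $celsius) {}

    public function toFahrenheit(): float
    {
        return $this->celsius * 9 / 5 + 32;
    }

    public static function fromFahrenheit(float $fahrenheit): self
    {
        return new self(($fahrenheit - 32) * 5 / 9);
    }
}

// A normal-looking call, but with "..." in place of the arguments.
$length  = strlen(...);                                  // plain function
$convert = (new Temperature(100.0))->toFahrenheit(...);  // instance method
$create  = Temperature::fromFahrenheit(...);             // static method

// The arguments are supplied later, during the actual call.
echo $length('internals'), PHP_EOL;           // 9
echo $convert(), PHP_EOL;                     // 212
echo $create(212.0)->toFahrenheit(), PHP_EOL; // 212
```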
Derick Rethans 4:44
I'm sure there's quite a few tutorials out there explaining how PHP works by using dot dot dot. That is not something you can avoid.
Nikita Popov 4:54
Well, we can avoid it, but it's a fairly tricky question. I mean, the reason for this dot dot dot syntax, on one hand, is the compatibility with partial functions. I mean, the PFA RFC has recently been declined. But in the future, we could extend the current syntax to full partial functions. And we would not end up with two different ways. So that's one benefit of the syntax. But the other part is that PHP has different symbol tables for different kinds of symbols. People often ask, why can't you just write like strlen as a plain name, not inside a string, and have that be treated as a reference to this function? And the answer to that is that we can't do that because you can have a constant that's called strlen. Normally, that would be a reference to the constant, and the same actually applies to all other callable types as well. So if you have something like methods, like object, arrow, method name, that would right now be interpreted as a property access. And for static methods, it will be interpreted as a class constant access. So we have this ambiguity here. Even if we add an additional symbol to this, for example, like for classes, we have the syntax, class name, and then scope operator class, that gives you the class name. We could do something like strlen, scope operator function, or fn, or whatever, and have that return the callable. That would work, but it also has some ambiguities. For example, if you have something like object, arrow, method, and then scope operator fn, you have this ambiguity. Is this referencing the method of that name? Or is it referencing a callable stored inside the property of that name? This is like fundamentally ambiguous. The way we would resolve it is we will just say that this syntax is only usable with real symbols, so it will always refer to a method, and you couldn't use the syntax to convert the callable stored in a property into a proper callable.
I'm actually not sure how I should distinguish these two concepts, because we have the existing callable strings and arrays, and the first class callables, which are really closure objects.
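The separate symbol tables mentioned above are easy to observe. The following sketch only demonstrates the ambiguity; the constant value and the `Job` class are invented for the example.

```php
<?php
// Constants and functions live in separate symbol tables,
// so a constant may share its name with a function.
const strlen = 'I am a constant';

echo strlen, PHP_EOL;         // the constant
echo strlen('abc'), PHP_EOL;  // the function

// Hypothetical class where a property and a method share one name.
final class Job
{
    public $run; // property that holds a callable

    public function __construct()
    {
        $this->run = fn(): string => 'from property';
    }

    public function run(): string
    {
        return 'from method';
    }
}

$job = new Job();
echo $job->run(), PHP_EOL;    // method call
echo ($job->run)(), PHP_EOL;  // call of the closure stored in the property
```

A bare `strlen` or `$job->run` already has a meaning, which is why a plain name could not simply be reused as a function or method reference.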
Derick Rethans 7:11
Which actually sort of brings me to the next question which just popped in my head, which is: Does this first class callable syntax return a closure, or an existing callable type as we have now, with a callable type being a single string, or this array syntax that we now use?
Nikita Popov 7:28
The syntax returns a closure. Actually, the syntax works essentially the same way as the Closure::fromCallable method. And we do need to return a closure, otherwise we don't get this behaviour where the scope is bound at the time where the callable is created, rather than called. I think maybe going forward, I would generally recommend that people use a closure type, instead of a callable type in type declarations. I mean, you already cannot use callable for property types. Exactly due to this problem that callability is context dependent. While we only forbid it in property types, the same general problem also exists for argument and return types. And especially with the new syntax being introduced here, I think it's best to use closure instead of callable in the future.
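A small sketch of the two points above: both spellings produce a `Closure`, and `Closure` can be used as a parameter type. It assumes PHP 8.1 or later; the `totalLength` helper is hypothetical.

```php
<?php
// Both forms produce a Closure object.
$viaSyntax = strlen(...);
$viaMethod = Closure::fromCallable('strlen');

var_dump($viaSyntax instanceof Closure); // bool(true)
var_dump($viaMethod instanceof Closure); // bool(true)

// Hypothetical helper that asks for a Closure rather than a callable.
function totalLength(Closure $measure, array $words): int
{
    return array_sum(array_map($measure, $words));
}

echo totalLength(strlen(...), ['php', 'internals']), PHP_EOL; // 12
```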
Derick Rethans 8:18
Does that sort of mean that first class callable syntax is syntactic sugar? Or does it do more than the Closure::fromCallable method?
Nikita Popov 8:27
PHP Internals News: Episode 91: is_literal
Thursday, July 15th 2021, 09:19 BST
London, UK
In this episode of "PHP Internals News" I chat with Craig Francis (Twitter, GitHub, Website), and Joe Watkins (Twitter, GitHub, Website) about the "is_literal" RFC.
Transcript
Derick Rethans 0:14
Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 91. Today I'm talking with Craig Francis and Joe Watkins, talking about the is_literal RFC that they have been proposing. Craig, would you please introduce yourself?
Craig Francis 0:34
Hi, I'm Craig Francis. I've been a PHP developer for about 20 years, doing code auditing, pentesting, training. And I'm also the co-lead for the Bristol chapter of OWASP, which is the open web application security project.
Derick Rethans 0:48
Very well. And Joe, will you introduce yourself as well, please?
Joe Watkins 0:51
Hi, everyone. I'm Joe, the same Joe from last time.
Derick Rethans 0:56
Well, it's good to have you back, Joe, and welcome to the podcast Craig. Let's dive straight in. What is the problem that this proposal's trying to resolve?
Craig Francis 1:05
So we try to address the problem where injection vulnerabilities are being introduced by developers. When they use libraries incorrectly, we will have people using the libraries, but they still introduce injection vulnerabilities because they use it incorrectly.
Derick Rethans 1:17
What is this RFC proposing?
Craig Francis 1:19
We're providing a function for libraries to easily check that certain strings have been written by the developer. It's an idea developed by Christoph Kern in 2016. There is a link in the video, and Google are using this to prevent injection vulnerabilities in their Java and Go libraries. It works because libraries know how to handle these data safely, typically using parameterised queries, or escaping where appropriate, but they still require certain values to be written by the developer. So for example, when querying a database, the developer might need to write a complex WHERE clause or maybe they're using functions like datediff, round, ifnull. Obviously, this function could also be used by developers themselves if they want to, but the primary purpose is for the library to check these values.
Derick Rethans 2:05
That is a method of doing it. What is this RFC adding to PHP itself?
Craig Francis 2:09
It just simply provides a function which just returns true or false if the variable is a literal, and that's basically a string that was written by the developer. It's a bit like if you did is_int or is_string, it's just a different way of just sort of saying, has this variable been written by the developer?
Derick Rethans 2:28
Is that basically it?
Craig Francis 2:30
That's it, yeah.
Joe Watkins 2:32
It would also return true for variables that are the result of concatenation of other variables that would pass the is_literal check. Now, this differs from Google, because they introduced that at the language level, but not only at the language level, at the idiom level. So that when you open a file that's got queries in PHP, commonly, if they're long, basic concatenation is used to build the query and format it in the file so that it's readable. So it wouldn't really be very useful if those queries that you see everywhere in stuff like PHPMyAdmin, and WordPress, and Drupal and just normal code weren't considered literal, just because they're spread over several lines with the concatenation operator. Strictly, it's not just stuff that's written by the programmer, but also stuff that was written by the programmer and concatenated with other stuff that was written by the programmer.
Derick Rethans 3:33
Now in the past, we have seen something about adding taint supports to PHP, right? How is this different, or perhaps similar, to taint checking?
Craig Francis 3:44
At the moment today, there is a taint extension, which is something you need to go out of your way to install, and actually learn about and how to use. But the main difference is that taint checking goes on the basis of saying this variable is safe or unsafe. And the problem is that it considers anything that has been through an escaping function like htmlentities as safe. But of course, the problem is that escaping is difficult. And it's very easy to make mistakes with that. A classic example is if you take a value from a user, such as their homepage URL, use HTML encoding, and then put it into the href attribute of a link: that can still result in an HTML injection vulnerability, because the escaping is not aware of the context in which it is used. If the evil user put in a JavaScript URL, that is inline JavaScript, and that has created a problem, because taint checking would assume that because you used HTML encoding it is safe. All I'm saying is that it creates a false sense of security. And by stripping out all that support for escaping, it means that you can focus on libraries doing that work because they know the context, they understand the domain, and we can just keep it a much simpler, and much safer approach.
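The href example above can be reproduced with the standard `htmlspecialchars()` function. This is only a sketch: the hostile URL is invented, and `isWebLink()` is a hypothetical helper showing one possible context-aware check, not something the RFC prescribes.

```php
<?php
// Hypothetical hostile input supplied as a "homepage URL".
$homepage = 'javascript:alert(document.cookie)';

// HTML encoding has nothing to encode here, so the value passes through unchanged.
$encoded = htmlspecialchars($homepage, ENT_QUOTES, 'UTF-8');
var_dump($encoded === $homepage); // bool(true)

// The link is valid HTML, yet clicking it would run the script.
echo '<a href="' . $encoded . '">Homepage</a>', PHP_EOL;

// A context-aware check (hypothetical helper): only accept http and https links.
function isWebLink(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);

    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https'], true);
}

var_dump(isWebLink($homepage));               // bool(false)
var_dump(isWebLink('https://example.com/'));  // bool(true)
```

The escaping was applied correctly for HTML text, but the href attribute is a URL context, which is the mismatch being described.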
Derick Rethans 5:02
Would you say that the is_literal feature is mostly aimed at library authors and not individual developers?
Craig Francis 5:09
Yeah, exactly. Because the library authors know what they're doing. They're using well tested code, many eyes over it. The problem libraries have at the moment is that they trust the developer to write things themselves. And unfortunately, developers introduce a lot of injection vulnerabilities with those strings before they even get into the library.
Derick Rethans 5:30
How would a library deal with strings that aren't literal then?
Craig Francis 5:35
So it really depends on each individual example. And the RFC does include quite a lot of examples of how each one will be dealt with. The classic one is, let's say you're sorting by a column in a database, because if we're dealing with SQL, the field name might come from the user. But that is also quite a risky thing to do if you start including whatever field name the user wrote. So in the RFC, I've created a very simple example where the developer would create an array of fields that you can sort by, and then whatever the user provides, you search through that array, and you pull out the one that matches, and that is fine. And therefore you are pulling out a literal and including it in the SQL. To be fair, these ones are quite unique. And each one needs to be dealt with in its own way. But I've yet to find an example where you can't do it with a literal. Having said that, I think Larry Garfield actually gave an example where a content management system changed its database structure. And the way that would work is the library would have to deal with it: it would receive the value for a field, and then that field would be escaped and treated as a field, it understands it as a field, and it will process it as such. Then it can include it in the SQL, knowing full well that everything else in that SQL is a literal, and then it can just build up SQL in its own way internally.
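The "array of sortable fields" idea described above might look roughly like this. The sketch deliberately does not call `is_literal()`, since that function was only a proposal; the function name, field names and fallback behaviour are assumptions made for the example.

```php
<?php
// Hypothetical helper: pick a sort column from a developer-written list.
function orderByClause(string $requested): string
{
    // Every string in this list was typed by the developer.
    $sortable = ['name', 'created', 'email'];

    // Look up the position of the user's value in the list.
    $index = array_search($requested, $sortable, true);

    // Use the entry from our own list, never the user's string;
    // unknown values fall back to the first field.
    $field = $sortable[$index === false ? 0 : $index];

    return 'ORDER BY ' . $field;
}

echo orderByClause('email'), PHP_EOL;
echo orderByClause('name; DROP TABLE users'), PHP_EOL;
```

Whatever the user sends, the string that reaches the SQL comes out of the developer's own array.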
Derick Rethans 6:58
Okay, talking a little bit about the implementation here. Since PHP 7, we have this concept of interned strings, or maybe even before that actually, I don't quite remember. Which is pretty much a flag on each string in PHP that says this has been created by the engine, or by code. Why would strings have to have an extra flag here to remember that it is created by the programmer?
Joe Watkins 7:21
Well, interned does not mean literal. It's an optimization in the engine, so we reuse strings. We're free to do whatever we want with that. At the moment, by happenstance, most interned strings are those written by the programmer. If you think about the sort of strings that are written by the programmer, like a class name, when those things are declared internally, by an extension, or by core code, those things are interned as if they were written by the programmer. They don't mean literal, we're free to use interned strings for whatever we want. For example, a while ago, someone suggested that we should intern keys while JSON decoding or unserializing. It didn't happen, but it could happen. And then we'd have the problem of, well, how do we separate out
PHP Internals News: Episode 90: Read Only Properties
Thursday, July 8th 2021, 09:18 BST
London, UK
In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Read Only Properties" RFC.
Transcript
Derick Rethans 0:14
Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 90. Today I'm talking with Nikita Popov about the read only properties version two RFC that he's proposing. Nikita, would you please introduce yourself?
Nikita Popov 0:33
Hi, Derick. I'm Nikita and I do PHP core development work for JetBrains.
Derick Rethans 0:39
What is this RFC proposing?
Nikita Popov 0:41
This RFC is proposing read only properties, which means that the property can only be initialized once and then not changed afterwards. Again, the idea here is that since PHP 7.4, we have typed properties. A remaining problem with them is that people are not confident making public typed properties because they still ensure that the type is correct, but they might not be upholding other invariants. For example, if you have some, like additional checks in your constructor, that a string property is actually a non empty string property, then you might not want to make it public because then it could be modified to an empty value for example. One nowadays fairly common case is where properties are actually only initialized in the constructor and not changed afterwards any more. So I think this kind of mutable object pattern is becoming more and more popular in PHP.
Derick Rethans 1:35
You mean the immutable object?
Nikita Popov 1:37
Sorry, immutable. And read only properties address that case. So you can simply put a public read only typed property in your class, and then it can be initialized once in the constructor and you can be... You don't have to be afraid that someone outside the class is going to modify it afterwards. That's the basic premise of this RFC.
Derick Rethans 1:57
But it also means that objects of the class itself can't modify that value any more, either.
Nikita Popov 2:01
Exactly. So that's, I think, a primary distinction we have to make. Generally, there are two ways to make this read only concept work. One is like actually read only or maybe more precisely init once, which is what this RFC proposes. You can only set it once and then even in the same class, you can't modify it again. And the alternative is the asymmetric visibility approach where you say that, okay, in the public scope, the property can only be read, but in the private scope, you can modify it. I think the distinction there is very important, because a read only property tells you that it's genuinely read only, like, if you access a property multiple times in sequence, you will always get back the same value. While the asymmetric visibility only says that the public interface is read only, but internally, it could be mutated. And that might like be, you know, intentional, just that you want to like have your state management private, but that the property is not supposed to be immutable.
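The "init once" behaviour described above can be sketched as follows, assuming PHP 8.1 or later where the `readonly` modifier is available. The `Money` class and its `discount()` method are hypothetical.

```php
<?php
// Hypothetical immutable value object.
final class Money
{
    public readonly int $amount;

    public function __construct(int $amount)
    {
        $this->amount = $amount; // first and only assignment
    }

    // Even code inside the class cannot assign a second time.
    public function discount(): void
    {
        $this->amount = 0;
    }
}

$price = new Money(500);
echo $price->amount, PHP_EOL; // reading is always allowed

try {
    $price->discount();
} catch (Error $e) {
    echo 'Rejected inside the class: ', $e->getMessage(), PHP_EOL;
}

try {
    $price->amount = 1;
} catch (Error $e) {
    echo 'Rejected outside the class: ', $e->getMessage(), PHP_EOL;
}
```

With asymmetric visibility, by contrast, the `discount()` call would succeed and only the outside assignment would fail.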
Derick Rethans 3:05
How's this RFC different from read only properties, version one?
Nikita Popov 3:09
Read only properties version one was called write once properties. I think the naming is kind of one of the more important differences. The new RFC is also effectively write once, but I think it's really important to view it from an API perspective as read only because that's what the user gets to see. While write once gives you this impression that you can pass in the value once from outside the class, like, I don't know, dependency injection; that is what people would think of when they hear write once. And from the technical side, there is a related difference. And that difference is that the new RFC only allows you to initialize read only properties inside the class scope. That means if you do something really weird, like leaving a property uninitialized in the constructor, it's not possible for someone outside the class to initialize it instead, just like an extra safety check. Of course, you can, as usual, bypass that, like if you're writing a serializer or hydrator, you can use reflection to initialize it outside the class. But normally you won't be able to.
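A short sketch of the class-scope rule and the reflection escape hatch mentioned above, assuming PHP 8.1 or later. The `Entity` class is hypothetical and leaves its property uninitialized on purpose.

```php
<?php
// Hypothetical class that leaves its readonly property uninitialized.
final class Entity
{
    public readonly int $id;
}

$entity = new Entity();

// Initializing from outside the class scope is rejected,
// even though the property has never been assigned.
try {
    $entity->id = 1;
} catch (Error $e) {
    echo 'Rejected: ', $e->getMessage(), PHP_EOL;
}

// A serializer or hydrator can use reflection to perform the one-time initialization.
$property = new ReflectionProperty(Entity::class, 'id');
$property->setValue($entity, 42);

echo $entity->id, PHP_EOL; // 42
```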
Derick Rethans 4:13
Does that mean that these read only properties can also be initialized from a normal method instead of just from the constructor?
Nikita Popov 4:19
That's true. Yes, that's possible.
Derick Rethans 4:21
So the RFC talks about that a read only property cannot be assigned from outside a class. Does that mean it can be set by a different object of the same class?
Nikita Popov 4:29
Yeah, that's how scoping works in PHP. Scoping is always class based, not object based. It's a common misconception that if you have like private scope, you can't access different objects of the same class.
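Class-based scoping can be seen with an ordinary private property. The `Account` class below is a made-up example.

```php
<?php
// Hypothetical class: one instance reads another instance's private property.
final class Account
{
    public function __construct(private int $balance) {}

    public function hasMoreThan(Account $other): bool
    {
        // $other is a different object, but the same class, so this is in scope.
        return $this->balance > $other->balance;
    }
}

$savings = new Account(900);
$current = new Account(150);

var_dump($savings->hasMoreThan($current)); // bool(true)
```

The same rule is why one object may initialize a read only property on another object of the same class.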
Derick Rethans 4:42
That was a surprise to me the first time I ran into that, but once you know, it's obvious that it should work all over the place, right? Now, the RFC states that you can only use read only with typed properties. Why is that?
Nikita Popov 4:55
So this is related to the initialisation concept that typed properties introduced. Typed properties start out, if you don't give them a default value, they start out in an uninitialized state. And we reuse that state for read only properties. You can only assign to the property while it's uninitialized. And once it's initialized, you cannot assign to it any more or even unset it back to an uninitialised state. For non typed properties, you also can get into this uninitialized state by explicitly unsetting it. But the problem is that this is not the default state. Untyped properties always have a null default value, even if you don't specify one, which effectively means that these properties are always initialized. So they would be kind of useless if you used read only with them. Which is why we make this distinction to avoid any confusion. And if you want to use an untyped read only property, you do that by using the mixed type, which is the same but has the initialisation semantics of typed properties.
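The difference in initial state described above can be sketched like this, assuming PHP 8.1 or later. Both classes and the `fill()` method are hypothetical.

```php
<?php
// Untyped property: implicitly starts out as null, so it is always initialized.
final class Legacy
{
    public $value;
}

// "mixed" accepts any value but keeps the uninitialized starting state.
final class Box
{
    public readonly mixed $value;

    // Initialization from a normal method rather than the constructor.
    public function fill(mixed $value): void
    {
        $this->value = $value;
    }
}

var_dump((new Legacy())->value); // NULL

$box = new Box();

try {
    $unused = $box->value; // reading before initialization
} catch (Error $e) {
    echo 'Not initialized yet', PHP_EOL;
}

$box->fill('first'); // allowed: the property was uninitialized

try {
    $box->fill('second'); // rejected: already initialized
} catch (Error $e) {
    echo 'Already initialized', PHP_EOL;
}

echo $box->value, PHP_EOL; // first
```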
Derick Rethans 5:56
What would that mean if you have say, a resource or class typed property with a read only keyword? Can you not read or write to a resource any more? Or modify properties on an object, that is the value of a read only typed property?
Nikita Popov 6:12
No, you can still modify those, because we have to distinguish the concepts of like exterior and interior mutability here. So objects and resources are... Well, I mean, we often say they are passed by reference, which is not strictly true, because those are not PHP references. But the important part is that they only pass around some kind of handle and you can still modify the inside of that handle. What you can't do is you can't reassign to a different resource or reassign to a different object, it's always the same object, the same resource, but the insides of the objects, those can change. Of course, if your object also only contains read only properties, then you won't be able to change this.
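The exterior versus interior mutability distinction above can be sketched with a read only property that holds an object. PHP 8.1 or later is assumed; the `Order` class is hypothetical, while `ArrayObject` is the standard SPL class.

```php
<?php
// Hypothetical class with a readonly property that holds an object.
final class Order
{
    public readonly ArrayObject $items;

    public function __construct()
    {
        $this->items = new ArrayObject();
    }
}

$order = new Order();

// Interior mutability: the object behind the handle can still change.
$order->items->append('book');
$order->items->append('pen');
echo count($order->items), PHP_EOL; // 2

// Exterior immutability: the property cannot point at a different object.
try {
    $order->items = new ArrayObject();
} catch (Error $e) {
    echo 'Reassignment rejected', PHP_EOL;
}
```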
Derick Rethans 6:55
Okay, and that answered another question that I have: how is it possible to make the whole object read only, or all of its properties? Where the answer is by setting all the properties to read only.
Nikita Popov 7:05
We could like add a read only class modifier that makes all the properties implicitly read only but maybe future scope.
Derick Rethans 7:13
Is there a reason why read only properties can't have a default value?
Nikita Popov 7:18
So this is, again, same issue with initialization. If they have a default value, then they're already initialized. So you can't overwrite it. Like we could allow it, but you just would never be able to change them from the default value, which is something we could allow it just wouldn't be very useful.
Derick Rethans 7:36
In PHP 8.0, PHP introduced promoted, or constructor promoted, properties; I think that's the full name of it. How does this read only property tie in with that? Because can you set the read only flag on a constructor promoted property?
Nikita Popov 7:49
PHP Internals News: Episode 89: Partial Function Applications
Thursday, June 17th 2021, 09:17 BST
London, UK
In this episode of "PHP Internals News" I chat with Larry Garfield (Twitter) and Joe Watkins (Twitter, GitHub, Blog) about the "Partial Function Applications" RFC.
Transcript
Derick Rethans 0:14
Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 89. Today I'm talking with Larry Garfield and Joe Watkins about a partial function application RFC that they're proposing with Paul Crovella and Levi Morrison. Larry, would you please introduce yourself?
Larry Garfield 0:36
Hello World. I'm Larry Garfield or Crell on most social medias. I'm a staff engineer for TYPO3 the CMS. And I've been getting more involved in internals these days, mostly as a general nudge and project manager.
Derick Rethans 0:52
And hello, Joe, would you please introduce yourself as well?
Joe Watkins 0:55
Hi, I'm Joe, or Krakjoe, I do various PHP stuff. That's all there is to say about that really.
Derick Rethans 1:02
I think you do quite a bit more than just a little bit. In any case, I think for this RFC, you, you wrote the implementation of it, whereas Larry, as he said, did some of the project management, I'm sure there's more to it than I've just paraphrased in a single sentence. But can one of you explain in one sentence, or if you must, maybe two or three, what partial function applications, or I hope for short, partials are?
Larry Garfield 1:27
Partial function application, in the broadest sense, is taking a function that has some number of parameters, and making a new function that pre fills some of those parameters. So if you have a function that takes four parameters, or four arguments, you can produce a new function that takes two arguments. And those other two you've already provided a value for in advance.
Derick Rethans 1:54
Okay, I feel we'll get into the details in a moment. But what are its main benefits of doing this? What would you use this for?
Larry Garfield 2:01
Oh, there's a couple of places that you can use partial application. It is what got me interested. It's very common in functional programming. But it's also really helpful when you have a function that, let's say, string replace, takes three arguments, two of which are instructions for what to replace, and one of which is the thing in which you want to replace. If you want to reuse that a bunch of times, you could build an object and pass in constructor values and save those and then call a function. Or you can just partially apply string replace with the things to search for, and the things to replace with, and get back a function that takes one argument and will do that replacement on it. And you can then reuse that over and over again. There are a lot of cases like that, usually used in combination with functions that want a callback. And that callback takes one argument. So array map or array filter are cases where very often you want to give it a function that takes one argument, you have a function that takes three arguments, you want to fill in those first ones first, and then pass the result that only takes one argument to array map or array filter, or whatever. So that's one of the common use cases for it.
Derick Rethans 3:15
That's the benefits and some of its background comes from functional programming, as you've just mentioned. What is the syntax that you're proposing and some of the semantics?
Larry Garfield 3:26
The syntax that we've developed is two placeholders that you can use in a function call. So if you're calling a function as you normally would, but for one of the arguments you pass a question mark, or at the tail end you have an ellipsis (dot dot dot), then that tells the engine: This is not a function call. This is a partial application. And what it will do is return not the result of the function but a closure object that has the arguments that correspond to those question marks. And then, when called with those arguments, it will pass those along to the original function. Probably easier to explain if I use a concrete example. Using the string replace example we talked about before, you would call it with str_replace, the example from the RFC, hello, hi, question mark. What that gives you is a callable, a closure that has one argument, which will take its type and name from str_replace. So the third argument to str_replace essentially gets copied into that closure. And what the closure does internally when you call it with that one argument is it just calls string replace with hello, hi, and whatever argument you gave it and returns that value. It is conceptually very, very similar to just writing a short lambda or an arrow function that takes one argument and calls string replace hello, hi, and that argument. In most cases, it ends up functioning almost exactly like that. There's a few subtle differences in a few places. But most of the time, you can think of it working essentially like that. The question mark means one required argument only. The dot dot dot means zero or more arguments. If you want to, say, provide the first argument to a function, then dot dot dot would mean: and then all of the other arguments, however many there are, even if that's zero, those are what's left. Other languages that have partial application as a first class feature usually end up doing it that way, where you can only pre fill from the left.
In PHP, the placeholder lets us do it in any order. So we can skip over arguments if we want to, which is quite nice. But it means that you can take a function and reduce it to, I want to prefill just these two arguments and leave these three arguments for the new function, or I want to prefill these arguments from the left, and then everything else, whatever it is, is left. It also lets you do cute things like if you provide all of the arguments to a function, and then just tack on a dot dot dot at the end of it, then you get back a closure that takes essentially zero arguments. But when called, it will call that other function. So it lets you really easily build a delayed function as you need to.
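Because the placeholder syntax was only a proposal at the time of this conversation, the sketch below uses arrow functions, which Larry describes as conceptually very similar. The proposed spellings appear only in comments; variable names are invented, and PHP 8.0 or later is assumed for the union types.

```php
<?php
// Proposed in the RFC (shown for comparison only): str_replace('hello', 'hi', ?)
// Arrow-function equivalent: one remaining argument, the subject.
$sayHi = fn(array|string $subject): array|string
    => str_replace('hello', 'hi', $subject);

echo $sayHi('hello world'), PHP_EOL; // hi world

// The one-argument result fits directly into array_map().
print_r(array_map($sayHi, ['hello Larry', 'hello Joe']));

// Placeholders in any position: pre-fill the subject, leave search and replacement open.
// Proposed: str_replace(?, ?, 'hello world')
$rewrite = fn(array|string $search, array|string $replace): array|string
    => str_replace($search, $replace, 'hello world');

echo $rewrite('world', 'PHP'), PHP_EOL; // hello PHP

// All arguments supplied plus a trailing "...": a delayed call with no parameters.
// Proposed: str_replace('hello', 'hi', 'hello again', ...)
$later = fn(): array|string => str_replace('hello', 'hi', 'hello again');

echo $later(), PHP_EOL; // hi again
```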
Derick Rethans 6:15
When do the arguments to the function get evaluated then?
Larry Garfield 6:18
Arguments are evaluated in advance. So this is the subtle difference between partial application and the short lambda syntax. In a short lambda, what happens is, essentially, that entire expression on the right hand side gets wrapped up into a closure. And so any arguments that are compound like they have a function call that is inside one of the placeholders, or one of the arguments, that'll get evaluated later. With partial application, the function that is in a parameter position gets evaluated first and reduced to a value. And that value gets partially applied to the function. 90% of the time, that's not going to be an issue. There are a few cases where doing it one way or the other may be subtly different, but you'll spot those fairly easily.
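The timing difference described above can be imitated in current PHP. This is only an emulation under stated assumptions: the "eager" variant evaluates the argument once by hand, which is what partial application is described as doing; the counter and variable names are invented.

```php
<?php
// A counter-backed closure makes it visible when an argument is evaluated.
$counter = 0;
$nextId = function () use (&$counter): int {
    return ++$counter;
};

// Short lambda: $nextId() is part of the body, so it runs on every call.
$lazy = fn(string $label): string => $label . '-' . $nextId();

// Emulated partial application: the argument is reduced to a value up front.
$fixedId = $nextId();
$eager = fn(string $label): string => $label . '-' . $fixedId;

echo $eager('a'), PHP_EOL; // a-1
echo $eager('b'), PHP_EOL; // a second call still sees 1: b-1
echo $lazy('a'), PHP_EOL;  // a-2
echo $lazy('b'), PHP_EOL;  // b-3
```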
Derick Rethans 7:02
So the RFC talks about things that you can do, but also a few things that you cannot do or don't want to do yet. What are these things that partials won't support, or won't support yet, at least?
Larry Garfield 7:13
The main thing that it doesn't support is named placeholders. You can pre fill a value or an argument with a named argument. But not a named placeholder. Those have to be positional. Named placeholders are complicated to implement, and run into a question of, if you provide those in a different order, does that also change the order of the arguments in the partially applied function that you get back in that closure? And there's a good argument to be made that either way is logical. And so we're like, no, let's not deal with it, too complicated. We'll just do positional only. And you cannot specify optional arguments either. It's just again, too complicated. Things get too weird. If you have those advanced cases, use a short lambda, that works just fine. If you want to just make a new function that defers to another function, and change its API in the process, a short lambda works fine. And it's still quite short.
Derick Rethans 8:13
I know the RFC talks a little bit about references, but I don't like talking about references. So let's skip that part. In my opinion, they should be removed from the language. But I know we can't.
Larry Garfield 8:22
There's occasionally a use for them. But very occasionally.
Derick Rethans 8:25
There's a bunch of technical
PHP Internals News: Episode 88: Pure Intersection Types
Thursday, June 10th 2021, 09:16 BST
London, UK
In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about the "Pure Intersection Types" RFC that he has proposed.
Transcript
Derick Rethans 0:14
Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 88. Today I'm talking with George Peter Banyard about pure intersection types. George, could you please introduce yourself?
George Peter Banyard 0:30
Hello, my name is George Peter Banyard. I work on PHP core development in my free time. And on the PHP Docs.
Derick Rethans 0:36
This RFC is about intersection types. What are intersection types?
George Peter Banyard 0:40
I think the easiest way to explain intersection types is to use something which we already have, which are union types. So union types tell you I want X or Y, whereas intersection types tell you that I want X and Y to be true at the same time. The easiest example I can come up with is a Traversable that you want to be Countable as well. So Traversable and Countable. Currently, you can do intersection types in very hacky ways. So you can either create a new interface which extends both Traversable and Countable, but then all the classes that you want to use in this fashion, you need to make them implement the interface, which might not be possible if you're using a library or other things like that. The other very hacky way of doing it is using references and typed properties. You assign two typed properties by reference, one being Traversable, one being Countable, and then your actual property, you reference it with both of these properties. And then PHP will check: does the property respect type A of this reference? If yes, move to the next one. Does it respect type B? Which basically gives you intersection types.
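The Traversable-and-Countable example can be sketched as follows, assuming PHP 8.1 or later, where the ampersand syntax is available. The function name `describe()` is hypothetical; `ArrayIterator` is used because it implements both interfaces.

```php
<?php
// Requires PHP 8.1+. The argument must satisfy BOTH interfaces.
function describe(Traversable&Countable $items): string
{
    $values = [];
    foreach ($items as $value) {   // needs Traversable
        $values[] = $value;
    }

    // count() needs Countable
    return count($items) . ' items: ' . implode(', ', $values);
}

// ArrayIterator is Traversable (via Iterator) and Countable.
echo describe(new ArrayIterator(['a', 'b', 'c'])), "\n";

// A generator is Traversable but not Countable, so it is rejected.
try {
    describe((function () { yield 1; })());
} catch (TypeError $e) {
    echo "Generator rejected\n";
}
```

Neither hack described above is needed here: no extra interface has to be implemented by the classes involved.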
Derick Rethans 1:44
Yeah, I saw that in the RFC. And I was wondering like, well, people actually do that?
George Peter Banyard 1:49
The only reason I know that is because of Nikita's slide.
Derick Rethans 1:51
The thing is, if it is possible, people will do it, right. And that's how that works.
George Peter Banyard 1:56
Yeah, most of the times.
Derick Rethans 1:57
The RFC isn't actually called intersection types. It's called pure intersection types. What does the word pure do here?
George Peter Banyard 2:05
So the word pure here is not very semantic. But it's more that you cannot mix union types and intersection types together. The reasons for it are mostly technical. One reason is how do you mix and match intersection types and union types? One way is to have union types take precedence over intersection types, but some people don't like that and want explicit grouping all the time. So you need to do parentheses, A intersection B, close parentheses, pipe for the union, and then the other type. But I think the main reason is mostly the variance, like the variance checks for inheritance are already kind of complicated and kind of mind-boggling.
Derick Rethans 2:44
I'm sure we'll get into the variance rules in a moment. What is it actually that you're proposing to add here? What is the syntax, for example?
George Peter Banyard 2:52
So the syntax is any class type with an ampersand, and any other class type gives you an intersection type, which is the usual way of doing and.
Derick Rethans 3:01
When you say class types, do you also mean interfaces?
George Peter Banyard 3:04
Yes, PHP has a concept of class types, which are mostly any class and any interface. There's also a weird exception where parent and self are considered class types, but those are not allowed.
Derick Rethans 3:20
Okay, so it's just the classes that you've defined and the classes that are part of the language, but not the special keywords self and parent and static, I suppose?
George Peter Banyard 3:28
Yes, the reason for that is standard types are not allowed to be part of an intersection, because nothing can be an integer and a string at the same time. Now, there are some of the built-in types for which it can kind of be true. You could have a callable, which is a string, because callables can be arrays, or can be a closure. But that's like very weird and not very great. The other one is iterable. If you expand that out, you get redundant types, which we can talk about later. And the final thing is parent, self, and static, which just make for some very weird design questions, in my opinion. Like, if you ask for something to be an intersection with itself, you basically can only enforce conditions on subclasses. You have a class and you say: Oh, I want it to return self, but also be countable for some reason, but I'm not countable. So if you extend me, then you need to be countable, but I'm not. So it's very weird. parent has kind of the very same weird semantics where you can ask for a parent, but it's like, if the base class doesn't support it, and you ask for a parent to be an intersection, then you basically need the child to implement the interface and then a child to return the first child. If you do that, my main question is: why? Because I don't see any good reasons to do it. And it just makes everything harder.
Derick Rethans 4:40
You'd only add it for the sake of completeness instead of it being useful. Let's move onwards. You've mentioned which types are supported, which is class names and interface names. You already hinted a little bit at redundant types. What are redundant types?
George Peter Banyard 4:56
Currently, PHP already does that with union types. If you repeat the type twice in a union, you'll get a compile error. This only affects compile-time known aliases. If you use a use statement, then PHP knows that you're basically using the same type. However, if you use a runtime alias, then it can't detect that.
Derick Rethans 5:13
A runtime alias, what's that?
George Peter Banyard 5:15
So if you use the function class_alias.
Derick Rethans 5:16
It's new to me!
George Peter Banyard 5:18
It technically exists. It also doesn't guarantee, basically, that the type is minimal, because it can only see those within its own file. For example, if you say I want A and B, but B is a child class of A, then the intersection basically resolves to only B. But you can only know that at runtime if the classes are defined in different files. So the type isn't guaranteed to be minimal. But checking for redundant types is basically an easy way to check if you might be typing a bug.
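The difference between a compile-time duplicate and a runtime alias can be sketched like this, assuming PHP 8.1 or later. The names `HasName`, `NamedThing`, `User` and `greet()` are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical interface and implementation, for illustration only.
interface HasName
{
    public function name(): string;
}

final class User implements HasName
{
    public function name(): string
    {
        return 'derick';
    }
}

// A runtime alias: the compiler cannot know that NamedThing is HasName.
class_alias(HasName::class, 'NamedThing');

// Writing HasName&HasName here would be rejected at compile time as a
// redundant type. HasName&NamedThing is just as redundant, but it is only
// resolvable at runtime, so the redundancy check cannot catch it.
function greet(HasName&NamedThing $who): string
{
    return 'Hello, ' . $who->name();
}

echo greet(new User()), "\n";
```

The call still succeeds because both names resolve to the same interface once the alias has been registered.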
Derick Rethans 5:46
You try to do your best to warn people about that. But you never know for certain.
George Peter Banyard 5:51
You never know for certain because PHP doesn't compile everything into like one big program like in C. A static analyser can help with that.
Derick Rethans 5:59
Let's talk a little bit about technical aspects, because I reckon that implementing intersection types is quite different from implementing union types. What kind of hacks did you have to make in the parser and compiler for this?
George Peter Banyard 6:11
The parser has been very weird. The parsing syntax should be the same as union types. So I just copy-pasted what Nikita did. I tried it. It worked for return types without an issue. It didn't work with argument types, because bison, which is the tool which generates our parser, was giving a shift/reduce conflict, which basically tells you: Oh, I've got two possible states I can go in, and I don't know which branch I need to go down, because the PHP parser only does one token of lookahead. It was conflicting because the ampersand is used either for the intersection type or to mark a reference. Normally, if the parser is more developed, or does more lookahead, it is not a conflict. And it shouldn't be. Ilia managed to come up with this ingenious idea, which is to just define the ampersand token twice, with very complicated names, and use them in different contexts. And bison is just: now I have no issue. It is the same token, it is the same character. Now that you have two different tok
PHP Internals News: Episode 87: Deprecating Ticks
Thursday, June 3rd 2021, 09:15 BST
London, UK
In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Deprecating Ticks" RFC.
Transcript
Derick Rethans 0:14
Hi I'm Derick, welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 87. Today I'm talking with Nikita Popov about a much smaller RFC this time: Deprecating Ticks. Nikita, would you please introduce yourself.
Nikita Popov 0:34
Hi Derick, I'm Nikita, and I'm working on PHP core development on behalf of JetBrains.
Derick Rethans 0:40
Let's jump straight into what this RFC is about, and that's the word ticks. What are ticks?
Nikita Popov 0:46
Ticks are a declare directive. You write declare ticks equals one at the top of your file, and then PHP will call a tick function after every statement execution. Or if you write ticks equals two, then it will call the function after every two statement executions.
Derick Rethans 1:05
Do you have to specify which function that calls?
Nikita Popov 1:08
Of course, so there is also a register_tick_function and an unregister_tick_function, and that's how you specify the function that should be called, or rather the functions.
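A minimal sketch of the mechanism described here: `declare(ticks=1)`, `register_tick_function()` and `unregister_tick_function()` are the real PHP constructs, while the `TickCounter` class is a hypothetical helper used only to observe that ticks fire.

```php
<?php
// File-local since PHP 7: only statements in this file generate ticks.
declare(ticks=1);

// Hypothetical helper that counts how often the tick function is called.
final class TickCounter
{
    public static int $count = 0;

    public static function tick(): void
    {
        self::$count++;
    }
}

register_tick_function([TickCounter::class, 'tick']);

// Each of these statements is followed by a tick.
$a = 1;
$b = 2;
$c = $a + $b;

unregister_tick_function([TickCounter::class, 'tick']);

echo TickCounter::$count > 0 ? "ticks fired\n" : "no ticks\n";
```

The exact number of ticks depends on how the engine counts statements, so the sketch only checks that at least one was observed.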
Derick Rethans 1:17
How does this work, historically, because the RFC talks about the change being made in PHP seven?
Nikita Popov 1:22
Technically, ticks work by introducing an opcode after every statement that calls the tick function depending on the current count. The difference that was introduced in PHP seven is in what the tick declaration applies to. The way PHP language semantics are supposed to work is that declare directives are always local. The same way that strict types only applies to a single file, ticks should also only apply to a single file. Prior to PHP seven, it didn't work that way. So if you had declare ticks somewhere in your file, it would just enable ticks from that point forward. If you included a different file, or even if the autoloader was triggered and included a different file, that one would also make use of ticks. That was fixed in PHP seven, so now it is actually file-local, but that also means that the ticks functionality at that point became, like, not very useful. Because usually if you want to use ticks you actually want them to apply to your whole codebase. There are ways around that. I'm afraid to say that people have approached me after this RFC and told me that they actually do that. The way around that is to register a stream wrapper. It's possible in PHP to unregister the file stream wrapper and register your own one, and then it's possible to intercept all the file includes and rewrite the file contents to include the declare ticks at the top of the file. I do use that general mechanism for real things in other places, but apparently people actually use that to, like, instrument a whole application with ticks, and essentially restore the behaviour we had in PHP 5.
Derick Rethans 3:03
What was the intended use case for ticks to begin with?
Nikita Popov 3:07
Well, I'm not sure what was the intended use case, but at least it was the main use case, and that's signal handling. The PCNTL extension allows you to register a signal handler, and when the signal arrives, we can't just directly call that signal handler, because signal handlers are only allowed to call functions that are async-signal-safe. Which excludes things like memory allocation, and a lot of other things that PHP uses. What we do instead is we only set a flag that, okay, a signal has arrived, and then we have to actually run the signal handler at some later point in time. In PHP five, that worked using ticks. You declare ticks, and the PCNTL extension registered the tick handler, and then after this flag was set, it would execute your callback on the next tick. In PHP seven, an alternative mechanism was introduced that is based on virtual machine interrupts. Those were originally introduced for timeout handling, because there we have a similar problem: when the timeout arrives, we might be in some kind of inconsistent state, like in the middle of the allocator right now, and if we just bail out at that point, we are likely to see crashes down the road. So that was a significant problem in PHP five. PHP seven changed that. We now set an interrupt flag on timeout, and then the virtual machine checks this flag at certain points. The interrupt flag is not checked after every instruction, but only, like, just often enough to make sure that it's checked at some point, so that you can't, like, go into an infinite loop that ends up never checking. These points are basically function calls, and jumps that go higher up in the function. PCNTL signals can now use the same mechanism. If you call pcntl_async_signals(true), then those will also set the interrupt flag, and execute the signal handler at the next opportunity, the next time the interrupt flag is checked. The nice thing about that is that it's essentially free. I mean, we already have to do these checks for the interrupt flag anyway, so adding the handling for PCNTL signals doesn't add any cost on top. Unlike ticks, which have to be, like, executed on every instruction or at least regularly, and that does add significant cost.
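The tick-free signal handling described above can be sketched as follows. This assumes the pcntl and posix extensions are loaded (typically the CLI on a Unix-like system); `demoAsyncSignal()` is a hypothetical helper that returns null when they are not, or when no handler ran.

```php
<?php
/**
 * Sends SIGUSR1 to the current process and returns the signal number the
 * handler saw, or null if pcntl/posix are unavailable or no handler ran.
 */
function demoAsyncSignal(): ?int
{
    if (!extension_loaded('pcntl') || !extension_loaded('posix')) {
        return null;
    }

    // Dispatch handlers via the VM interrupt flag; no declare(ticks=1) needed.
    pcntl_async_signals(true);

    $seen = null;
    pcntl_signal(SIGUSR1, function (int $signo) use (&$seen): void {
        $seen = $signo;
    });

    posix_kill(posix_getpid(), SIGUSR1);

    // The PHP-level handler runs the next time the VM checks the interrupt
    // flag, for example around a function call such as this one.
    usleep(1000);

    // Restore the default disposition so the sketch has no lasting effect.
    pcntl_signal(SIGUSR1, SIG_DFL);

    return $seen;
}

$result = demoAsyncSignal();
echo $result === null ? "no signal handled\n" : "handled signal {$result}\n";
```

The null return covers environments without the required extensions, so the sketch degrades gracefully instead of failing.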
Derick Rethans 5:28
Execution time itself because it's an opcode that needs to be executed.
Nikita Popov 5:32
Exactly.
Derick Rethans 5:33
So what are you proposing to do about the ticks in PHP eight one then?
Nikita Popov 5:36
I want to deprecate that. So both the declare directive itself, and the register tick function, unregister tick function.
Derick Rethans 5:44
How could users emulate the same behaviour that ticks allow them to have now?
Nikita Popov 5:49
That's a good question. As I mentioned, if the use case of ticks was signal handling, then by using async signals. If it was something else, then you have a problem. My assumption when writing this RFC was basically that signal handling was really the main remaining use case of ticks, because other use cases require this kind of, you know, stream wrapper instrumentation, and I didn't expect that people would be crazy enough to use something like that in production.
Derick Rethans 6:21
Hopefully they cache these rewritten files?
Nikita Popov 6:23
Probably yeah. I think it's possible to make this integrate with opcache. If you use it for other purposes, then, I don't think there is a really good replacement. So I think what they use it for is some kind of well instrumentation, so profiling, memory profiling, for example, and the alternative there of course is to use a tool that is appropriate for that job, for example, Xdebug contains a profiler, but of course it is not a production profiler, but I think there are also production profilers.
Derick Rethans 6:54
As far as I know, all the production or APM solutions do this on their own without having to use ticks. They don't need any userland modifications.
Nikita Popov 7:03
Yeah, definitely. All the APM solutions support this, they use internal handlers.
Derick Rethans 7:08
Because it's actually removing functionalities that some people use, what's the reaction been to removing this functionality?
Nikita Popov 7:14
Well on the mailing list at least positive, but as I mentioned at least some people have like pointed out on the pull request that they are using the functionality.
Derick Rethans 7:23
Enough to sway you towards not deprecating them? What are the benefits of getting rid of ticks, if you don't use them?
Nikita Popov 7:31
That's, I think, the thing: there is not really a big benefit to getting rid of them. Like, they don't add a lot of technical complexity to the engine. They're pretty simple in that sense. Having seen those responses, I'm kind of a bit unsure if we should really remove them, because you could argue that, well, they don't really hurt anyone. I do have to say that I think for all the things that people use ticks for, all the cases I have heard about, in all of those cases ticks are not the right way to solve the problem. They are not t
PHP Internals News: Episode 86: Property Accessors
Thursday, May 27th 2021, 09:14 BST
London, UK
In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Property Accessors" RFC.
Transcript
Derick Rethans 0:14
Hi I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 86. Today I'm talking with Nikita Popov about his massive Property Accessors RFC. Nikita, would you please introduce yourself?
Nikita Popov 0:32
Hi Derick, I'm Nikita, and I do work on PHP core development, on behalf of JetBrains.
Derick Rethans 0:39
This is probably the largest RFC I've seen in a while. What in one sentence, are you proposing to add to PHP here?
Nikita Popov 0:46
I would say it's an alternative to magic get and set, just for one specific property instead of all of them. That's the technical side. Maybe I should say something about the, like, motivation behind it, which is that since PHP seven four we have typed properties, and at least for me personally, with that feature, the need to have this typical pattern of a private property for storage plus public getter and setter methods, the main motivation for that has kind of gone away, because we can now use types to enforce any contracts on the value. And now these getter and setter methods are mostly, if you like, boilerplate. So the idea with accessors, at least my idea with accessors, is that you really shouldn't use them. You should just have them as a backup option. You declare a public property in your class, and then maybe later, years later, it turns out that okay, that property actually requires additional validation. And right now if you have a public property, then you don't really have a good way of introducing that. The only way is to either break the API contract by converting the property into getter/setter methods where you can introduce arbitrary code, or to use magic get/set, which is definitely possible and preserves the API contract, it's just fairly ugly.
Derick Rethans 2:09
You change the public property that people could read into a private one. And because it's private, the magic set and get methods are being called.
Nikita Popov 2:18
Exactly.
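The existing workaround discussed here, turning a public property into a private one guarded by magic methods, can be sketched like this. The `Customer` class and its validation rule are hypothetical; `__get()` and `__set()` are the real magic methods, and the `mixed` type requires PHP 8.0 or later.

```php
<?php
// Hypothetical class. Originally it only declared: public string $name;
final class Customer
{
    // Now private, so outside access is routed through __get()/__set().
    private string $name;

    public function __construct(string $name)
    {
        $this->__set('name', $name);
    }

    public function __get(string $property): mixed
    {
        if ($property === 'name') {
            return $this->name;
        }
        throw new LogicException("Unknown property {$property}");
    }

    public function __set(string $property, mixed $value): void
    {
        if ($property !== 'name') {
            throw new LogicException("Unknown property {$property}");
        }
        // The additional validation that motivated the change.
        if (!is_string($value) || $value === '') {
            throw new InvalidArgumentException('name cannot be empty');
        }
        $this->name = $value;
    }
}

$customer = new Customer('Ben');
$customer->name = 'Patrick';   // same API as the old public property
echo $customer->name, "\n";
```

Callers keep using plain property syntax, which is what preserves the API contract, at the cost of the string-based dispatch inside the magic methods.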
Derick Rethans 2:19
This RFC is titled Property Accessors. How do these improve on the situation?
Nikita Popov 2:24
So I think there are really two fairly orthogonal parts to this RFC. The first part is implicit accessors that don't have any custom behaviour, and just allow controlling the behaviour of properties a bit more precisely. In particular, the most important part is probably the asymmetric visibility, where you have a property that's publicly readable, but can only be set from within the class. So public read, private write. I think that's maybe the most common requirement. The second part is where you can actually introduce some custom behaviour. So where you can say that okay, the get behaviour for this property looks like this, and the set behaviour looks like this. Which is essentially exactly the same as what magic get/set does, just for a single property.
Derick Rethans 3:10
For example, when you then do a set, you can add additional validation to it.
Nikita Popov 3:14
Exactly. Originally, you had a simple public property, then you can add a setter that checks okay this string cannot be empty.
Derick Rethans 3:23
Okay, what is the syntax that you're proposing?
Nikita Popov 3:26
I went with essentially the same syntax that's being used in C#. It looks like this: you write public foobar, and then instead of a semicolon you have a code block. And this code block contains two accessors, so then you have something like get, and another code block that specifies the get behaviour, and set, and the code block that specifies the set behaviour, and so on.
Derick Rethans 3:52
The RFC talks about implicit and explicit implementations of these getter and setter accessors. What is the difference between them and how does it look different in syntax?
Nikita Popov 4:03
Yeah, so the difference is, either you can write just get semicolon, set semicolon, that's an implicit implementation, or you actually specify a code block with real custom behaviour. With the implicit implementation, you're saying that this is really a normal property, and PHP automatically manages the storage for you, it's just that you have this more fine-grained control over how it works. Namely, what you can do is you can say that you have get and private set. Then that's a property that's publicly read-only and internally writeable. You can write just get without set, in which case it's a real read-only property both publicly and privately, or to be more precise, it's an init-once property, so you can assign to it once.
Derick Rethans 4:52
How do you keep track of the init once?
Nikita Popov 4:53
It's the same mechanism as for typed properties, where we distinguish between an initialized and an uninitialized property. You can assign to an uninitialized property, but you can't assign to an initialized one, if it's read-only. The only maybe problem there is that this mechanism requires that the property actually is uninitialized to start with, which means that for accessors you don't have any default values. That is to say, there is no implicit default value, no implicit null value. If you want to have a default value, the same as with typed properties, you have to specify it explicitly. Specifying a default value really only makes sense if the property is both readable and writable. For read-only properties, if you specify the default then you can never change that.
Derick Rethans 5:37
You have basically created a constant.
Nikita Popov 5:39
Yes, it is essentially a constant.
Derick Rethans 5:41
You mentioned already that PHP seven four introduced typed properties. How do these types interact with the setter and getter accessors?
Nikita Popov 5:50
I would say in the obvious way. The getter is required to return the type of the property, modulo the usual weak typing conversions, and the setter also checks before it's called whether the passed value matches the type or not. It enforces that it matches the type.
Derick Rethans 6:08
This does mean that if you provide an explicit implementation for the set accessor, you also need to specify the parameter name?
Nikita Popov 6:15
No, you can specify the parameter name, and if you don't then it's just passed in as the $value variable. It's also inspired by how C# and Swift do it. I mean, there are some possible variations here: we could always require an explicit name, some people are for that, or I also heard that some people would like to have the name of this implicit variable match the name of the property, instead of always being just value.
Derick Rethans 6:41
Would you have to specify the type though?
Nikita Popov 6:43
You wouldn't have to and you're actually not allowed to. So the accessor implementation is somewhat strict about not allowing you to do anything that would be redundant because otherwise, you know, there are quite a lot of extra things you could be adding everywhere.
Derick Rethans 6:56
That's the same way as marking a property as private. And then the accessors as private as well. Right?
Nikita Popov 7:03
Yeah, exactly. So then that will also say: if the property is already private, you can't again say that the accessor is also private.
Derick Rethans 7:11
I think that's the wise thing, otherwise people go overboard with adding private and final and whatever everywhere anyway right.
Nikita Popov 7:18
One could argue that it's really not our business and this is a coding style question, but you know it's better to not leave people, with the option of doing stupid things.
Derick Rethans 7:28
I saw in the RFC that it is also possible to use references with the get accessor. Does this complicated implementation and the idea of this RFC
PHP Internals News: Episode 85: Add IntlDatePatternGenerator
Thursday, May 20th 2021, 09:13 BST
London, UK
In this episode of "PHP Internals News" I discuss the Add IntlDatePatternGenerator RFC with Mel Dafert (GitHub).
Transcript
Derick Rethans 0:14
Hi I'm Derick, welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is episode 85. Today I'm talking with Mel Dafert about the "Add Intl Date Pattern Generator RFC" that she's proposing for inclusion into PHP 8.1. Mel, would you please introduce yourself?
Mel Dafert 0:35
Hello, I am Mel. I've been working professionally with PHP for about three years. Recently I started reading the internals mailing list in my free time, but this is my first time contributing.
Derick Rethans 0:46
What made you start reading the PHP internals mailing list?
Mel Dafert 0:50
I generally like reading mailing lists and issue trackers. And since I work with PHP, it was interesting to read what's, what's happening.
Derick Rethans 1:02
That's what I'm trying to do with this podcast as well, of course: explaining what happens in PHP development. But let's get to your RFC. What is the problem that you're trying to solve with this?
Mel Dafert 1:14
Currently, PHP exposes the ability for locale-dependent date formatting with the Intl Date Formatter class. It has basically only three options for the format: long, medium and short. These options are not flexible enough in some cases, however. For example, the most common German format is day, dot, numerical month, dot, long version of the year. However, neither the medium nor the short version provides this. They use either the long version of the month or a short version of the year, neither of which was acceptable in my situation.
Derick Rethans 1:47
I realize that you basically ran into a problem where PHP wasn't doing something you wanted it to do. But what made you actually want to contribute this?
Mel Dafert 1:57
I ran into this exact problem at work where I wanted to format dates in this specific way. After some research, I found out that ICU, the library that powers Intl Date Formatter, exposes exactly this functionality already. It would be relatively easy to wire this up into PHP and expose it there as well. I also found in a bug report that other people had this problem as well, so I decided to try my best at hacking at the PHP source and make it available to everyone, using PHP.
Derick Rethans 2:25
Had you ever seen a PHP source code before?
Mel Dafert 2:28
I don't think so. No.
Derick Rethans 2:29
But you are familiar with C a little bit?
Mel Dafert 2:32
On a very basic level, yes.
Derick Rethans 2:34
As part of this RFC What are you trying to suggest to add to PHP?
Mel Dafert 2:39
ICU exposes a class called date time pattern generator, to which you can pass a locale and a so-called skeleton, and it generates the correct formatting pattern for you. The skeleton just includes which parts are supposed to be included in the pattern, for example the numerical date, numerical month, and the long year, and this will generate exactly the pattern I wanted earlier. It is also a lot more flexible, for example the skeleton can also just consist of the month and the year, which was also not possible so far. I am proposing to add an Intl Date Pattern Generator class to PHP, which can be constructed for a locale, and exposes the get best pattern method that generates a pattern from a skeleton for that locale.
Derick Rethans 3:22
The skeletons, what do you specify in these skeletons?
Mel Dafert 3:27
It's a similar format to the pattern itself. For example, lowercase y lowercase y uppercase M uppercase M would give you only the year and only the month. If I'm correct, that's exactly what the skeleton looks like.
Derick Rethans 3:43
But it puts it in the right order?
Mel Dafert 3:45
It puts it in in the right order, and in some cases also adds extra characters, or even changes the format slightly, depending on the locale.
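The skeleton-to-pattern step can be sketched as below, assuming PHP 8.1 or later with the intl extension, where the class was added as `IntlDatePatternGenerator` with a `getBestPattern()` method. The helper `bestPattern()`, the chosen skeleton and the sample date are illustrative assumptions; the resulting pattern depends on the ICU data installed, so no output is asserted here.

```php
<?php
/**
 * Hypothetical helper: returns the locale's best pattern for a skeleton,
 * or null when IntlDatePatternGenerator is unavailable or ICU fails.
 */
function bestPattern(string $locale, string $skeleton): ?string
{
    if (!class_exists('IntlDatePatternGenerator')) {
        return null;
    }

    $generator = new IntlDatePatternGenerator($locale);
    $pattern = $generator->getBestPattern($skeleton);

    return $pattern === false ? null : $pattern;
}

// Skeleton: long year, numerical month, numerical day, in no particular
// order. ICU decides the order and the separators for the locale.
$pattern = bestPattern('de_DE', 'yyyyMMdd');

if ($pattern !== null) {
    $formatter = new IntlDateFormatter(
        'de_DE',
        IntlDateFormatter::NONE,
        IntlDateFormatter::NONE,
        'UTC',
        null,
        $pattern
    );
    $date = new DateTimeImmutable('2021-05-20 12:00:00', new DateTimeZone('UTC'));
    echo $pattern, ' => ', $formatter->format($date), "\n";
} else {
    echo "IntlDatePatternGenerator is not available\n";
}
```

The generated pattern is passed to the existing formatter class, so the generator only replaces the hand-written pattern string.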
Derick Rethans 3:55
So it is a bit of a flexible way to tell the Intl extension to format them in a slightly more, well how do you say this, a slightly more intelligent way than what the standard, long, short and medium constants do for you.
Mel Dafert 4:11
Exactly.
Derick Rethans 4:12
Why is it so important that you get these formats, right, or rather I should say, how do these locales influence formats and why is this important?
Mel Dafert 4:21
The conventions of how to format dates and times vary rather strongly between languages and countries. In Austria, for example, nobody would be expected to understand the US format of month slash day slash year. I assume people in England may have the same issue.
Derick Rethans 4:38
I think everybody has that issue except for people in the US.
Mel Dafert 4:42
But that only shows the importance of using a format that people are used to and understand. Other languages like mainland Chinese even have the words for day and month included in the format, as far as I understand. I don't speak Chinese.
Derick Rethans 4:59
Neither do I, but a long time ago when I, when I added the date time support, not Intl, but PHP standard date time support, I also looked at locales that operating systems have. And even these locales, which is not something that Intl uses now, also encode these extra characters at least for Japanese, so that was interesting to see there as well.
Mel Dafert 5:22
There are a lot of sometimes somewhat unexpected formats.
Derick Rethans 5:27
And I think German sometimes once the add the in front, and sometimes behind and things like that. I know there's lots of little intricacies, yes. I see that the RFC makes an argument about which name to pick for the new class. Can you elaborate on the two different options that there are?
Mel Dafert 5:44
Yes, this is certainly what I would call bikeshedding. ICU has something of an inconsistency in its naming. The formatting class is called date formatter. And the pattern generator class is called Date Time pattern generator.
Derick Rethans 6:00
So it has the extra word time in it?
Mel Dafert 6:03
The choice is between having some inconsistency between Intl Date Formatter, which already exists in PHP, and an Intl Date Time pattern generator, or making sure PHP is internally consistent and omitting the time in all cases. So far consensus seems to lean towards the second option. This is also what the Hack people decided to use.
Derick Rethans 6:24
And I believe that's the one you are wanting to go with in this RFC as well, right?
Mel Dafert 6:28
Exactly. So far, everybody who expressed themselves slightly favoured the version without time. So that's the one I'm going with.
Derick Rethans 6:40
Of course, as you mentioned, this is a fairly small change, but the RFC talks a bit about things to add in the future, because I believe you weren't suggesting to add all of this Intl functionality straightaway. What is this future scope?
Mel Dafert 6:55
ICU would also expose more methods around the skeletons, for example, turning a pattern back into its skeleton, or building a list of skeletons and their mapping to the patterns from scratch. That's what you would do in theory if you added your own special locale to this.
Derick Rethans 7:17
I'm not sure how to do that with PHP actually, but I think ICU allows you to build your own basically files with settings right?
Mel Dafert 7:25
Exactly. Thi
PHP Internals News: Episode 84: Introducing the PHP 8.1 Release Managers
Thursday, May 13th 2021, 09:12 BST
London, UK
In this episode of "PHP Internals News" I converse with Ben Ramsey (Website, Twitter, GitHub) and Patrick Allaert (GitHub, Twitter, StackOverflow, LinkedIn) about their new role as PHP 8.1 Release Managers, together with Joe Watkins.
Transcript
Derick Rethans 0:14
Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 84. Today I'm talking with the recently elected PHP 8.1 RMs, Ben Ramsey and Patrick Allaert. Ben, would you please introduce yourself?
Ben Ramsey 0:34
Thanks, Derick, for having me on the show. Hi everyone, as Derick said, I'm Ben Ramsey. You might know me from the Ramsey UUID Composer package. I've been programming in PHP for about 20 years, and active in the PHP community for almost as long. I started out blogging, then writing for magazines and books, then speaking at conferences, and then contributing to open source projects. I've also organized a couple of PHP user groups over the years, and I've contributed to PHP source and docs in a few small ways over the years, but my first contributions to the project were actually to the PHP-GTK project.
Derick Rethans 1:14
Oh, that's a blast from the past. You know what, I actually still run a PHP-GTK application daily.
Ben Ramsey 1:21
Oh, that's interesting. What does it do?
Derick Rethans 1:23
It's a Twitter client.
Ben Ramsey 1:24
Did you write it?
Derick Rethans 1:26
I did write it. Basically I use it to have a local copy of all my tweets and everything that I've received as well, which can be really handy sometimes for figuring things out. Because I can easily search over it with SQL, it's kind of handy to do.
Ben Ramsey 1:41
It's really cool.
Derick Rethans 1:42
Yep, it still runs PHP 5.2 maybe, I don't know, 5.3, because it hasn't really been updated since then.
Ben Ramsey 1:49
Every now and then there will be some effort to try to revive it and get it updated for PHP 7 and 8, but I don't know where that goes.
Derick Rethans 1:59
I don't know where that's gone either. In this case, for PHP 8.1 there are three RMs: there's Joe Watkins, who has done it before; Ben, you've just introduced yourself; but we also have Patrick Allaert. Patrick, could you also please introduce yourself?
Patrick Allaert 2:13
Hi Derick, thank you for the invitation to the podcast. My name is Patrick Allaert. I am a Belgian freelancer, living in Brussels, and I spend half of my professional time as an IT architect and/or a PHP developer, and the other half maintaining the PHP extension of Blackfire, a performance monitoring solution initiated by Fabien Potencier.
Derick Rethans 2:39
I didn't actually know you were working on that.
Patrick Allaert 2:40
I'm not talking much about it, but more and more. So I succeeded Julien Pauli, who by the way was also a release manager before, so now I'm working with the Blackfire people. It's really great, and this gives me the opportunity to spend about the same amount of time developing in C and in PHP. This is really great because it's not just only doing C; at least I connect with what you can do with PHP. I see the evolution from both sides, and this is really great. It's also thanks to you, Derick: you granted me access to the PHP source code. That was to contribute to TestFest something like 12 or 13 years ago; it was CVS at that time.
Derick Rethans 3:28
CVS, so now I remember that. Basically, what both of you are doing is making me feel really old, and I'm not sure whether I like that or not. I think we all have gotten less hair on our heads and greyer in our beards. In any case, what made you volunteer for being the PHP 8.1 RM?
Patrick Allaert 3:46
In my case, I think there were two reasons. One is that PHP really brings a lot to me in my career; everything is built around my expertise in PHP and its ecosystem. By volunteering as a release manager, I think I can give something back to PHP, because the last time I contributed to the source code of PHP was really years ago. If I remember, it was array to string conversion that was very silent and not emitting any notice; now it's a warning. In the meantime, so I think that was PHP 5.0,
Derick Rethans 4:22
Ages ago.
Patrick Allaert 4:23
Ages ago, indeed. I was quite passive; I was mostly reading on PHP internals, and most of the time, now that it is quite big, if I had to say something I could always see someone who had already just said the same thing, so I was not saying: plus one. This is one of the reasons, and the second one is that I think it's kind of a unique opportunity, and I can learn a couple of things. I think, on day one, when Rasmus gave me the access, saying that I can do to OAuth authentication on SSH, that was: okay, day one, I already learned something, so that was really cool.
Derick Rethans 4:58
And you, Ben, I think you tried to be the PHP 8.0 release manager as well at some point. That didn't happen at the time, but you've tried again.
Ben Ramsey 5:06
I almost didn't try again. I don't know why, but when Sara announced it this year, I thought about it, and I don't know, I tossed it around a little bit, but I've been wanting to do it for a long time, and I've noticed, as Joe Watkins recently put it in a blog post, that we need to help the internals avoid buses. So since this is a programming language that I've spent a lot of time with, just as Patrick mentioned, both in and out of my day jobs, I want it to stick around and thrive. Since I'm not a C guru, but I do have a lot of experience managing open source software, I wanted to volunteer as a release manager, and I hope that I can use this as an opportunity to inspire others who might want to get involved, but don't know how.
Derick Rethans 5:55
And of course you just mentioned Joe, Joe Watkins, who is the third PHP release manager for 8.1, and that is a bit of a new thing, because in the past, for the many releases I can remember, you've only had two most of the time.
Ben Ramsey 6:09
I think that came up on the mailing list early on in the thread, and there was a general consensus, I think. Consensus may be the wrong word, but there were a couple of people who spoke up and said that they wouldn't mind seeing multiple rookies or mentees or whatever you want to call us. And Joe, when he volunteered to be the veteran, and he was the only one who volunteered as the veteran, said that he would take on two. And so that's why Patrick and I are both here, and I think that's a good idea, because it will continue to help, you know, us to avoid buses.
Derick Rethans 6:46
Yep. And if there are three of you, you each only have a release once every 12 weeks. Whereas of course, in my case, doing it for PHP 7.4, it's every four weeks, because it's me on my own, isn't it. Which is unfortunate, that these things happen because people get busy in life sometimes. Getting started being a PHP release manager can be a bit tricky sometimes, because just before we started recording, I had to add you to a few mailing lists. Do you think you now have access to everything, or what do you need access to, to begin with?
Patrick Allaert 7:18
There is the documentation about release managers, what you are supposed to do, and there is an effort of documentation on what you have to ask for in terms of access, and that's great. We are probably going to contribute our findings to improve the documentation. Once you've done a bit of the setup, mainly the needed access to the servers, you should also know what the workflow is and what the usual tasks are. This is mentioned in the documentation, but I think it would be better to have a live discussion with someone who has already done it. The fact that we are doing it with Joe Watkins, who is not only a release manager of 8.1 but also a previous release manager, should make it really smooth to see what the order is and what the routine is. What do you think, Ben?
Ben Ramsey 8:16
I agree. I think that, I mea



