Wednesday, March 26, 2014

Nullable

Table of Contents

  • Why Nullable?
  • Why Not Always Use Nullable?
  • Case Study
  • Examples
  • Various Null Handling Scenarios
Hack introduces a safer way to deal with nulls through a concept known as the "Nullable" type (sometimes called "maybe" or "option"). Nullable allows any type to have null assigned to it and checked on it. Nullable is very useful for primitive types that don't generally allow null as one of their values, such as bool and int, but it can also be useful for user-defined classes. Below is an example. Note the ? prefix on the type, which marks it as nullable.
<?hh

function check_not_null(?int $x): int {
  if ($x === null) {
    return -1;
  } else {
    return $x;
  }
}
check_not_null() takes a nullable int as a parameter. This int can now be checked for null within the code, and appropriate action can be taken.
Nullables are useful when a possible invalid state for a primitive (value) type is required. A common scenario is a bool for which true or false has not been determined yet. Another is a value read from a database, where a column that maps to a primitive type may come back as null.

Why Nullable? 

It is true that in PHP null can currently be assigned to any variable. For example:
<?php

class Foo {
  protected $x = null;
}
$x is assigned to null. And, of course, checks can be made programmatically to see if $x is null:
<?php

class Foo {
  protected $x = null;

  public function bar() {
    if ($this->x === null) {
      echo "Null 1";
    }

    if (is_null($this->x)) {
      echo "Null 2";
    }
  }
}
So what exactly is the Hack nullable feature trying to solve? In a statically-typed language such as C#, null cannot be assigned to a primitive like an int. One of Hack's primary purposes is to type check annotated code, bringing a statically-typed feel to the dynamic language that is PHP. Taking the above example and "hackifying" it, the Hack type checker will complain, as it should since null is not in the range of values for int.
<?hh

class Foo {
  protected int $x = null;

  public function bar() {
    if ($this->x === null) {
      echo "Null 1";
    }

    if (is_null($this->x)) {
      echo "Null 2";
    }
  }
}
The above example will output:
File "NullableDoc.php", line 5, characters 13-15:
Wrong type hint
File "NullableDoc.php", line 5, characters 13-15:
This is an int
File "NullableDoc.php", line 5, characters 22-25:
It is incompatible with a nullable type
Nullables are useful when a possible invalid state for a primitive (value) type is required. For example, a bool could be a common scenario where true or false has possibly not been determined yet. Here is an example:
<?hh

class NullableBool {
  public function boolMayBeNull(?bool $b): string {
    if ($b === null) {
      return "Null";
    } else if ($b === true) {
      return "True";
    } else {
      return "False";
    }
  }
}

function main_nt() {
  $nb = new NullableBool();
  $x = $nb->boolMayBeNull(true);
  $y = $nb->boolMayBeNull(false);
  $z = $nb->boolMayBeNull(null);
  var_dump($x);
  var_dump($y);
  var_dump($z);
}

main_nt();
The above example will output:
string(4) "True"
string(5) "False"
string(4) "Null"
A real-world scenario when a primitive type may not have a defined state is in a database. For example, if there is a database column that is defined as an int, there could be times when values in that column may not have been set yet. Nullable allows code to handle those cases.
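As a minimal sketch of the database case, assume a value read from an int column arrives as null when it has not been set. The function name format_age() and the idea that the caller has already pulled the value out of a row are illustrative assumptions, not part of any real database API:

```hack
<?hh

// Hypothetical helper: $age stands in for an int column that may be NULL
// in the database.
function format_age(?int $age): string {
  if ($age === null) {
    // The column has not been set yet.
    return "unknown";
  }
  // Past the null check, $age can be treated as a plain int.
  return (string) $age;
}

function main_db(): void {
  var_dump(format_age(42));
  var_dump(format_age(null));
}

main_db();
```

Run under HHVM, this should print string(2) "42" followed by string(7) "unknown".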

Why Not Always Use Nullable? 

It may seem like using nullable should just be the default for primitives, just in case there is some sort of null state. However, many times it is reasonable to guarantee that a variable will be initialized, and, if somehow the variable is null, a serious problem has occurred. For example:
<?hh

class NullableNotAppropriate {
  public function noNullableNeeded(): ?int {
    $vec = Vector {3};
    return $vec->count();
  }
}

function main_nt() {
  $nna = new NullableNotAppropriate();
  $y = $nna->noNullableNeeded();
  var_dump($y);
}

main_nt();
There is certainly nothing technically wrong with the above code. ?int can be used as the return type annotation, and neither Hack nor HHVM will complain. However, the ?int is too relaxed. While this example is contrived, there is no way that $vec->count() will ever be null. And if it ever is, then something is wrong with the Vector implementation or the runtime.

Case Study 

Do not use nullable solely for convenience or to just get rid of those "annoying" Hack errors. Be judicious, particularly in core code. Here is a real world case study:
I've seen several core functions that have been typed to accept and return nullable types that handle null solely for convenience (array_invert was the most recent). This effectively taints all values going through these functions as nullable even though the caller may have had a stronger guarantee.
For example, if I take care to have a non-null array and pass it to array_invert (whose signature is function array_invert(?array $a): ?array), Hack now has to deal with nullability of the return value. In practice, this leads to nullthrows/invariant calls in the callers of array_invert, spreading the $&*% everywhere as we've come to affectionately call it. In this particular case it also places extra burden on callers that are working with (statically) non-null values and leveraging Hack's powerful analysis.
Sometimes it is appropriate to allow null. If a healthy majority of the callers expect to pass in and receive back nullable values, that is a good case to make for allowing null. That's where the judgement call comes in. Otherwise, pruning out nullability helps us write more cohesive functions and reduces the state space complexity of our programs.
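The difference can be sketched as follows. The two functions below are hypothetical stand-ins for the function in the case study (its real implementation is not shown here); both simply delegate to PHP's array_flip(). The point is only what each signature forces on a caller that already holds a non-null array:

```hack
<?hh

// Nullable for convenience: every caller gets a ?array back, even one
// that passed in a value it knows is not null.
function invert_nullable(?array $a): ?array {
  if ($a === null) {
    return null;
  }
  return array_flip($a);
}

// Stricter signature: non-null in, non-null out.
function invert_strict(array $a): array {
  return array_flip($a);
}

function main_cs(): void {
  $a = array('x' => 1, 'y' => 2);

  // The caller knows $a is not null, but still has to prove that the
  // result is not null before using it as an array.
  $r1 = invert_nullable($a);
  invariant($r1 !== null, 'invert_nullable() returned null');
  var_dump(count($r1));

  // No null handling is needed with the stricter signature.
  $r2 = invert_strict($a);
  var_dump(count($r2));
}

main_cs();
```

Both var_dump() calls should print int(2); the only difference is the extra invariant() the nullable signature requires.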

Examples 

The following show some examples of using nullables, including showing what happens when null is passed to or returned from a method where null is not an expected value.

Returning a Nullable 

Here is a quick example that type checks correctly in Hack and produces valid output with HHVM:
<?hh

class NullableTest {
  public function mayReturnNull(int $x): ?int {
    if ($x > 5) {
      return 5;
    } else {
      return null;
    }
  }
}

function main_nt() {
  $nt = new NullableTest();
  $y = $nt->mayReturnNull(10);
  var_dump($y);
  $y = $nt->mayReturnNull(4);
  var_dump($y);
}

main_nt();
The above example will output:
int(5)
NULL
Without nullable, the Hack type checker would balk. For example, if ? was left out of the annotated return type, Hack would give an invalid return type error. Here is the incorrect code and the Hack error output:
<?hh

class NullableTest {
  public function mayReturnNull(int $x): int {
    if ($x > 5) {
      return 5;
    } else {
      return null;
    }
  }
}

function main_nt() {
  $nt = new NullableTest();
  $y = $nt->mayReturnNull(10);
  var_dump($y);
  $y = $nt->mayReturnNull(4);
  var_dump($y);
}

main_nt();
The above example will output:
File "NullableTest.php", line 9, characters 14-17:
Invalid return type
File "NullableTest.php", line 4, characters 42-44:
This is an int
File "NullableTest.php", line 9, characters 14-17:
It is incompatible with a nullable type
However, the above example will still run correctly since HHVM drops the return type annotations at runtime. So, the runtime output would be the same as the example that correctly uses nullable.
The above example will output:
int(5)
NULL

Nullable and XHP 

Nullable works with XHP elements as well.
<?hh

class NullableTest {
  public function mayReturnNull(int $x): ?:xhp {
    if ($x > 5) {
      return <div> Hello World </div>;
    } else {
      return null;
    }
  }
}

function main_nt() {
  $nt = new NullableTest();
  $y = $nt->mayReturnNull(10);
  var_dump($y);
  $y = $nt->mayReturnNull(4);
  var_dump($y);
}

main_nt();

Nullable Method Parameter 

Here is the example above enhanced with a new function that takes a nullable as a parameter:
<?hh

class NullableTest {
  public function mayReturnNull(int $x): ?int {
    if ($x > 5) {
      return 5;
    } else {
      return null;
    }
  }

  public function nullableParameter(?int $x): int {
    if (is_null($x)) {
      return 100;
    } else {
      return -1;
    }
  }
}

function main_nt() {
  $nt = new NullableTest();
  $y = $nt->mayReturnNull(10);
  var_dump($y);
  $y = $nt->mayReturnNull(4);
  var_dump($y);

  $z = $nt->nullableParameter(10);
  var_dump($z);
  $z = $nt->nullableParameter(null);
  var_dump($z);
}

main_nt();
The above example will output:
int(5)
NULL
int(-1)
int(100)
Now, if the nullable is taken away from the parameter in nullableParameter(), Hack will balk.
<?hh

class NullableTest {
  public function mayReturnNull(int $x): ?int {
    if ($x > 5) {
      return 5;
    } else {
      return null;
    }
  }

  public function nullableParameter(int $x): int {
    if (is_null($x)) {
      return 100;
    } else {
      return -1;
    }
  }
}

function main_nt() {
  $nt = new NullableTest();
  $y = $nt->mayReturnNull(10);
  var_dump($y);
  $y = $nt->mayReturnNull(4);
  var_dump($y);

  $z = $nt->nullableParameter(10);
  var_dump($z);
  $z = $nt->nullableParameter(null);
  var_dump($z);
}

main_nt();
The above example will output:
File "NullableTest.php", line 32, characters 31-34:
Invalid argument
File "NullableTest.php", line 13, characters 37-39:
This is an int
File "NullableTest.php", line 32, characters 31-34:
It is incompatible with a nullable type
In fact, HHVM will also balk when trying to run the above code without nullable in the parameter of nullableParameter().
The above example will output:
int(5)
NULL
int(-1)
HipHop Fatal error: Argument 1 passed to NullableTest::nullableParameter() must be an instance of int, null given in NullableTest.php on line 20
The reason that HHVM throws an error is due to the scalar type annotating capability on function parameters that is built into the runtime. An interesting "fix" to this particular issue is to provide null as the default value to the parameter $x. However, doing this will very appropriately cause Hack to balk.
<?hh

class NullableTest {
  public function mayReturnNull(int $x): ?int {
    if ($x > 5) {
      return 5;
    }
    else {
      return null;
    }
  }

  public function nullableParameter(int $x = null): int {
    if (is_null($x)) {
      return 100;
    }
    else {
      return -1;
    }
  }
}

function main_nt() {
  $nt = new NullableTest();
  $y = $nt->mayReturnNull(10);
  var_dump($y);
  $y = $nt->mayReturnNull(4);
  var_dump($y);

  $z = $nt->nullableParameter(10);
  var_dump($z);
  $z = $nt->nullableParameter(null);
  var_dump($z);
}

main_nt();
The above example will output:
File "NullableTest.php", line 13, characters 37-39:
Please add a ?, this argument can be null
However, running this code will actually work just fine with HHVM.
The above example will output:
int(5)
NULL
int(-1)
int(100)
This "fix" should be used with extreme caution since it is not type correct to assign null to a variable that is expecting an int. The PHP language allows this; that does not mean it is the right thing to do.
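If an optional argument that defaults to null is really what is wanted, the Hack error message itself points at the type-correct form: add the ? so that the annotation and the default agree. A minimal sketch, using a hypothetical class name NullableDefault so that it stands apart from the examples above:

```hack
<?hh

class NullableDefault {
  // ?int makes null a legal value for $x; the default of null additionally
  // makes the argument optional at the call site.
  public function nullableParameter(?int $x = null): int {
    if ($x === null) {
      return 100;
    }
    return -1;
  }
}

function main_nd(): void {
  $nd = new NullableDefault();
  var_dump($nd->nullableParameter(10));
  var_dump($nd->nullableParameter(null));
  var_dump($nd->nullableParameter());
}

main_nd();
```

This should print int(-1), int(100) and int(100), with the annotation now stating truthfully that null is accepted.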

User-defined Types 

It is worth noting that nullable can be used on user-defined types. Take this example:
<?hh

class UserDefinedType {}

class NullableUserDefinedType {
  public function userDefinedTypeMayBeNull(?UserDefinedType $udt): string {
    if ($udt === null) {
      return "Null";
    } else {
      return "I'm Set";
    }
  }
}

function main_nt() {
  $nb = new NullableUserDefinedType();
  $x = $nb->userDefinedTypeMayBeNull(null);
  $y = $nb->userDefinedTypeMayBeNull(new UserDefinedType());
  var_dump($x);
  var_dump($y);
}

main_nt();
The above example will output:
string(4) "Null"
string(7) "I'm Set"
There are no complaints from Hack or HHVM with the above code. However, if the ? is removed from the UserDefinedType $udt method parameter, HHVM will balk as it did in the nullable parameter example above.

Real world code 

There was an example at a company where nullable came into play (there are probably many examples, but this is a specific one that can demonstrate the use of nullable quite well). The original code looked like this:
Caller
<?hh
function updateA($key,
                 $value,
                 $reviewer_unixname = null,
                 $comments = null,
                 $value_check = null,
                 $set_reviewer_required = false,
                 $diff = null,
                 $diff_id = null,
                 $author_unixname = null) {

  // Assume, after a bunch of code above this statement, that $comments is still null
  create_updateA($key, $comments);
}
Callee
<?hh
function create_updateA(string $key, string $comments) {
  $comments = ($comments === '')
    ? 'Updated without comments.'
    : 'Updated with the following comments: ' . $comments;

  $vc = VC();
  $ua = get_update_array($key, $comments);

  regsiter(array('A', 'B'), $vc, $ua, 'update');
}
Hack complained about this code with the following message (line numbers may not be accurate):
The above example will output:
File "RealWorldNullable.php", line 13, characters 43-51:
Invalid argument
File "RealWorldNullable.php", line 23, characters 57-62:
This is a string
File "RealWorldNullable.php", line 7, characters 38-41:
It is incompatible with a nullable type
And, in fact, HHVM would complain about this code as well, since passing null to a function with a string type annotated parameter will cause an exception.
The above example will output:
HipHop Fatal error: Argument 2 passed to create_updateA() must be an instance of string, null given in RealWorldNullable.php on line 37
The $comments parameter in create_updateA() has been type annotated with a string. However, the caller function updateA() had set the string to be passed to create_updateA() as null. There were two ways to solve this problem: the good way and the bad way. The "good" way to solve this problem (and the one that was actually implemented and accepted at this company) is to use nullable.
<?hh
function create_updateA(string $key, ?string $comments) {
  $comments = ($comments === '')
    ? 'Updated without comments.'
    : 'Updated with the following comments: ' . $comments;

  $vc = VC();
  $ua = get_update_array($key, $comments);

  regsiter(array('A', 'B'), $vc, $ua, 'update');
}
The above solution solves both the Hack and HHVM complaints that were seen without nullable. The "bad" way would solve the HHVM problem, but not the Hack problem: give $comments in create_updateA() a default value of null.
<?hh
function create_updateA(string $key, string $comments = null) {
  $comments = ($comments === '')
    ? 'Updated without comments.'
    : 'Updated with the following comments: ' . $comments;

  $vc = VC();
  $ua = get_update_array($key, $comments);

  regsiter(array('A', 'B'), $vc, $ua, 'update');
}
Again, as shown in the example above on nullable method parameters, using a default value of null in this case is not type correct and should not be done under normal circumstances.

Various Null Handling Scenarios 

This section will discuss various use-cases and scenarios that one may come across using nullables.

Null Member Variables 

The following errors are common in Hack code, and they can be quite frustrating. The first:
File "null_member_variable.php", line 15, characters 17-19:
You are trying to access the member get but this object can be null.
File "null_member_variable.php", line 11, characters 11-18:
This is what makes me believe it can be null
The second:
File "null_member_variable.php", line 16, characters 14-21:
Invalid return type
File "null_member_variable.php", line 12, characters 26-28:
This is an int
File "null_member_variable.php", line 11, characters 11-18:
It is incompatible with a nullable type
File "null_member_variable.php", line 15, characters 7-21:
All the local information about the member x has been invalidated during this call.
This is a limitation of the type-checker, use a local if that's the problem.
Take this example:
<?hh

class BizBang {
  public function get(): int {
    return time();
  }
}

class NullMemVar {
  private ?BizBang $x;
  public function foo(): BizBang {
    if ($this->x !== null) {
      $a = $this->x->get();
      $b = $this->x->get();
      if ($b - $a < 1) {
        return $this->x;
      }
    }
    return new BizBang();
  }
}

function main_nmv() {
  $c = new NullMemVar();
  var_dump($c->foo());
}

main_nmv();
When the Hack type checker is run on the above code, the first error above is thrown. Wait?!? The if statement in foo() checks if $this->x is null. Why does Hack think this member variable might be null now on the call to get()? Even though there was an explicit check to see if $this->x is null, the Hack type checker cannot provably guarantee that the first call to $this->x->get() did not reset $this->x back to null. Thus the second call to $this->x->get() could, theoretically, be done with $this->x being null. Even if practically the function call will not set the member variable to null, the Hack type checker is very restrictive in this case. To get around this "limitation" in the type checker with respect to nullable types, use one or a combination of:
  • a local variable
  • invariant()
  • a library function that throws if its input is null and returns the non-nullable output, such as nullthrows(), defined below
Here is how to write the above code to make sure the Hack type checker is happy:
<?hh

class BizBang {
  public function get(): int {
    return time();
  }
}

class NullMemVar {
  private ?BizBang $x;
  public function foo(): BizBang {
    $local_for_x = $this->x;
    if ($local_for_x !== null) {
      $a = $local_for_x->get();
      $b = $local_for_x->get();
      if ($b - $a < 1) {
        return $local_for_x;
      }
    }
    return new BizBang();
  }
}

function main_nmv() {
  $c = new NullMemVar();
  var_dump($c->foo());
}

main_nmv();
Notice how a local variable of foo() called $local_for_x is assigned the value of the class member variable. Now, it is provable that $local_for_x will not be set to null by get(), since get() can have no effect on $local_for_x (i.e., $local_for_x is local to foo(), so no code inside get() can assign to it).
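The invariant() option from the list can be sketched in the same way. This is only an illustration: the class name NullMemVarInvariant and its constructor are assumptions added so the example runs on its own, and the local variable is still needed because the calls to get() would otherwise invalidate what the checker knows about $this->x.

```hack
<?hh

class BizBang {
  public function get(): int {
    return time();
  }
}

class NullMemVarInvariant {
  private ?BizBang $x;

  public function __construct() {
    $this->x = new BizBang();
  }

  public function foo(): BizBang {
    $local_for_x = $this->x;
    // invariant() throws if the condition is false. After this line the
    // type checker treats $local_for_x as a non-null BizBang.
    invariant($local_for_x !== null, 'x should have been set by the constructor');
    $local_for_x->get();
    $local_for_x->get();
    return $local_for_x;
  }
}

function main_nmvi(): void {
  $c = new NullMemVarInvariant();
  var_dump($c->foo() instanceof BizBang);
}

main_nmvi();
```

Unlike the if check, invariant() states that null here is a programming error: the method fails loudly instead of silently falling back to a new BizBang.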

Nullable and Non-Nullable Interaction 

Sometimes a nullable type needs to be passed to a method that takes the non-nullable variant of that same type. For example, assume the following piece of code and the output from the Hack type checker:
<?hh

class NullableNullThrows {
  public function foo(int $y): ?int {
    if ($y > 4) {
      return $y + 5;
    } else {
      return null;
    }
  }
}

class NullableNullThrowsTest {
  protected int $x;

  public function __construct() {
    $nnt = new NullableNullThrows();
    $nullable_value = $nnt->foo(2);
    $this->x = $nullable_value;
  }
}

function main_nnt() {
  $nntt = new NullableNullThrowsTest();
}

main_nnt();
The above example will output:
File "NullableNullThrows.php", line 25, characters 5-30:
Invalid assignment
File "NullableNullThrows.php", line 20, characters 13-15:
This is an int
File "NullableNullThrows.php", line 9, characters 32-35:
It is incompatible with a nullable type
In this particular case calling NullableNullThrows::foo() could actually return a null value instead of an int (given that the return type has been annotated as nullable). The member variable $x in NullableNullThrowsTest is not a nullable type. Thus, trying to assign a nullable int to a non-nullable int will produce the above Hack type error.
There are legitimate times when there is an expectation that a variable will not be null, and, if that variable is null, a runtime exception should be thrown. For these cases, use a library function such as nullthrows(). The nullthrows() definition is something like:
<?hh

function nullthrows<T>(?T $x, ?string $message = null): T {
  if ($x === null) {
    throw new Exception($message ?: 'Unexpected null');
  }

  return $x;
}
nullthrows() checks if the value associated with a type T is null. If the value is null, an exception is thrown. If not, the value is returned unmodified. An optional exception message can be supplied to nullthrows() as well. The code above can be rewritten to use nullthrows(), making the Hack type checker happy:
<?hh

class NullableNullThrows {
  public function foo(int $y): ?int {
    if ($y > 4) {
      return $y + 5;
    } else {
      return null;
    }
  }
}

class NullableNullThrowsTest {
  protected int $x;

  public function __construct() {
    $nnt = new NullableNullThrows();
    $nullable_value = $nnt->foo(2);
    $this->x = nullthrows($nullable_value);
  }
}

function main_nnt() {
  $nntt = new NullableNullThrowsTest();
}

main_nnt();

Uninitialized Member Variables 

Take this piece of code and associated Hack error message:
<?hh

class PettingZoo {
  private BunnyRabbit $fluffy;

  public function pet(): void {
    $this->fluffy = get_bunny();
    pet_bunny($this->fluffy);
  }
}
The above example will output:
File "PettingZoo.php", line 4, characters 7-16:
The class member fluffy is not always properly initialized
Make sure you systematically set $this->fluffy when the method __construct is called
Alternatively, you can define the type as optional (?...)
Hack would like developers to systematically initialize member variables. By doing so, it can be proven that these member variables are not null. It is fine for member variables to sometimes be null, but this must be explicitly stated by using a nullable type. Then Hack can be aware that null checks should be done on the member variable. Thus, there are two ways to suppress the above Hack error:
  • Always initialize member variables: Make sure the member variable is always set when the class is created. This can be done with a constructor or with inline values for primitives (e.g., private int $myInt = 123). Why? It is safe to assume that the constructor is always called when the object is created; therefore Hack can always assume that the member variable has been set.
  • Use Nullable: Make the member variable nullable and explicitly deal with what should happen if it turns out to be null. Why? Since the member variable is nullable, there is no need to make sure it is always initialized. Instead, logic will be added to deal with when it is null.
<?hh

class PettingZoo {
  private BunnyRabbit $fluffy;

  public function __construct() {
    $this->fluffy = get_bunny();
  }

  public function pet(): void {
    pet_bunny($this->fluffy);
  }
}
<?hh

class PettingZoo {
  private ?BunnyRabbit $fluffy;

  public function getFluffy(): BunnyRabbit {
    // A local variable is being used due to the
    // nullable member variables issue
    $fluffy = $this->fluffy;
    if (!$fluffy) {
      $fluffy = $this->fluffy = get_bunny();
    }
    return $fluffy;
  }

  public function pet(): void {
    pet_bunny($this->getFluffy());
  }
}

Uninitialized Member Variables and No Constructor 

Take this scenario:
File1.php
<?php

class InitializeMemberVariables {

  // Parent class constructor. Uses setup() implemented in children to
  // initialize member variables.
  public final function __construct() {
    var_dump(func_get_args());
    var_dump($this);
    call_user_func_array(array($this, 'setup'), func_get_args());
  }

  // Children override this method
  protected function setup() {}
}
File2.php
<?hh

class IMVChild extends InitializeMemberVariables {
  protected ?int $i;

  // Parent constructor called, setup called from there and dispatched to here
  // to initialize member variables
  protected function setup(int $i) {
    $this->i = $i;
    var_dump($this);
  }
}

class IMVChild2 extends InitializeMemberVariables {

  protected string $s;
  protected bool $b;
  protected int $i;

  // Parent constructor called, setup called from there and dispatched to here
  // to initialize member variables
  protected function setup(string $s, bool $b, int $i) {
    $this->s = $s;
    $this->b = $b;
    $this->i = $i;
    var_dump($this);
  }

}

$imvc = new IMVChild(4);
$imvc2 = new IMVChild2("Hi", true, 3);
The above example will output:
File "IMVChild.php", line 31, characters 7-15:
The class member b is not always properly initialized
Make sure you systematically set $this->b when the method __construct is called
Alternatively, you can define the type as optional (?...)
Reading the code, the class member variables are indeed being initialized. However, Hack doesn't support this initialization paradigm in the type checker. As it stands, there is no way for the Hack type checker to infer that the class members are being initialized through a forwarded call to setup() from call_user_func_array() in the __construct(). What are the options to avoid these Hack errors (noting that something like // UNSAFE will not work in this situation)? Here are the possible options:
  1. Redesign/Refactor the code.
  2. Make all member variables nullable.
  3. Treat this as a Hack problem that needs fixing, since this type of pattern is not recognized by the Hack type checker.
Without fully understanding the prevalence of this type of design pattern and the side effects that supporting it in Hack might have, options (1) and (2) should be examined before jumping in to assume option (3) must be the answer. Is there any way to redesign or refactor the code to avoid calling late-bound methods from the parent constructor? With the current structure, there are two possible problems. One is that even if Hack supported the paradigm, there is another Hack error looming regarding overriding setup() with a different number of parameters than the parent. This error will occur when the parent code is converted to <?hh. Secondly, using call_user_func_array to initialize variables may not necessarily be supported in Hack moving forward. Even if this design had to be maintained for legacy reasons, finding another way to call setup() might be preferred.
Is there any way to use nullable (i.e., the ? prefix) on the class members? This may not be ideal when it is known that a member variable will never be null, but the consideration should at least be made whether it is a nice stopgap solution until a better design can be found.
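As a rough sketch of what option (1) might look like, assume the parent no longer needs a final constructor that forwards its arguments. Each child can then declare its own constructor and assign its members directly, which is the pattern the type checker can verify. The class names mirror the scenario above, but the shape of the refactoring and the getI() accessor are assumptions for illustration only:

```hack
<?hh

// Hypothetical refactoring: the parent no longer forwards constructor
// arguments to a late-bound setup() method.
abstract class InitializeMemberVariables {
}

class IMVChild2 extends InitializeMemberVariables {
  protected string $s;
  protected bool $b;
  protected int $i;

  // Every member is assigned in __construct(), so none of them needs to
  // be declared nullable.
  public function __construct(string $s, bool $b, int $i) {
    $this->s = $s;
    $this->b = $b;
    $this->i = $i;
  }

  // Accessor added only so the example has something to print.
  public function getI(): int {
    return $this->i;
  }
}

function main_imv(): void {
  $imvc2 = new IMVChild2("Hi", true, 3);
  var_dump($imvc2->getI());
}

main_imv();
```

This trades the shared parent constructor for a little repetition in each child, but it keeps the members non-nullable and avoids call_user_func_array() altogether.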





