Tuesday, March 25, 2014

Type Annotations

Table of Contents 

  • Introductory Example
  • Why?
  • What types can be used?
  • How to type annotate
  • Examples
  • Type Casting
  • Annotating Arrays
  • Annotating Closures
  • Annotating Constructors
  • Annotating with this
  • Mixed Types
  • Passing By Reference
  • Typing Generators
  • Summary
Type annotations allow for PHP code to be explicitly typed on parameters, class member variables and return values (types are inferred for locals). These annotated types are checked via a type checker. Here are examples of the same code without and with type annotations:
<?php
class MyClass {
  const MyConst = 0;

  private $x;

  public function increment($x) {
    $y = $x + 1;
    return $y;
  }
}

<?hh
class MyClass {
  const int MyConst = 0;

  private string $x = '';

  public function increment(int $x): int {
    $y = $x + 1;
    return $y;
  }
}
It is clear that the second example provides more description and insight into the intention of the code. Type annotations provide three primary code improvements:
  • Readability by helping other developers understand the purpose and intention of the code. Many use comments to express such annotations. Hack formalizes such annotations.
  • Correctness by forbidding unsafe coding practices (e.g., sketchy null checks) as well as allowing tools to check type annotations before runtime.
  • Refactorability by allowing Hack to inherit the reliable and automatic refactoring of a statically typed language. It is quite difficult to refactor a dynamically typed language such as PHP. Renaming a function or changing the number of its parameters requires a manual search of the code base to find all call sites. A type checker, however, will report a list of errors when a "breaking" change is made; these can then be fixed one-by-one (or automatically with tooling).

    Introductory Example 

    <?hh
    class AnnotatedClass {
      public int $x;
      private string $s;
      protected array $arr;
      public AnotherClass $ac;

      function bar(string $str, bool $b): float {
        if ($b && $str === "Hi") {
          return 3.2;
        }
        return 0.3;
      }
    }
    What should be noticeably different from PHP is that AnnotatedClass has type information on all member variables (whether public, private or protected), as well as its function parameters and return types. The annotated types in this example are:
    • int for $x.
    • string for $s.
    • array for $arr.
    • AnotherClass for $ac.
    • string for $str in bar().
    • bool for $b in bar().
    • float as the return type of bar().

      Why? 

      Dynamically typed languages normally allow a variable to hold a value of any type and allow that type to change on the fly. So, in PHP, a variable $x can be assigned an int and then, down the line, be assigned a string ... all in the same local scope. In other words, not only is the value of a variable mutable, the type of a variable is mutable. This ability allows for rapid prototyping, more concise code, and a lot of flexibility. Dynamic typing also comes with a cost. Errors are only caught at runtime. There is no compile-time analysis for code optimization. (Virtual machines like HHVM can mitigate this optimization disadvantage by providing an intermediate translation step before runtime, during which optimizations can be applied.) Bugs can go undetected for years and rear their ugly head when, for example, a method is called with an unexpected parameter type.
      "Wait a second! Facebook.com, with its billion+ users, is written with tens of millions of lines of PHP! A dynamic language worked well for Facebook."
      Yes, Facebook has done quite well with dynamically typed PHP. However, it is possible to bring some statically typed language features to PHP without affecting functionality and performance. With statically typed languages, errors can be caught before runtime. Code becomes more readable and self-explaining. The likelihood of the rogue "calling a method with an unexpected parameter type" bug becomes very small as these are caught before execution.
      Hack helps bridge the gap between dynamically and statically typed languages by bringing features normally seen in statically typed languages to PHP. A primary goal of Hack was to bring these features while remaining as compatible as possible with current PHP codebases. Type annotations are a big step toward accomplishing this goal.

      What types can be used? 

      With Hack, almost every PHP type can be used for type annotations. These types can be used to annotate function arguments, return types or member variables.
      • Primitive, basic types: int, float, string, bool, array (However, do not use the aliases double, integer, boolean, real)
      • User-defined classes: Foo, Vector<some type>
      • Mixed: mixed
      • Void: void
      • Nullable or optional: ?someType (e.g., ?int, ?bool)
      • Typed arrays: array<Foo>, array<string, array<string, Foo>>
      • Tuples: tuple(type1, type2, ...) (e.g., tuple(string, int))
      • XHP elements: :x:frag, :x:base, :div, and the catch-all :xhp.
      • Generics: Foo<T>
      • Closures: (function(type1, type2, ...): return_type)
      • Resources: resource
      The untyped array type may only be used in the Hack default (i.e., // partial) or // decl modes. When it is used in // strict mode, Hack reports an error recommending a collection class such as Vector or Map instead.
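As a minimal sketch of what that recommendation looks like in practice, the following // strict file uses Vector and Map where untyped arrays might otherwise appear. The function names (scores, ages, total) are hypothetical and exist only for illustration.

```hack
<?hh // strict

// Hypothetical helpers, for illustration only.

// A Vector<int> stands in for an integer-indexed array.
function scores(): Vector<int> {
  return Vector {90, 75, 88};
}

// A Map<string, int> stands in for a string-keyed array.
function ages(): Map<string, int> {
  return Map {"alice" => 30, "bob" => 25};
}

// Collections can be iterated with foreach, just like arrays.
function total(Vector<int> $values): int {
  $sum = 0;
  foreach ($values as $v) {
    $sum += $v;
  }
  return $sum;
}
```

Because every parameter and return value is a typed collection, nothing in this file relies on the untyped array type.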

      How to type annotate 

      Hack annotates return types at the end of a function/method declaration rather than near the beginning, as in languages like C#. This was done mainly for readability, but closures and searchability are also reasons for the positioning. With respect to closures, if the return type were annotated at the beginning of a function, PHP could interpret the return type as a constant string, thus ignoring the return type altogether. With respect to searchability, searching for "function foo" will produce more usable results than having to use wildcards or some other mechanism to find all the functions named foo(), regardless of return type.
      Here is a matrix of many of the types available for Hack. This matrix shows how the types are defined and used in a class member, parameter and return type scenario.
      | Type | Definition | Example Class Member Usage | Example Parameter Usage | Example Return Usage |
      |---|---|---|---|---|
      | Boolean | bool | bool $b = false; | function foo(bool $b) | : bool |
      | Integer | int | int $i = 3; | function foo(int $i) | : int |
      | Float | float | float $f = 3.14; | function foo(float $f) | : float |
      | String | string | string $s = "Hello"; | function foo(string $s) | : string |
      | Untyped Array (partial mode only) | array | array $x = array(); | function foo(array $arr) | : array |
      | Array-as-vector | array<someType> | array<string> $arrs = array("hi", "bye"); | function foo(array<string> $arrs) | : array<string> |
      | Array-as-map | array<keyType, valueType> | array<int, string> $arrs = array(42 => "answer"); | function foo(array<int, string> $arrs) | : array<int, string> |
      | Generic Type | NonPrimitiveType<T> | T $t; | function foo(T $t) or function foo<T>(T $t) | : T |
      | Vector | Vector<T> | protected Vector<int> $vec = Vector {3, 4}; | function foo(Vector<int> $vec) | : Vector<int> |
      | Map | Map<Tk, Tv> | protected Map<string, int> $map = Map {"A" => 1, "B" => 2}; | function foo(Map<string, int> $map) | : Map<string, int> |
      | Set | Set<Tv> | protected Set<int> $set = Set {1, 2}; | function foo(Set<int> $set) | : Set<int> |
      | Pair | Pair<Tv1, Tv2> | protected Pair<int, string> $pair = Pair {7, 'a'}; | function foo(Pair<int, string> $pair) | : Pair<int, string> |
      | User Object | FooClass | protected FooClass $a; | function foo(FooClass $a) | : FooClass |
      | Void | void | N/A | N/A | : void |
      | Mixed Type | mixed | protected mixed $m = 3; | function foo(mixed $m) | : mixed |
      | Nullable | ?someType | protected ?int $ni = null; | function foo(?int $ni) | : ?int |
      | Tuple | tuple(type1, type2) | protected (string, int) $tup = tuple("4", 4); | function foo((string, int) $tup) | : (string, int) |
      | Closure | (function(type1, type2, …): returnType) | protected (function(int, int): string) $x; | function foo((function(int, int): string) $x) | : (function(int, int): string) |
      | Resource | resource | $r = fopen('/dev/null', 'r'); | function foo(resource $r) | : resource |
      Most of the time, initialization of the class members will be done in a constructor (i.e., __construct()). For brevity, most of the initialization in the table above was done inline. However, sometimes brevity doesn't work very well. For the generic type, user object, nullable and closure class members, here are example initializations in the constructor:
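      <?hh
      class FooClass {}

      class MyClass<T> {
        private T $t;
        private FooClass $a;
        private ?int $ni;
        private (function(int, int): string) $x;

        public function __construct(T $val) {
          // Generic member, initialized from the constructor argument
          $this->t = $val;
          // User object member
          $this->a = new FooClass();
          // Nullable member: may hold either null or an int
          $this->ni = $val === null ? null : 4;
          // Closure member
          $this->x = function(int $n, int $m): string {
            $r = '';
            for ($i = 0; $i < $n + $m; $i++) {
              $r .= "hi";
            }
            return $r;
          };
        }
      }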

      Examples 

      Below are some basic, contrived examples using some of the above types within the context of the Hack type annotation framework:
      Annotating With Basic Types
      <?hh
      function increment(int $x): int {
        return $x + 1;
      }

      function average(float $x, float $y): float {
        return ($x + $y) / 2;
      }

      function say_hello(string $name): string {
        return "Hello ".$name;
      }

      function invert(bool $b): bool {
        if ($b) {
          return false;
        } else {
          return true;
        }
      }

      // Named sort_array() so that it does not clash with the built-in sort()
      function sort_array(array $arr): array {
        sort($arr);
        return $arr;
      }

      // A piece of code that computes the average of three numbers
      function avg(int $n1, int $n2, int $n3): float {
        $s = $n1 + $n2 + $n3;
        return $s / 3.0;
      }
      Annotating with void
      <?hh
      // void is used to indicate that a function does not return anything.
      function say_hello(): void {
        echo "hello world";
      }
      Annotating with Nullable
      <?hh
      // The nullable type is used to indicate that a parameter can be null.
      // It is also useful as a return type, where the error case returns null.
      // The type checker will force you to handle the null case explicitly.
      function f1(int $x): ?string {
        if ($x == 0) {
          return null;
        }
        return "hi";
      }

      function f2(int $x): void {
        $y = f1($x);
        // $y here has a type of ?string
        if ($y !== null) {
          // $y can be used as a string. No casts required.
        }
      }
      Annotating with mixed
      <?hh
      // The mixed type should be used for function parameters where the behavior depends on the type.
      // The code is forced to check the type of the parameter before using it
      function encode(mixed $x): string {
        if (is_int($x)) {
          return "i:".($x + 1);
        } else if (is_string($x)) {
          return "s:".$x;
        } else {
          ...
        }
      }
      Annotating Classes
      <?hh
      class A {}

      function foo(A $x): void {
        ...
      }

      function sum(Vector<int> $arr): int {
        $s = 0;
        foreach ($arr as $v) {
          $s += $v;
        }
        return $s;
      }
      Annotating Tuples
      <?hh
      class TupleTest {
        // This is a Vector of tuples. Notice how the "tuple" reserved
        // word is not used when annotating.
        private Vector<(string, string)> $test = Vector {};

        // The return type is a tuple. Again, the "tuple" reserved
        // word is not used.
        public function bar(): (string, string) {
          return $this->test[0];
        }

        public function foo() {
          // But to use an actual tuple, use the "tuple" reserved word
          $this->test->add(tuple('hello', 'world'));
        }
      }
      Annotating Resources
      <?hh
      function f1(): ?resource {
        // UNSAFE
        return fopen('/dev/null', 'r');
      }

      function f2(resource $x): void {
      }

      function f3(): void {
        $x = f1();
        if (is_resource($x)) {
          f2($x);
        }
      }

      Type Casting 

      HHVM allows type casting, basically allowing for a variable to be cast to another, appropriate type (e.g., int to bool). Hack allows type casts as well. For example, the type checker will give no errors for this type of code.
      <?hh
      function foo(): bool {
        $foo = 10;   // $foo is an integer
        $bar = (bool) $foo;   // $bar is a boolean
        return $bar;
      }
      foo();
      That said, there are types that are synonyms for other types (e.g., double for float). Hack generally disallows this. For consistency purposes, Hack allows one type for one meaning:
      | Allowed | Not Allowed |
      |---|---|
      | float | double, real |
      | bool | boolean |
      | int | integer |
      | | binary |
      Here is an example of how the type checker will throw an error if you try to use a synonym of a type that is not supported.
      <?hh
      function foo(): bool {
        $foo = 10;   // $foo is an integer
        $bar = (boolean) $foo;   // $bar is a boolean
        return $bar;
      }
      foo();
      The above example will output:
      File "test.php", line 4, characters 11-17:
      Invalid Hack type. Using "boolean" in Hack is considered an error. Use "bool" instead, to keep the codebase consistent.

      Annotating Arrays 

      Annotating arrays deserves a bit more of a mention. Arrays in Hack can take the following forms:
      • Untyped array (partial mode only): array
      • Explicitly typed array with integer keys: array<someType>
      • Explicitly typed array with string or integer keys: array<int, someType> or array<string, someType>
      Here is an example of how various arrays are annotated. Remember that, in Hack, the use of arrays is more restricted in // strict mode.
      <?hh
      class FooFoo {}

      class HackArrayAnnotations {
        private array<FooFoo> $arr;
        private array<string, FooFoo> $arr2;

        public function __construct() {
          $this->arr = array();
          $this->arr2 = array();
        }

        public function bar<T>(T $val): array<T> {
          return array($val);
        }

        public function sort(array<int, float> $a): array<int, float> {
          sort($a);
          return $a;
        }

        public function baz(FooFoo $val): array<FooFoo> {
          $this->arr[] = $val;
          return $this->arr;
        }
      }

      function main_aa() {
        $haa = new HackArrayAnnotations();
        var_dump($haa->bar(3));
        var_dump($haa->bar(new FooFoo()));
        var_dump($haa->sort(array(1.3, 5.6, 2.3, 0.2, 1.4)));
        var_dump($haa->baz(new FooFoo()));
      }
      main_aa();
      The above example will output:
      array(1) {
        [0]=>
        int(3)
      }
      array(1) {
        [0]=>
        object(FooFoo)#2 (0) {
        }
      }
      array(5) {
        [0]=>
        float(0.2)
        [1]=>
        float(1.3)
        [2]=>
        float(1.4)
        [3]=>
        float(2.3)
        [4]=>
        float(5.6)
      }
      array(1) {
        [0]=>
        object(FooFoo)#2 (0) {
        }
      }
      
      Examining a typed array a bit more...
      <?hh
      class BarBar {}

      class ABCD {
        private array<BarBar> $arr;
        private int $i;

        public function __construct() {
          $this->arr = array(new BarBar());
          $this->i = 4;
        }

        public function getBars(): array<BarBar> {
          if ($this->i < 5) {
            return array();
          } else if ($this->i < 10) {
            return $this->arr;
          } else {
            return array(null); // Type Error
          }
        }
      }
      An empty array can be returned from a method that is annotated to return a typed array. However, an array containing null is not compatible with array<BarBar>. To make that work, the element type in the annotation must be nullable (e.g., array<?BarBar>).
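A minimal sketch of the nullable-element form, assuming a hypothetical function maybe_bars() that is not part of the example above:

```hack
<?hh
class BarBar {}

// maybe_bars() is a hypothetical function, for illustration only.
// The element type is ?BarBar, so null is an acceptable element.
function maybe_bars(): array<?BarBar> {
  return array(null, new BarBar());
}
```

Callers then have to check each element for null before using it as a BarBar.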

      Annotating Closures 

      Annotating closures and callables requires its own callout beyond the brief summary above about using PHP types with Hack. Take this unannotated, non-Hack PHP code that uses a closure:
      <?php
      function foo_closure($adder_str) {
        return function($to_str) use ($adder_str) {
          return strlen($to_str) + strlen($adder_str);
        };
      }

      function main_closure_example() {
        $hello = foo_closure("Hello");
        $facebook = foo_closure("Facebook");
        $fox = foo_closure("Fox");

        echo $hello("World") . "\n";
        echo $facebook("World") . "\n";
        echo $fox("World") . "\n";
      }
      main_closure_example();
      How are the function foo_closure() and the closure it returns actually annotated? Here is the proper Hack type annotation for such a function:
      <?hh
      function foo_closure(string $adder_str): (function (string): int) {
        return function($to_str) use ($adder_str) {
          return strlen($to_str) + strlen($adder_str);
        };
      }

      function main_closure_example() {
        $hello = foo_closure("Hello");
        $facebook = foo_closure("Facebook");
        $fox = foo_closure("Fox");

        echo $hello("World") . "\n";
        echo $facebook("World") . "\n";
        echo $fox("World") . "\n";
      }
      main_closure_example();
      Note:
      The return type annotation of foo_closure() is actually a skeleton signature of the closure being returned. Thus, for example, trying to return true from the closure will cause a Hack error, since the return type annotation clearly specifies that the closure returns an int. Note that the closure itself is not type annotated, nor are any use parameters part of the type annotation for a closure.
      The same style of closure annotation is used in function/method parameters (if the function/method takes a closure as a parameter), as well as in class member variables. Here is a final example:
      <?hh
      // Completely contrived
      function f1((function(int, int): string) $x): string {
        return $x(2, 3);
      }

      function f2(): string {
        $c = function(int $n, int $m): string {
          $r = '';
          for ($i = 0; $i < $n + $m; $i++) {
            $r .= "hi";
          }
          return $r;
        };
        return f1($c);
      }

      Annotating Constructors 

      With constructors, parameters are annotated as normal. It may also be tempting to annotate the return type of __construct() with : void. However, this is misleading (and technically incorrect). While there is no explicit return statement in __construct (a general hint that void is correct), the constructor actually does implicitly return the instantiated type for which __construct was called.
      Therefore, to avoid any confusion, do not annotate the return type of __construct. The Hack type checker will throw an error if there is an annotation present.
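The rule can be sketched as follows. Point is a hypothetical class used only for illustration: its constructor parameters are annotated, the constructor itself carries no return type, and ordinary methods are annotated as usual.

```hack
<?hh

// Point is a hypothetical class, for illustration only.
class Point {
  private int $x;
  private int $y;

  // Parameters are annotated; there is deliberately no return type here.
  public function __construct(int $x, int $y) {
    $this->x = $x;
    $this->y = $y;
  }

  // Ordinary methods still carry a return type annotation.
  public function sum(): int {
    return $this->x + $this->y;
  }
}
```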

      Annotating with this 

      The this type is a pretty useful type, which you'll usually see as a return type. Here are some examples of it being used:
      <?hh
      class Base {
        private int $x = 0;

        public function setX(int $new_x): this {
          $this->x = $new_x;
          // $this has type "this"
          return $this;
        }

        public static function newInstance(): this {
          // new static() has type "this"
          return new static();
        }

        public function newCopy(): this {
          // This would not typecheck with self::, but static:: is ok
          return static::newInstance();
        }

        // You can also say Awaitable<this>;
        public async function genThis(): Awaitable<this> {
          return $this;
        }
      }

      final class Child extends Base {
        public function newChild(): this {
          // This is OK because Child is final.
          // However, if Grandchild extended Child, then this would be wrong, since
          // $grandchild->newChild() would return a Child instead of a Grandchild
          return new Child();
        }
      }
      this is the type of $this and new static(). If Base::setX() returns this, that means that at call sites, $child->setX() is known to return an instance of Child.
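To make the call-site behaviour concrete, here is a small sketch using hypothetical classes Counter and LoudCounter (not part of the example above). Because add() returns this, a chain that starts from a LoudCounter is still known to be a LoudCounter, so the subclass-only method can be called at the end of the chain.

```hack
<?hh

// Counter and LoudCounter are hypothetical classes, for illustration only.
class Counter {
  private int $n = 0;

  public function add(int $k): this {
    $this->n += $k;
    return $this;
  }

  public function get(): int {
    return $this->n;
  }
}

final class LoudCounter extends Counter {
  public function shout(): string {
    return "count: ".$this->get();
  }
}

function demo(): string {
  $c = new LoudCounter();
  // add() is declared on Counter, but its "this" return type means the
  // result is still treated as a LoudCounter, so shout() is available.
  return $c->add(2)->add(3)->shout();
}
```

Had add() been annotated as returning Counter instead, the call to shout() would not typecheck.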
      Here are some invalid uses of this:
      COUNTER EXAMPLES
      <?hh
      class Base {
        public static function newBase(): this {
          // ERROR! The "this" return type means that $child->newBase()
          // should return a Child, but it always returns a Base!
          return new Base();
        }

        public static function newBase2(): this {
          // ERROR! This is wrong for the same reason that new Base() is wrong
          return new self();
        }

        // This function is fine
        abstract public static function goodNewInstance(): this;

        public static function badNewInstance(): this {
          // ERROR! Child::badNewInstance() would call Base::goodNewInstance() which is wrong
          return self::goodNewInstance();
        }
      }
      this can only be used in covariant locations, which means you cannot use this as a function parameter typehint or as a member variable typehint. When Hack has proper covariance support, you will be able to use this to instantiate any covariant type variable, like Awaitable<this> and ImmVector<this>. Until then, you can only use this with Awaitable.
      At the moment, there is no return type that means $this and only $this. The this type can be satisfied by any object with the same type as $this.

      Mixed Types 

      Sometimes the type of a function parameter or a return type can be "various". And this could be quite intentional. When confronted with this situation, there are two choices. One is to leave the type blank and let the Hack type checker assume the engineer knows what he/she is doing. The other is to use the PHP provided mechanism called mixed in order to have the type checker force the engineer to check the type before using it. The following example shows mixed being used as the parameter type and the subsequent needed if check.
      <?hh
      function sum(mixed $x): int {
        if (is_array($x) || $x instanceof Vector) {
          $s = 0;
          foreach ($x as $v) {
            $s += $v;
          }
          return $s;
        }
        //... do something else or throw an exception...
      }

      Passing By Reference 

      The Hack typechecker largely does not understand references and pretends that they do not exist. For example, the following code passes the typechecker:
      <?hh
      function swap(int &$x, int &$y): void {
        $x = $y;
        $y = 'boom';
      }

      function main(): void {
        $x = 1;
        $y = 1;

        swap($x, $y);

        var_dump($x);
        var_dump($y);

        swap($x, $y);
      }
      main();
      It seems pretty clear that this shouldn't be done (trying to swap an int but writing in a string, when the swap() function takes two ints). In fact, HHVM will balk at runtime when trying to execute this code. But getting an error at runtime defeats the whole purpose of the benefits of Hack.
      Hack allows passing primitives by reference in // partial mode only. Hack will not allow this in // strict mode. The only reason this is even allowed by Hack in // partial mode is to allow for easier migration of various PHP codebases.
      Why are these types of references bad? Consider the following PHP code snippet:
      <?php
      function foo() {
        $arr = array(1,2,3,4);
        foreach ($arr as &$k) {
          echo $k;
        }
        echo "\n";
        $k = 'foo';
        var_dump($arr);
      }
      foo();
      After the foreach completes, there will still be a dangling reference to the last element in the array. So, if a developer then writes $k = 'foo';, the array will be mutated ... behavior probably not desired.
      The above example will output:
      1234
      array(4) {
        [0]=>
        int(1)
        [1]=>
        int(2)
        [2]=>
        int(3)
        [3]=>
        &string(3) "foo"
      }
      
      Furthermore, references are just incredibly difficult to typecheck. They can arbitrarily change the types of the parameters passed into them at-a-distance. Being sound would require either looking into the called function (too expensive perf-wise) or completely forgetting the type of any reference parameters (too restrictive). Thus this compromise was reached.

      Typing Generators 

      Generators can be somewhat tricky to type check. For example:
      <?hh
      function gen() {
        yield 1;
        yield "a";
      }
      Code using gen() must match the right type, in the right order. Static checking of this is not possible. Therefore, Hack implements two kinds of generators:
      | Type | Definition | How to Yield | Notes |
      |---|---|---|---|
      | Continuation<T> | The interface for items that are generators. They can be looped over. | yield directly (e.g., yield $foo;) | Generators must always yield the same type. Continuations yield items of a type T. Continuations are used only with yield, not async/await, etc. |
      | Awaitable<T> | The interface for items that can be prepared. | In an async function, return or await directly | The return types of async functions are all Awaitables. They can all be prepared using prep() and await. In this example, $foo is of type T for every expression yield result($foo);. Not all Awaitables are Continuations. |
      The following two examples are correctly typed:
      <?hh
      function gen(): Continuation<int> {
        yield 1;
        yield 2;
        yield 3;
      }

      function foo(): void {
        foreach (gen() as $x) {
          echo $x, "\n";
        }
      }

      <?hh
      async function f(): Awaitable<int> {
        return 42;
      }

      async function g(): Awaitable<string> {
        $f = await f();
        $f++;
        return 'hi test ' . $f;
      }

      Summary 

      Changing PHP code to Hack code has been purposely made simple. As possibly gleaned from the examples above, first change <?php to <?hh. Then annotate types or use other Hack features. Then run the type checker.
      Of course, developers must not be unnecessarily stymied when it comes to pushing out code. Hack implements type annotations in a way to not only be PHP compatible, but also to provide engineers ways to bypass the type checker. Some code might just be inherently dynamic in nature, code needs to be tested quickly, or the type checker could have a bug. In these cases there are options to have "unsafe" and other types of Hack code to get around the type checker's grip. These options shouldn't be used often as they defeat the overall purpose of having a reliable codebase, but they are there as needed.





