
Tuesday, March 25, 2014

Installation of Hack

Table of Contents 

  • Bootstrapping a Project
  • Converting an Existing Project to Hack

Bootstrapping a Project 

The static type system of Hack is enforced by a typechecker external to the HHVM runtime itself. It runs in the background, watching for file modifications and updating its state. Getting this static typechecker running requires a little bit of additional setup.

Running your project under HHVM 

The first step is to get HHVM running your project. You should have the latest version of HHVM installed, have a php.ini or config.hdf set up if necessary, and HHVM running as a webserver serving your project.

Getting the typechecker running 

In the root of your project directory, create an empty file named .hhconfig. The typechecker uses this file to know what directory tree to typecheck without needing to specify paths to every command.
Now, run hh_client. This will start up the typechecking daemon, wait for it to finish checking the current directory tree, and report any errors that it found. Since your code is at this point likely all PHP and not Hack code, it should report No errors.
To test that it is working, try adding a file with these contents to your project:
<?hh
function f(): int {
  return 'not an int';
}
The details of the Hack language and its type annotations are discussed in later sections, but hopefully you see that this function contains a type error: it is annotated to return an int, but actually returns a string. If you re-run hh_client, it should report this type error. The typechecker updated its state in the background, and hh_client retrieves that state.
At this point, you can start changing your <?php code into <?hh code, adding annotations, and the typechecker will verify more and more of your codebase as you convert! Some automated conversion tools are also available to help with the mechanical parts of this process.
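The bootstrap steps above can be sketched as a small shell function. This is only an illustration: the function name bootstrap_hack_project is hypothetical, and the guard around hh_client is an addition so the sketch degrades gracefully on a machine without HHVM installed.

```shell
# Sketch: mark a directory as a Hack project root and run the typechecker.
# bootstrap_hack_project is a hypothetical helper name, not part of HHVM.
bootstrap_hack_project() {
  project_root="${1:-.}"

  # An empty .hhconfig tells the typechecker which directory tree to check.
  # touch leaves an existing .hhconfig's contents alone.
  touch "$project_root/.hhconfig" || return 1

  # Only invoke hh_client if it is actually installed.
  if command -v hh_client >/dev/null 2>&1; then
    (cd "$project_root" && hh_client)
  else
    echo "hh_client not found on PATH; install HHVM first" >&2
  fi
}
```

Calling bootstrap_hack_project /path/to/your/project a second time simply re-runs hh_client, which is also how you pick up errors after editing files.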

Library functions 

hh_server does not ship with any innate knowledge about the PHP and Hack standard libraries, but instead imports this information from hhi files that describe the type structure of the standard library. HHVM ships with a default set of such files, available at /usr/share/hhvm/hack/hhi in the default binary distribution or at hphp/hack/hhi in the source distribution.
This directory contains information about the PHP standard library for the typechecker to consume. It should be copied into a location visible to the typechecker, but should never be executed at runtime or included in the autoloader. These files are effectively "interface" files that define types for parts of the standard library so the typechecker can verify they are used correctly. The runtime will produce an error if an hhi file is ever loaded at runtime, since it will redefine a function or class already built in to the runtime.
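Copying the hhi files into a project might look like the following sketch. The helper name copy_hhi and the destination directory name hhi are assumptions chosen for illustration; the default source path is the binary-distribution location mentioned above.

```shell
# Sketch: copy the stock hhi files into a project so the typechecker sees them.
# copy_hhi and the "hhi" destination name are hypothetical choices.
copy_hhi() {
  hhi_src="${1:-/usr/share/hhvm/hack/hhi}"  # default from the binary distribution
  project_root="${2:-.}"

  if [ ! -d "$hhi_src" ]; then
    echo "no hhi directory at $hhi_src" >&2
    return 1
  fi

  # Copy the directory contents. Keep this directory out of the autoloader:
  # the files are for the typechecker only and must never run.
  mkdir -p "$project_root/hhi" && cp -R "$hhi_src/." "$project_root/hhi/"
}
```

For a source checkout, pass the hphp/hack/hhi directory as the first argument instead of relying on the default.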

Included editor extensions 

The HHVM and Hack installation ships with bindings for common editors. vim users will find a pathogen package at /usr/share/hhvm/hack/vim with installation instructions in the README contained therein. emacs users will find a plugin inside /usr/share/hhvm/hack/emacs.
Although hh_client can be used directly on the command line, having the immediate feedback directly in an editor can be an amazing productivity boost. If your preferred editor or IDE is unsupported, contributions of additional plugins are very welcome; the existing plugins are decent examples of how to script the hh_client interface.

Notes on suggested project structure 

Hack and the Hack typechecker were designed with certain assumptions and best practices in mind, and may not work well for projects that significantly diverge from this.
The typechecker assumes that there is a global autoloader that can load any class on demand. This means that it insists that all class and function names are unique, and has no notion of checking imports or anything of that nature. Furthermore, it does not support conditional definitions of functions or classes; it must be able to statically know what is and what is not defined. It is of course perfectly possible to have a project that meets these requirements without a global autoloader, and the typechecker will work fine on such a project, but a project using an autoloader was the intended use case.
Mixing HTML and Hack code is not supported by the typechecker. Following and statically analyzing these complicated mode switches is unsupported, particularly since much modern code doesn't make use of this functionality. Hack code can output markup to the browser in a simple way via echo, or using a templating engine or XHP for more complex scenarios.

Converting an Existing Project to Hack 

Once you have a project running under HHVM and the typechecker, we have several automated conversion tools to help with different aspects of fully converting a project to Hack — especially the static type system. The suggested workflow is:
  1. Make as much PHP code visible to the typechecker as possible.
  2. Guess type annotations with a global inference tool, but log when they fail instead of failing hard.
  3. Parse error logs and remove annotations that do not match at runtime.
  4. Make the remaining annotations fail hard at runtime.

Moving individual files over to Hack 

HHVM ships with a tool called the hackificator that attempts to move as many files as possible into Hack. It does not change the code in the file itself to use any new features of Hack; it just changes the file headers from <?php to <?hh in places where such a conversion can happen cleanly. (With one exception: it marks typehinted function parameters that have a null default value as nullable.)

Running the hackificator 

To use the hackificator, first make sure you have properly gotten the typechecker running on your code and that it reports no errors. Then, run hackificator /path/to/your/project. The tool will attempt to convert your code, file by file, to the strictest mode that still produces no errors. This can take a while for projects with lots of files.
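After a run, one rough way to see which files were left in PHP mode is to look at each file's opening tag. The sketch below is not part of the hackificator; list_php_mode_files is a hypothetical helper, and it assumes converted files keep their .php extension and carry their opening tag on the first line.

```shell
# Sketch: list files under a directory whose first line still starts with "<?php".
# list_php_mode_files is a hypothetical helper, not an HHVM tool.
list_php_mode_files() {
  dir="${1:-.}"
  # Note: file names containing newlines are not handled by this simple loop.
  find "$dir" -type f -name '*.php' | while IFS= read -r f; do
    # Inspect only the opening tag on the first line of each file.
    case "$(head -n 1 "$f")" in
      '<?php'*) printf '%s\n' "$f" ;;
    esac
  done
}
```

Running list_php_mode_files /path/to/your/project prints one unconverted file per line, which gives a starting list for investigating why each one did not convert.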

Thoughts on conversion process 

For simple projects, this automated conversion may get good coverage. For projects making use of PHP's more dynamic features that are unsupported in Hack, the coverage of the automated tool may not be good enough. Now is a good time to see what did and did not convert cleanly, and to see why the remaining files still in PHP mode did not convert cleanly. Facebook has found that it's often easier to address conversion problems right at the start; errors that are simple now have a way of propagating into inconsistencies that are more gnarly to untangle in the future.
In particular, you should consider the difference between a wide conversion and a deep conversion. To see the difference, consider the following three files:
<?php
abstract class WorkItem {
  public abstract function subclassDoWork();

  final public function beforeWork() {
    // ...
  }

  final public function run() {
    $this->beforeWork();
    $this->subclassDoWork();
  }

  // ...
}
<?php
final class WorkItemA extends WorkItem {
  final public function subclassDoWork() {
    $this->foo = 1;
  }

  // ...
}
<?php
final class WorkItemB extends WorkItem {
  final public function subclassDoWork() {
    $this->bar = true;
  }

  // ...
}
If all three files are converted to Hack files, then there will be a type error. WorkItemA and WorkItemB refer to undefined member variables, $this->foo and $this->bar respectively.
Other than manually diagnosing and fixing the error, there are two ways that automated tools like the hackificator can avoid producing this type error. First, it could move the WorkItem superclass into Hack, and leave WorkItemA and WorkItemB in PHP files. This is a deep conversion: since WorkItem itself was moved into a Hack file, the entire inheritance hierarchy of any other subclasses of WorkItem which also happen to convert cleanly will reside in Hack files. Since the entire inheritance hierarchy is visible, the typechecker can do much more aggressive checks against all converted subclasses of WorkItem; unconverted subclasses can be fixed and converted one-by-one, reaping all the benefits of static coverage once they have been moved over to Hack files.
The other way for an automated tool to reconcile this is to move WorkItemA and WorkItemB into Hack files, and leave WorkItem itself unconverted in a PHP file. Since Hack can no longer see the entire inheritance hierarchy, it will assume that the undefined members are defined in a PHP superclass, and allow both WorkItem subclasses to typecheck with no errors. This is a wide conversion, since many more subclasses are now in Hack files than with the other approach, and can have many checks done against them. However, the checking done is considerably less complete than if the entire hierarchy is visible to the typechecker. Classes that were previously completely clean (even if WorkItem were in a Hack file too) can silently have some classes of errors added to them; there is nothing enforcing that previously clean subclasses remain so.
This is a tradeoff that each project will have to decide to make one way or the other. The hackificator tends to do wide conversions instead of deep ones. Since it converts files one at a time, reverting a file if it introduces an error, it is much more likely to encounter and convert a broken subclass before it encounters the superclass. Automating a deep conversion is considerably trickier and is not currently implemented in the hackificator, though for small projects that want a deep conversion, converting key superclasses by hand first is likely a reasonable approach.
Facebook did a wide conversion instead of a deep one. In our experience, having many classes "converted" but not fully checked due to a single key superclass remaining in a PHP file is a big deal. The WorkItem example above is actually a (dramatically) simplified example of this in our codebase. Our central "batched work item" superclass has over 25000 recursive subclasses, none of which can be fully checked until all of them are fully converted and the superclass moved into a Hack file, a herculean effort. We did many one-off hacks in order to get around this. For example, we defined a CrippleTypechecking trait that does nothing except live in a PHP file; this way we can move superclasses into Hack files and just include this trait in subclasses where errors are exposed.
Of course, since we didn't do a deep conversion, we don't know what the pitfalls of doing one would have been. Notably, even in the classes that aren't fully checked for things like undefined methods and undefined instance variables, the typechecker still can make many other classes of useful checks, so there is massive benefit even in a wide conversion.
Each individual project should therefore be aware of this tradeoff and consider whether a deep conversion, a wide conversion, or some hybrid is appropriate.

Inferring type annotations 

After moving as much code into Hack files as possible, that code is still largely going to be missing type annotations. We also provide a tool to attempt to infer parameter, return, and member variable types where possible. This inference is far from perfect. While it will always produce a set of types that are self-consistent and do not cause any type errors according to the typechecker, self-consistency does not necessarily mean that they will always align with reality. This is why all the types inserted by this inference engine are "soft" types. Instead of failing hard at runtime, like a normal type annotation, they will produce a log message and continue. These log messages can be used to find and remove incorrect annotations.
To add annotations, first move as much code into Hack files as possible: the more information the inference tool has access to, the better it will do, and the more consistency it can ensure. Then, run hh_server --convert directory-to-add-annotations project-root. This checks project-root for consistency, adding annotations only in the subdirectory directory-to-add-annotations while keeping the entire project clear of errors. Again, since the tool works best when it can see and modify as much of the code as possible, making directory-to-add-annotations the same as project-root (or as close to it as possible) is likely to lead to the best results.
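As a sketch, the invocation can be wrapped so that the annotation directory defaults to the whole project, per the suggestion above. The wrapper name infer_annotations and its .hhconfig sanity check are assumptions added for illustration; only the hh_server --convert command line itself comes from the text.

```shell
# Sketch: run annotation inference, defaulting to the whole project.
# infer_annotations is a hypothetical wrapper; note its argument order is
# project root first, then (optionally) the directory to annotate.
infer_annotations() {
  project_root="${1:-.}"
  annotate_dir="${2:-$project_root}"  # default: annotate the entire project

  # Sanity check (an addition): the project root should contain .hhconfig.
  if [ ! -f "$project_root/.hhconfig" ]; then
    echo "$project_root has no .hhconfig; is it the project root?" >&2
    return 1
  fi

  hh_server --convert "$annotate_dir" "$project_root"
}
```

For example, infer_annotations /path/to/your/project annotates everything, while passing a second argument restricts annotation to that subdirectory.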
Since this inference process is holistic, it is considerably more resource intensive than the hackificator, which can operate on a single file at a time. It scales well enough to run on all of Facebook's library code at once (tens of millions of lines of code), though it takes about half a day and 10 GB of RAM. Most projects will be of dramatically smaller scale than that and are expected to have no problems. For larger projects, RAM is more of a limiting factor than CPU, since parts of the process are unfortunately inherently serial. (Though more cores of course help and will be used when possible!)

Hardening type annotations 

Once annotations have been added, the logs from the "soft" failures can be automatically parsed to remove the annotations that mismatch at runtime, and to turn the annotations that match into hard failures. Good places to collect such logs include unit test runs and even production.
To parse a logfile and remove the annotations it is complaining about, use hack_remove_soft_types --delete-from-log your-log.log. Keep in mind that this tool is fairly unintelligent; it uses regular expressions to grab certain key parts of the log, including the file path to modify. If your log was generated on a machine where its PHP code lives in a different location than your development environment, you may want to use sed or a similar tool to correct the paths before running hack_remove_soft_types.
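The path correction can be as simple as a prefix substitution. The sketch below makes no assumption about the log format beyond the paths appearing literally in it; rewrite_log_paths is a hypothetical helper, and any paths passed to it are placeholders for your own.

```shell
# Sketch: rewrite a path prefix throughout a log file, writing to stdout.
# rewrite_log_paths is a hypothetical helper, not an HHVM tool.
rewrite_log_paths() {
  old_prefix="$1"
  new_prefix="$2"
  log="$3"

  # '#' is the sed delimiter so slashes in paths need no escaping.
  # Assumes neither prefix contains '#' or other characters special to sed.
  sed "s#$old_prefix#$new_prefix#g" "$log"
}
```

With a production prefix of /var/www/app and a local checkout in /home/dev/app (both hypothetical), you could run rewrite_log_paths /var/www/app /home/dev/app your-log.log > local.log and then hack_remove_soft_types --delete-from-log local.log.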
Finally, once all annotations are known to be correct, hack_remove_soft_types --harden file.php will turn all annotations in that file into hard failures. This currently works file-at-a-time, so you may want something like find dir -type f -name '*.php' -exec hack_remove_soft_types --harden '{}' ';' to harden every file in a directory.

 
