The Facets project contains two visualizations for understanding and analyzing machine learning datasets: Facets Overview and Facets Dive.
Live demos of the visualizations can be found on the Facets project description page.
Overview gives a high-level view of one or more datasets. It produces a visual feature-by-feature statistical analysis, and can also be used to compare statistics across two or more datasets. The tool can process both numeric and string features, including features that hold multiple numbers or strings per example.
Overview can help uncover issues with datasets, including the following:
- Unexpected feature values
- Missing feature values for a large number of examples
- Training/serving skew
- Training/test/validation set skew
Key aspects of the visualization are outlier detection and distribution comparison across multiple datasets. Interesting values (such as a high proportion of missing data, or very different distributions of a feature across multiple datasets) are highlighted in red. Features can be sorted by values of interest such as the number of missing values or the skew between the different datasets.
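The kind of per-feature comparison described above can be sketched in plain Python. This is only an illustration of the idea (fraction of missing values per feature, then ranking features by how much that fraction differs between two datasets); it is not how Facets Overview computes its statistics. The function name and the toy `train`/`test` data are hypothetical.

```python
from typing import Any, Dict, List, Optional


def missing_fraction(rows: List[Dict[str, Optional[Any]]]) -> Dict[str, float]:
    """Return, per feature, the fraction of rows where the value is absent or None."""
    total = len(rows)
    if total == 0:
        return {}
    features = sorted({name for row in rows for name in row})
    return {
        name: sum(1 for row in rows if row.get(name) is None) / total
        for name in features
    }


# Hypothetical toy datasets, used only for illustration.
train = [{"age": 39, "city": "Oslo"}, {"age": None, "city": "Lima"}]
test = [{"age": None, "city": "Oslo"}, {"age": None, "city": "Rome"}]

train_missing = missing_fraction(train)
test_missing = missing_fraction(test)

# Rank features by how much the missing fraction differs between the two sets,
# largest gap first.
gaps = sorted(
    (
        (abs(train_missing.get(f, 0.0) - test_missing.get(f, 0.0)), f)
        for f in set(train_missing) | set(test_missing)
    ),
    reverse=True,
)
```

A feature at the top of `gaps` is the sort of value that Overview would surface when sorting by a value of interest.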
Details about Overview usage can be found in its README.
Dive is a tool for interactively exploring up to tens of thousands of multidimensional data points, allowing users to seamlessly switch between a high-level overview and low-level details. Each example is represented as a single item in the visualization, and the points can be positioned by faceting/bucketing in multiple dimensions by their feature values. Combining smooth animation and zooming with faceting and filtering, Dive makes it easy to spot patterns and outliers in complex datasets.
Details about Dive usage can be found in its README.
Usage in Google Colaboratory/Jupyter Notebooks
Using Facets Overview in a Jupyter notebook involves two considerations:
- In the notebook, you will need to change the path that the Facets Overview Python code is loaded from so that it is correct relative to the directory your notebook kernel runs from.
- You must also have the Protocol Buffers Python runtime library installed: https://github.com/google/protobuf/tree/master/python. If you used pip or anaconda to install Jupyter, you can use the same tool to install the runtime library.
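One way to handle the path consideration is to add the Facets Overview Python directory to `sys.path` at the top of the notebook. The sketch below assumes that the Python sources live under `facets_overview/python` inside the checkout and that the repository sits at the hypothetical relative path `facets`; adjust both to match your setup. The helper name is made up for this example.

```python
import os
import sys


def add_facets_overview_to_path(facets_root: str) -> str:
    """Prepend the Facets Overview Python directory to sys.path and return it."""
    # Assumption: the Python sources are under facets_overview/python in the checkout.
    python_dir = os.path.abspath(
        os.path.join(facets_root, "facets_overview", "python")
    )
    # Avoid adding the same entry twice when the cell is re-run.
    if python_dir not in sys.path:
        sys.path.insert(0, python_dir)
    return python_dir


# "facets" is a hypothetical path to the cloned repository, relative to the
# directory the notebook kernel was started from.
overview_dir = add_facets_overview_to_path("facets")
```

Using an absolute path keeps the import working even if the kernel later changes its working directory.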
When visualizing a large amount of data in Dive in a Jupyter notebook, as is done in the Dive demo Jupyter notebook, you will need to start the notebook server with an increased IOPub data rate.
This can be done with the command
jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000
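To judge whether a dataset is likely to need the raised limit, you can measure how large it is once serialized to JSON. The following is a rough heuristic only: it assumes the stock limit is 1,000,000 bytes per second (check your own Jupyter configuration), and it compares a one-off payload size against a per-second rate. The `records` data and the helper name are hypothetical.

```python
import json
from typing import Any, Dict, List

# Assumption: the stock limit is taken to be 1,000,000 bytes/sec.
ASSUMED_DEFAULT_LIMIT = 1_000_000
# Value passed on the command line above.
RAISED_LIMIT = 10_000_000


def payload_bytes(records: List[Dict[str, Any]]) -> int:
    """Return the UTF-8 size of the records once serialized as JSON."""
    return len(json.dumps(records).encode("utf-8"))


# Hypothetical dataset of 50,000 small examples.
records = [{"id": i, "label": "example", "score": i / 7} for i in range(50_000)]

size = payload_bytes(records)
exceeds_default = size > ASSUMED_DEFAULT_LIMIT
fits_raised = size <= RAISED_LIMIT
```

If `exceeds_default` is true, starting the server with the increased rate is the safer choice.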
git clone https://github.com/PAIR-code/facets
cd facets
Building the Visualizations
If you make code changes to the visualizations and would like to rebuild them, follow these directions:
- Install bazel: https://bazel.build/
- Build the visualizations:
bazel build facets:facets_jupyter (run from the facets top-level directory)
Using the Rebuilt Visualizations in a Jupyter Notebook
If you want to use the visualizations you built locally in a Jupyter notebook, follow these directions:
- Move the resulting vulcanized html file from the build step into the facets-dist directory.
- Install the visualizations into Jupyter as an nbextension.
  - If jupyter was installed with pip, you can use jupyter nbextension install facets-dist/ if jupyter was installed system-wide or jupyter nbextension install facets-dist/ --user if installed per-user (run from the facets top-level directory). You do not need to run any follow-up jupyter nbextension enable command for this extension.
  - Alternatively, you can manually install the nbextension by finding your jupyter installation's share/jupyter/nbextensions folder and copying the facets-dist directory into it.
- In the notebook cell's HTML link tag that loads the built Facets HTML, load from /nbextensions/facets-dist/facets-jupyter.html, which is the locally installed Facets distribution from the previous step.
- The Facets visualizations currently work only in Chrome - Issue 9.
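The link-tag step above can be sketched as a small Python helper that builds the HTML for a notebook cell. It assumes the visualization is Dive, exposed as a `facets-dive` element that accepts its examples through a `data` property; the helper name, the element id `elem`, the height, and the sample records are choices made for this illustration. In a notebook, the returned string would then be passed to `IPython.display.HTML` and displayed, which is not done here so the sketch runs without IPython.

```python
import json
from typing import Any, Dict, List

# Path of the locally installed Facets distribution, as given in the steps above.
FACETS_HTML_PATH = "/nbextensions/facets-dist/facets-jupyter.html"

# The element id "elem" and the height are arbitrary choices for this sketch.
HTML_TEMPLATE = """<link rel="import" href="{href}">
<facets-dive id="elem" height="600"></facets-dive>
<script>
  document.querySelector("#elem").data = {data};
</script>"""


def build_dive_html(records: List[Dict[str, Any]]) -> str:
    """Return cell HTML that loads the local Facets build and hands it the records."""
    return HTML_TEMPLATE.format(href=FACETS_HTML_PATH, data=json.dumps(records))


# Hypothetical records, used only for illustration.
html = build_dive_html([{"age": 39, "city": "Oslo"}])
```

Keeping the path in one constant makes it easy to switch between a locally built distribution and any other copy you may have installed.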
Disclaimer: This is not an official Google product