For the last year or so, Yale’s DHLab has undertaken a series of experiments organized around analysis of visual culture. Some of those experiments have involved identifying similar images and visualizing patterns uncovered in this process. In this post, I wanted to discuss how we used the amazing Three.js library to build a WebGL-powered visualization that can display tens of thousands of images in an interactive 3D environment [click to enter]:
If you’re interested in creating something similar, feel free to check out the full code.
Getting Started with Three.js
To get started with Three.js, one needs to provide a bit of boilerplate code with the three essential elements of a Three.js page:
scene: The scene contains all objects to be rendered:
camera: The camera determines the position from which viewers see the scene:
renderer: The renderer renders the scene to a canvas element on an HTML page:
The code above creates the scene, adds a camera, and renders the canvas to the DOM. Now all we need to do is add some objects to the scene.
Each item rendered in a Three.js scene has a geometry and a material. Geometries use vertices (points) and faces (polygons described by vertices) to define the shape of an object, and materials use textures and colors to define the appearance of that shape. A geometry and a material can be combined into a mesh, which is a fully composed object ready to be added to a scene.
The example below uses the high-level BoxGeometry, which comes with pre-built vertices and faces:
Finally, to render the scene on the page, one must call the
render() method, passing in the scene and the camera as arguments:
Combining the snippets above gives the following result:
This is great, but the scene is static. To add some animation to the scene, one can periodically update the cube’s rotation property, then rerender the scene. To do so, one can replace the
renderer.render() line above with a render loop that calls itself recursively. Here is a standard render loop in Three.js:
Adding this block at the bottom of the script makes the cube slowly rotate:
Adding lights to the scene can make it easier to differentiate the faces of the cube. To add lights to the scene above, we’ll first want to change the cube’s material, because as the documentation says the MeshBasicMaterial is not affected by lights. Let’s replace the material defined above with a MeshPhongMaterial:
Next let’s point a light at the cube so that different faces of the cube catch different amounts of light:
Adding Images to a Scene
The snippets above give a quick overview of the core elements of a Three.js scene. The following section will build upon those ideas to create a TSNE map of images.
To build an image viewer, we’ll need to load some image files into some Three.js materials. We can do so by using the TextureLoader:
Now that the material is ready, the remaining steps are to generate a geometry from the image, combine the material and geoemtry into a mesh, and add the mesh to the scene, just like the cube example above. Because images are two-dimensional planes, we can use a simple PlaneGeometry for this object’s geometry:
The image will now appear in the scene:
It’s worth noting that one can swap out the PlanarGeometry for other geometries and Three.js will automatically wrap the material over the new geometry. The example below, for instance, swaps the PlanarGeometry for a more interesting Icosahedron geometry, and rotates the icosahedron inside the render loop:
This produces a strange looking cat indeed:
Building Custom Geometries
The examples above use a few different geometries built into Three.js. Those geometries are based on the fundamental THREE.Geometry class, which is a primitive geometry one can use to create custom geometries. THREE.Geometry is lower-level than the prebuilt geometries used above, but it gives performance gains that make it worth the effort.
Let’s create a custom geometry by calling the THREE.Geometry constructor, which takes no arguments:
This geometry object doesn’t do much yet, because it doesn’t have any vertices with which to ariculate a shape. Let’s add four vertices to the geometry, one for each corner of the image. Each vertex takes three arguments, which define the vertex’s x, y, and z positions respectively:
Now that the vertices are in place, we need to add some faces to the geometry. The code below will model an image as two triangle faces, as triangles are performant primitives in the WebGL world. The first triangle will combine the lower-left, lower-right, and upper-right vertices of the image, and the second will triangulate the lower-left, upper-right, and upper-left vertices of the image:
Awesome, we now have a geometry with four vertices that describe the corners of the image, and two faces that describe the lower-right and upper-left-hand triangles of the image. The next step is to describe which portions of the cat image should appear in each of the faces of the geometry. To do so, one must add some faceVertexUvs to the geometry, as faceVertexUvs indicate which portions of a texture should appear in which portions of a geometry.
FaceVertexUvs represent a texture as a two-dimensional plane that stretches from 0 to 1 in the x dimension and 0 to 1 in the y dimension. Within this coordinate system, 0,0 represents the bottom-left-most region of the texture, and 1,1 represents the top-right-most region of the texture. Given this coordinate system, we can map the lower-right triangle of the image to the first face created above, and we can map the upper-left triangle of the image to the second face created above:
With the uv coordinates in place, one can render the custom geometry within the scene just as above:
This may seem like a lot of work for the same result we achieve with a one-line PlanarGeometry declaration above. If a scene only required one image and nothing else, one could certainly use the PlanarGeometry and call it a day.
However, each mesh added to a Three.js scene necessitates an additional “draw call”, and each draw call requires the browser agent’s CPU to send all mesh related data to the browser agent’s GPU. These draw calls happen for each mesh during each animation frame, so if a scene is running at 60 frames per second, each mesh in that scene will require the transportation of data from the CPU to the GPU sixty times per second. In short, more draw calls means more work for the host device, so reducing the number of draw calls is essential if you want to keep animations smooth and close to sixty frames per second.
The upshot of all this is that a scene with tens of thousands of PlanarGeometry meshes will grind a browser to a halt. To render lots of images in a scene, it’s much more performant to use a custom geometry like the one above, and to push lots of vertices, faces, and vertex uvs into that geometry. We’ll explore this idea more below.
Displaying multiple images
Given the remarks above let’s next build a single geometry that contains multiple images. To do so, we’ll need to load a number of images into the page in which the scene is running. One way to accomplish this task is to pass a series of urls to the texture loader and load each image individually. The trouble with this approach is it requires one new HTTP request for each image to be loaded, and there are upper bounds to the number of HTTP requests a given browser can make to a given domain at a time.
A common solution to this problem is to load an “image atlas”, or montage of small images combined into a single larger image:
One can then use the montage the way that performance-minded sites like Google use spritesheets. If you have ImageMagick installed, you can create one of these montages with the
The last command will create an image atlas with 10 images per column and no padding between the images in the atlas. The sample directory
100-imgs.tar.gz contains 100 images, and the
-tile argument in the montage command indicates ouput atlas should have 10 columns, so the command above will generate a 10x10 grid of size 1280px by 1280px.
Let’s load the image atlas into a Three.js scene:
Once the image atlas is loaded in, we’ll want to create some helper objects that identify the size of the atlas and its sub images. Those helper objects can then be used to calculate the vertex uvs of each face in a geometry:
The custom geometry example above used four vertices and two faces to render a single image. To represent all 100 images from the image atlas, we can create four vertices and two faces for each of the 100 images to be displayed. Then we can associate the proper region of the image atlas material with each of the geometry’s 200 faces:
Rendering that scene produces a crazy little scatterplot of images:
Here we represent one hundred images with just a single mesh! This is much better than giving each image its own mesh, as it reduces the number of required draw calls by two orders of magnitude. It’s worth noting, however, that eventually one does need to create additional meshes. A number of graphics devices can only handle 2^16 vertices in a single mesh, so if you need your scene to run on a wide range of devices it’s best to ensure each mesh contains 65,536 or fewer vertices.
Using Multiple Atlas Files
Having discovered how to visualize multiple images with a single mesh, we can now scale up the image collection size dramatically.
One way to crank up the number of visualized images is to squeeze more images into the image atlas. As it turns out, however, the largest texture size supported by many devices is 2048 x 2048px, so the code below will stick to atlas files of that size.
For the examples below, I took roughly 20,480 images, resized each to 32px thumbs, then used the montage technique discuss above to build the following atlas files: 1, 2, 3, 4, 5. Once those atlas files are loaded onto a static file server, one can load each atlas into a scene with a simple loop:
The buildGeometry function will then create the vertices and faces for the 20,000 images within the atlas files. Once those are set, one can pump those geometries into some meshes and add the meshes to the scene (click the Codepen link for the full code update):
Positioning Images with TSNE Coordinates
So far we’ve used random coordinates to place images within a scene. Let’s now position images near other similar-looking images. To do so, we’ll create vectorized representations of each image, project those vectors down into a 2D embedding, then use each image’s position in the 2D coordinate space to position the image in the Three.js scene.
Generating TSNE Coordinates
First things first, let’s create a vector representation of each image. If you have tensorflow installed, you can create vectorized representations of each image in
100-imgs by running:
This script will generate one image vector for each image in
100-imgs/. We can then run the following script to create a 2D TSNE projection of those image vectors:
Running that TSNE script on your image vectors will generate a JSON file in which each input image is mapped to x and y coordinate values:
Positioning Images in a Three.js Scene
Given the JSON file with those TSNE coordinates, all we need to do is iterate over each item in the JSON file and position the image in that index position accordingly.
To load a JSON file using the Three.js library, we can use a FileLoader:
We can then use the index position of each item in that JSON file to identify the appropriate atlas file and x, y offsets for a given image. To do so, we’ll need to store each material by its index position:
buildGeometry() can then pass the index position of each atlas
i and the index position of each image within a given atlas
getCoords(), a function that returns the given image’s x and y coordinates:
The result is an interactive visualization of the images in a 2D TSNE projection:
See the Pen Three.js - Positioning Images with TSNE Coordinates by Douglas Duhaime (@duhaime) on CodePen.
We’ve now achieved a basic TSNE map with Three.js, but there’s much more that could be done to improve a user’s experience of the visualization. In particular, within the extant plot:
- * Users get no indication of load progress
* Users can't see details within the small images
* Users have no guide through the visualization
To see how our team resolved those challenges, feel free to visit the live site or the GitHub repository with the full source code. Otherwise, if you’re working on something similar, feel free to send me a note or a comment below–I’d love to see what you’re building.
I want to thank Cyril Diagne, a lead developer on the spectacular Google Arts Experiments TSNE viewer, for generously sharings ideas and optimization techniques that we used to build our own TSNE viewer.