I’ve been sitting in on Bill Labov’s seminar on dialect geography this semester. One of the things Bill has emphasized to the students in the class is the importance of drawing isoglosses, lines that separate linguistic regions from each other. Traditionally, isoglosses are drawn by hand according to the investigator’s subjective preferences. In the ANAE, Bill and his co-authors introduced a decision procedure for including points in an isogloss region. But it would not be correct to call this an algorithm – important decisions about where to place the drawn lines are left up to the (human) implementor of the procedure.

By way of illustration, here’s an example from the precursor to the ANAE, showing the cot~caught merger (the tendency to pronounce both of those words identically):

As part of my own learning in the class, I decided to take up the question of the algorithmic drawing of isoglosses.

Background

I decided on the Voronoi tessellation as the basis of an isogloss algorithm. This is a fancy name for a very simple concept: each point on the map has a region of influence, defined as the region which is closer to that point than to any other. Below is the Voronoi tessellation of the points in the US sampled in the ANAE:
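
To make the "region of influence" idea concrete, here is a minimal sketch in Python (not the R code behind the tool). It never builds the cells explicitly; it just answers the defining question, namely which sampled point a given location is closest to. The function name and the toy coordinates are made up for illustration, and plain planar distance is assumed, which is only an approximation for latitude/longitude data.

```python
import math

def nearest_site(query, sites):
    """Return the index of the site whose Voronoi cell contains `query`.

    By definition, a location belongs to the cell of the site it is
    closest to, so a nearest-neighbor search is enough.
    """
    return min(range(len(sites)), key=lambda i: math.dist(query, sites[i]))

# Three hypothetical sample points (x, y); not real ANAE data.
sites = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]

print(nearest_site((1.0, 1.0), sites))  # 0: closest to the first site
print(nearest_site((6.0, 6.0), sites))  # 2: closest to the third site
```

A real tessellation library computes the cell borders themselves; this sketch only shows which cell a location falls in.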

The Delaunay triangulation is the dual (roughly, math-speak for “opposite”) of the Voronoi tessellation. The basic definition is not as simple as its Voronoi counterpart (it involves maximizing the smallest angle of the triangles), but it can be described as “for any two points whose Voronoi regions (= regions of influence) share a border, connect them with a line.” Here’s the Delaunay triangulation of the same set of points:
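
For readers who want to see the triangulation built from scratch, the following Python sketch uses the standard "empty circumcircle" characterization: a triangle belongs to the Delaunay triangulation if no other point lies strictly inside its circumcircle. This is a brute-force illustration that is far too slow for real data, and it is not how deldir works internally; the function names and toy points are assumptions for the example.

```python
from itertools import combinations

def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circle through a, b and c."""
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The sign of det depends on the orientation of abc, so normalize by it.
    return det * orient(a, b, c) > 0

def delaunay_edges(points):
    """Brute-force Delaunay edges as a set of index pairs (i, j) with i < j."""
    edges = set()
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        if orient(a, b, c) == 0:
            continue  # collinear points do not form a triangle
        others = (points[m] for m in range(len(points)) if m not in (i, j, k))
        if any(in_circumcircle(a, b, c, d) for d in others):
            continue  # another point is inside the circumcircle
        edges.update({(i, j), (i, k), (j, k)})
    return edges

# Four corners of a square plus its center (toy data).
points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]
print(sorted(delaunay_edges(points)))
```

In this toy example the center is linked to all four corners and the corners are linked along the sides of the square, but the two diagonals are absent: opposite corners do not share a Voronoi border because the center's cell sits between them.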

I decided that the regions of the Voronoi tessellation would form the basis of my isoglosses.

Network structure

We can’t just draw a line between every pair of neighboring Voronoi cells whose centers differ in the value of the variable of interest. We also need to discard noise, for example a divergent point in the middle of a large region (such as the few blue points in MT and CA in the map above). To do this, I took the Delaunay triangulation and treated it as a network. There’s a large body of work on how to carve networks (graphs) up in various ways: separating out unconnected sub-graphs, pruning weak links, etc. I’ve really only scratched the surface by doing clique detection (and small-clique elimination).
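
One plausible reading of this step can be sketched in Python as follows. The assumption here, which may not match the actual implementation, is that a "clique" means a connected group of neighboring points sharing the same value, found as a connected component of the Delaunay graph after dropping edges between unlike points. Groups below a size threshold are treated as noise, and an isogloss segment is kept only between two surviving points with different values. All names and the `min_size` parameter are hypothetical.

```python
from collections import defaultdict, deque

def same_value_components(edges, values):
    """Group point indices into connected sets of same-valued neighbors."""
    adj = defaultdict(set)
    for i, j in edges:
        if values[i] == values[j]:  # keep only links between like points
            adj[i].add(j)
            adj[j].add(i)
    seen, components = set(), []
    for start in range(len(values)):
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(sorted(comp))
    return components

def boundary_edges(edges, values, min_size=2):
    """Delaunay edges that cross between two non-noise groups."""
    comps = same_value_components(edges, values)
    noise = {i for comp in comps if len(comp) < min_size for i in comp}
    return sorted((i, j) for i, j in edges
                  if i not in noise and j not in noise
                  and values[i] != values[j])

# Toy network: point 2 is a lone "d" surrounded by "m" points.
edges = [(0, 1), (1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
values = ["m", "m", "d", "m", "d", "d"]
print(boundary_edges(edges, values))  # [(3, 4)]
```

Each surviving edge corresponds to one Voronoi border, so drawing the borders dual to those edges would give the isogloss. The isolated point 2 contributes no line at all.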

Implementation

I implemented this approach in R. I used the deldir package for the tessellation, and the igraph package for graph operations. I decided to package the result as a Shiny webapp. I wanted the tool to be easy to use for my classmates, who are working on papers about various dialect geography phenomena and should probably be worrying about that rather than the right arguments to pass to an R function. And I’ve been looking for a good excuse to learn about Shiny.

The code is on GitHub, together with instructions to get it running on your machine. If you have geospatial dialect data I encourage you to try the code out and let me know about your experiences.

Implementation tricks

In implementing this approach, I came up with a few clever hacks I’d like to share.

clipping lines
By default, Voronoi lines at the edges of the map trail off towards infinity, which is not what you want an isogloss to do. I decided to use geographical/political boundaries to truncate the isogloss lines. The functions for this are provided by the rgeos package. I take the political boundary polygons and inflate them, allowing isoglosses to continue into a small buffer zone beyond the political boundaries. Then I trim the lines that extend beyond this buffer region.
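
The inflate-then-trim idea can be illustrated with a much simpler stand-in. The Python sketch below replaces the political boundary polygons with a plain bounding box, grows it by a buffer, and trims a line segment to it using Liang-Barsky clipping. This is a deliberate simplification and not what rgeos does with real polygons; the function names, the box, and the buffer size are assumptions for the example.

```python
def clip_segment(p0, p1, xmin, ymin, xmax, ymax):
    """Liang-Barsky: clip segment p0-p1 to a rectangle, or return None."""
    x0, y0 = p0
    x1, y1 = p1
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0  # parameter range of the part still inside
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0),
                 (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:
                return None  # parallel to this side and outside it
            continue
        t = q / p
        if p < 0:  # segment is entering through this side
            if t > t1:
                return None
            t0 = max(t0, t)
        else:      # segment is leaving through this side
            if t < t0:
                return None
            t1 = min(t1, t)
    return ((x0 + t0 * dx, y0 + t0 * dy), (x0 + t1 * dx, y0 + t1 * dy))

def clip_to_buffered_box(p0, p1, box, buffer):
    """Inflate `box` = (xmin, ymin, xmax, ymax) by `buffer`, then clip."""
    xmin, ymin, xmax, ymax = box
    return clip_segment(p0, p1, xmin - buffer, ymin - buffer,
                        xmax + buffer, ymax + buffer)

box = (-1, -1, 1, 1)  # stand-in for a political boundary
print(clip_to_buffered_box((0, 0), (4, 0), box, buffer=1))
print(clip_to_buffered_box((5, 5), (6, 6), box, buffer=1))  # None
```

The first segment is cut off where it leaves the buffer zone at x = 2, and the second one, which lies entirely outside the buffer, is dropped.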

input to Shiny
Sadly, Shiny doesn’t seem to offer a good way of providing arguments to apps. I’ve created a GitHub issue to inquire about the possibility of implementing this. In the meantime, there’s a bit of grim hackery with .GlobalEnv to find the desired dataset.

dynamic behavior in Shiny
I was able to implement a pretty nifty heuristic for automatically finding the latitude and longitude columns in the webapp. For most datasets (hopefully) once you’ve selected the name for the data frame, the latitude and longitude column names should fill in immediately. The trick to doing this is not prominently advertised in the Shiny docs, so if you’re interested check out the call to observe around line 20 of server.R.
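
The post does not spell out the heuristic itself, so the following Python sketch is only a guess at the kind of rule that could work: look for column names that match common spellings of latitude and longitude. The patterns and the function name are assumptions, and the Shiny side (reacting to the chosen data frame and updating the inputs) is not shown.

```python
import re

# Hypothetical spellings; a real app might need a longer list.
LAT_PATTERN = re.compile(r"^(lat|latitude|y)$", re.IGNORECASE)
LON_PATTERN = re.compile(r"^(lon|long|lng|longitude|x)$", re.IGNORECASE)

def guess_lat_lon(column_names):
    """Return (lat_column, lon_column); None where no column matches."""
    lat = next((c for c in column_names if LAT_PATTERN.match(c.strip())), None)
    lon = next((c for c in column_names if LON_PATTERN.match(c.strip())), None)
    return lat, lon

print(guess_lat_lon(["City", "Lat", "Long", "merged"]))  # ('Lat', 'Long')
print(guess_lat_lon(["speaker", "plate", "value"]))      # (None, None)
```

Matching whole names rather than substrings avoids false hits such as "plate", at the cost of missing names like "lat_deg".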

Results

What do the results look like? Well, here’s a map of the same isogloss as above:

The match is pretty close, I’d say. We can also apply some smoothing to the lines to reduce the number of zig-zags:
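
The post does not say which smoothing method is used. As one common option, here is a Python sketch of Chaikin's corner-cutting scheme, which replaces every sharp corner of a polyline with two points placed a quarter of the way along the adjoining segments. Treat it as an illustration of what "smoothing the zig-zags" can mean and not as the method behind the map; the function name and the toy line are made up.

```python
def chaikin(points, iterations=1):
    """Smooth an open polyline by repeated corner cutting.

    The two endpoints are kept fixed so the line still starts and
    ends in the same place.
    """
    pts = list(points)
    for _ in range(iterations):
        if len(pts) < 3:
            break  # nothing to smooth on a single segment
        new = [pts[0]]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            new.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            new.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        new.append(pts[-1])
        pts = new
    return pts

zigzag = [(0, 0), (4, 0), (4, 4)]  # one right-angle corner at (4, 0)
print(chaikin(zigzag))
```

After one pass the corner at (4, 0) is gone, replaced by the points (3, 0) and (4, 1). More iterations give a rounder line, but they also move it further from the original cell borders.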

Conclusions and future work

A question that has been swirling around this work is, “can automatically drawn isoglosses replace those drawn by humans?” I’m not sure this is a good question to ask. Isogloss maps are not deductive evidence for a particular conclusion.

Rather, they serve as illustrations of a point, or ways of telling a particular story about a situation. As in history and the social sciences, deductive evidence and storytelling operate together to advance conclusions.

I think the uneasiness about hand-drawn maps comes from a worry that humans will somehow (perhaps without even meaning to) “cook the books” in order to produce evidence that concords with their own hunches rather than the truth. I don’t think automatically drawn isoglosses can fix this problem, just as the use of statistical p-values doesn’t magically convert all the science it touches into objective truth.

What automatically drawn isoglosses can do, however, is to provide another kind of evidence about the spatial distribution of linguistic variation. Investigators can use the output of this algorithm along with traditional (hand-drawn isogloss) and new (spatial autocorrelation) techniques to advance their understanding of phenomena.

Future work

I’d like to end this post with some ideas about future work on improvements to the algorithm.

  • This paper by Duckham et al. provides a Delaunay-based algorithm for drawing shapes, which could be a useful refinement.
  • One of the references of that paper (#7, by Alani et al.) suggests adding a ring of points around the area of interest to avoid the problem of Voronoi lines wandering off towards the edge of the map. This would replace the current clipping algorithm. It would also help prevent some odd problems with the current approach, such as the isogloss which wanders over western TX, NM, and AZ in the map above (which is preserved by the small-clique elimination pass because it is continuous, outside the viewable region, with the isogloss separating the western US from the rest of the map).
  • It would be good to allow isoglosses for multiple features to be drawn on the same map in the interactive interface, and to allow the piecewise deletion of undesired isogloss segments (like the western TX one above).
  • It would also be nice to package this code as a ggplot primitive, so that an isogloss could be added to a ggplot map simply by appending + geom_isogloss(...) to a plot creation command. There are some not-quite-working experiments with this in the ggplot.R file of the GitHub repo.
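
The ring-of-points idea from the second bullet is easy to prototype. The Python sketch below places dummy points on a circle that encloses all the data, so that every real point's Voronoi cell is closed off by a dummy neighbor instead of trailing off to infinity. The function name, the number of ring points, and the margin are assumptions, and the coordinates are treated as planar for simplicity.

```python
import math

def ring_points(points, n=24, margin=1.5):
    """Return n dummy points on a circle enclosing all of `points`.

    The circle is centered on the bounding box of the data, and its
    radius is the box's half-diagonal scaled by `margin` (> 1).
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    cx = (min(xs) + max(xs)) / 2
    cy = (min(ys) + max(ys)) / 2
    radius = margin * math.hypot(max(xs) - cx, max(ys) - cy)
    return [(cx + radius * math.cos(2 * math.pi * k / n),
             cy + radius * math.sin(2 * math.pi * k / n))
            for k in range(n)]

# Toy longitude/latitude pairs, not real ANAE sample points.
data = [(-120.0, 35.0), (-75.0, 45.0), (-95.0, 30.0)]
ring = ring_points(data, n=12)
print(len(ring))  # 12
```

The dummy points would be added to the data before tessellation and given no linguistic value, so that borders between a real point and a dummy point are never drawn.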