I’ve been sitting in on Bill Labov’s seminar on dialect geography this semester. One of the things Bill has emphasized to the students in the class is the importance of drawing isoglosses, lines that separate linguistic regions from each other. Traditionally, isoglosses are drawn by hand according to the investigator’s subjective preferences. In the ANAE, Bill and his co-authors introduced a decision procedure for including points in an isogloss region. But it would not be correct to call this an algorithm – important decisions about where to place the drawn lines are left up to the (human) implementor of the procedure.
By way of illustration, here’s an example from the precursor to the ANAE, showing the cot~caught merger (the tendency to pronounce both of those words identically):
As part of my own learning in the class, I decided to take up the question of the algorithmic drawing of isoglosses.
I decided on the Voronoi tessellation as the basis of an isogloss algorithm. This is a fancy name for a very simple concept: each point on the map has a region of influence, defined as the region which is closer to that point than to any other. Below is the Voronoi tessellation of the points in the US sampled in the ANAE:
The Delaunay triangulation is the dual (roughly, math-speak for “opposite”) of the Voronoi tessellation. The basic definition is not as simple as its Voronoi counterpart (it involves maximizing angles), but it can be described as “for any two points whose Voronoi regions (= regions of influence) share a border, connect them with a line.” Here’s the Delaunay triangulation of the same set of points:
I decided that the regions of the Voronoi tessellation would form the basis of my isoglosses.
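To make the two definitions concrete, here is a minimal sketch in Python (not the R code behind this post, which relies on deldir). It is a brute-force illustration for a handful of made-up points, not something to run on a full dataset; the function names and coordinates are invented for illustration. It relies on the standard characterization that a triangle belongs to the Delaunay triangulation when its circumcircle contains no other point.

```python
from itertools import combinations
from math import dist


def nearest_site(query, sites):
    """Index of the site whose Voronoi region contains `query`."""
    return min(range(len(sites)), key=lambda i: dist(query, sites[i]))


def circumcircle(a, b, c):
    """Center and squared radius of the circle through a, b, c (None if collinear)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_edges(sites):
    """Brute force: keep every triangle whose circumcircle holds no other site."""
    edges = set()
    for i, j, k in combinations(range(len(sites)), 3):
        circle = circumcircle(sites[i], sites[j], sites[k])
        if circle is None:
            continue
        (ux, uy), r2 = circle
        empty = all(
            (sites[m][0] - ux) ** 2 + (sites[m][1] - uy) ** 2 >= r2 - 1e-9
            for m in range(len(sites))
            if m not in (i, j, k)
        )
        if empty:
            edges.update({(i, j), (i, k), (j, k)})
    return edges


# Five hypothetical sample locations (x, y); not real ANAE coordinates.
sites = [(0, 0), (2, 1), (4, 0), (6, 1), (8, 0)]
region_of_query = nearest_site((7.5, 0.2), sites)
edges = delaunay_edges(sites)
```

Each pair in `edges` joins two points whose regions of influence share a border, so the far-apart points 0 and 4 are not connected. A real implementation would use a proper library routine, since this version checks every triple of points against every other point.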
We can’t just draw lines between every Voronoi cell whose center differs in the value of the variable of interest. We also need to discard noise, for example a divergent point in the middle of a large region (such as the few blue points in MT and CA in the map above). To do this, I took the Delaunay triangulation and treated it as a network. There’s a large body of work on how to carve networks (graphs) up in various ways: separating out unconnected sub-graphs, pruning weak links, etc. I’ve really only scratched the surface by doing clique detection (and small-clique elimination).
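One way to picture the noise-removal step is the sketch below. It is written in Python rather than the R and igraph used for the actual tool, and it rests on an assumption: that the "cliques" in question can be treated as connected groups of neighboring points that share the same value. The function names, the `min_size` parameter, and the toy data are all hypothetical.

```python
from collections import deque


def same_value_components(values, edges):
    """Connected components of the graph that keeps only edges joining equal values."""
    neighbors = {i: set() for i in range(len(values))}
    for i, j in edges:
        if values[i] == values[j]:
            neighbors[i].add(j)
            neighbors[j].add(i)
    seen, components = set(), []
    for start in range(len(values)):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            component.append(node)
            for nxt in sorted(neighbors[node] - seen):
                seen.add(nxt)
                queue.append(nxt)
        components.append(sorted(component))
    return components


def boundary_edges(values, edges, min_size=2):
    """Edges joining differing values, ignoring points in components below min_size."""
    kept = set()
    for component in same_value_components(values, edges):
        if len(component) >= min_size:
            kept.update(component)
    return sorted(
        (i, j) for i, j in edges
        if i in kept and j in kept and values[i] != values[j]
    )


# Toy data: point 2 is a lone "distinct" speaker surrounded by "merged" ones.
values = ["merged", "merged", "distinct", "merged", "merged", "distinct", "distinct"]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 3)]
components = same_value_components(values, edges)
borders = boundary_edges(values, edges, min_size=2)
```

In this toy example the isolated point 2 forms a component of size one and is dropped, so the only remaining boundary is the edge between points 4 and 5. In the Voronoi picture, that edge corresponds to the shared cell border where an isogloss segment would be drawn.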
I implemented this approach in R. I used the `deldir` package for the tessellation, and the `igraph` package for graph operations. I decided to package the result as a Shiny webapp. I wanted the tool to be easy to use for my classmates, who are working on papers about various dialect geography phenomena and should probably be worrying about that rather than the right arguments to pass to an R function. And I’ve been looking for a good excuse to learn about Shiny.
The code is on Github, together with instructions to get it running on your machine. If you have geospatial dialect data I encourage you to try the code out and let me know about your experiences.
In implementing this approach, I came up with a few clever hacks I’d like to share.
- clipping lines
  - By default, Voronoi lines at the edges of the map trail off towards infinity, which is not what you want an isogloss to do. I decided to use geographical/political boundaries to truncate the isogloss lines. The functions for this are provided by the `rgeos` package. I take the political boundary polygons and inflate them, allowing isoglosses to continue into a small buffer zone beyond the political boundaries. Then I trim the lines that extend beyond this buffer region.
- input to Shiny
  - Sadly, Shiny doesn’t seem to offer a good way of providing arguments to apps. I’ve created a Github issue to inquire about the possibility of implementing this. In the meantime, there’s a bit of grim hackery with `.GlobalEnv` to find the desired dataset.
- dynamic behavior in Shiny
  - I was able to implement a pretty nifty heuristic for automatically finding the latitude and longitude columns in the webapp. For most datasets (hopefully) once you’ve selected the name for the data frame, the latitude and longitude column names should fill in immediately. The trick to doing this is not prominently advertised in the Shiny docs, so if you’re interested check out the call to `observe` around line 20 of
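The clipping idea from the first item above can be sketched in a simplified form. The version below is Python, not the rgeos-based R code, and it swaps in a much simpler stand-in: instead of inflating real political boundary polygons, it inflates a rectangular bounding box and trims each line segment to it with parametric (Liang-Barsky style) clipping. The function names and numbers are hypothetical.

```python
def inflate(box, margin):
    """Grow an (xmin, ymin, xmax, ymax) box by `margin` on every side."""
    xmin, ymin, xmax, ymax = box
    return (xmin - margin, ymin - margin, xmax + margin, ymax + margin)


def clip_segment(p, q, box):
    """Trim segment p-q to the box; return the clipped endpoints or None if it misses."""
    xmin, ymin, xmax, ymax = box
    (x0, y0), (x1, y1) = p, q
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0  # visible part of the segment, as fractions of its length
    for delta, lo, hi, start in ((dx, xmin, xmax, x0), (dy, ymin, ymax, y0)):
        if delta == 0:
            if start < lo or start > hi:
                return None  # parallel to this pair of box edges and outside them
            continue
        ta, tb = (lo - start) / delta, (hi - start) / delta
        if ta > tb:
            ta, tb = tb, ta
        t0, t1 = max(t0, ta), min(t1, tb)
        if t0 > t1:
            return None  # the segment never enters the box
    return (x0 + t0 * dx, y0 + t0 * dy), (x0 + t1 * dx, y0 + t1 * dy)


# A made-up map extent with a buffer zone of one unit around it.
buffer_zone = inflate((0, 0, 10, 5), 1)
trimmed = clip_segment((5, 2), (21, 2), buffer_zone)
missed = clip_segment((20, 20), (30, 30), buffer_zone)
```

Here a line that starts inside the map and runs far off to the east is cut where it leaves the buffer zone, and a line that lies wholly outside is discarded. Clipping against real boundary polygons needs a geometry library, because those shapes are neither rectangular nor convex.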
What do the results look like? Well, here’s a map of the same isogloss as above:
The match is pretty close, I’d say. We can also apply some smoothing to the lines to reduce the number of zig-zags:
A question that I think has been swirling around this work has been, “can automatically drawn isoglosses replace those drawn by humans?” I’m not sure this is a good question to ask. Isogloss maps are not deductive evidence for a particular conclusion.
Rather, they serve as illustrations of a point, or ways of telling a particular story about a situation. As in history and the social sciences, deductive evidence and storytelling operate together to advance conclusions.
I think the uneasiness about hand-drawn maps comes from a worry that humans will somehow (perhaps without even meaning to) “cook the books” in order to produce evidence that concords with their own hunches rather than the truth. I don’t think automatically drawn isoglosses can fix this problem, just as the use of statistical p-values doesn’t magically convert all the science it touches into objective truth.
What automatically drawn isoglosses can do, however, is to provide another kind of evidence about the spatial distribution of linguistic variation. Investigators can use the output of this algorithm along with traditional (hand-drawn isogloss) and new (spatial autocorrelation) techniques to advance their understanding of phenomena.
I’d like to end this post with some ideas about future work on improvements to the algorithm.
- This paper by Duckham et al. provides a Delaunay-based algorithm for drawing shapes, which could be a useful refinement.
- One of the references of that paper (#7, by Alani et al.) suggests adding a ring of points around the area of interest to avoid the problem of Voronoi lines wandering off towards the edge of the map. This would replace the current clipping algorithm. It would also help prevent some odd problems with the current approach, such as the isogloss which wanders over western TX, NM, and AZ in the map above (which is preserved by the small-clique elimination pass because it is continuous, outside the viewable region, with the isogloss separating the western US from the rest of the map).
- It would be good to allow isoglosses for multiple features to be drawn on the same map in the interactive interface, and to allow the piecewise deletion of undesired isogloss segments (like the western TX one above).
- It would also be nice to package this code as a ggplot primitive, so that an isogloss could be added to a ggplot map simply by appending `+ geom_isogloss(...)` to a plot creation command. There are some not-quite-working experiments with this in the `ggplot.R` file of the Github repo.