The distribution of distance to Roman roads in England
NB: this material is obsolete. The final version of it appeared in my paper in Nomina 32 (2009) 43.
Introduction
We sometimes read claims in articles about placenames that certain placename types occur more frequently near to Roman roads. For example:
 B. Cox: The significance of the distribution of English placenames in hám in the Midlands and East Anglia, JEPNS 5, 15-73 (1972-1973)
 M. Gelling: Wickham names, in Signposts to the past.
 A. E. B. Owen: Roads and Romans in South-East Lindsey: the Place-Name Evidence, in Names, Places and People, ed. A. R. Rumble & A. D. Mills, 1997, p. 254.
 A. Cole: The use of Netel in placenames, JEPNS 35, 49-58 (2003) (has table of distances)
 A. Cole: The use of ON nata in placenames, JEPNS 36, 51-53 (2004)
It would seem necessary to test these hypotheses statistically. Such a study was done for the name Coldharbour by Trevor Ogden in Coldharbour and Roman Roads, Durham University Journal 59, 13-24 (1966). A positive result was found: Coldharbours are closer to Roman roads than randomly distributed points. However, the author has since revised his opinion, because it has become clear that the control group should not be randomly distributed points. It might be that all settlements are closer to Roman roads than randomly distributed points. Furthermore, with respect to Coldharbour names, Richard Coates has convincingly demonstrated in Nomina 8, 73-78 (1984) that the name has a mostly post-1600 origin, and no connection to Roman roads is at all likely.
I have thus started a re-examination of these ideas. With modern computing methods, much more precise statistical information should be obtainable.
 I obtained road data for England from English Heritage (thanks to Lindsay Jones for this). This data includes many minor roads, especially the Viatores ones from the SE midlands.
 The data is plotted above right; you may also download a zoomable pdf file of the same data.
 The road data is plotted on an elevation background which I derived from the SRTM (Shuttle Radar Topography Mission) 3 arc-second data from here. The colour coding is my own.
 For uniformly distributed points (with respect to the National Grid) in mainland England I computed the distance to the nearest Roman road. I plotted the cumulative distribution of these distances, obtaining the first graph below.
 A better null hypothesis would use points non-uniformly distributed according to some measure of settlement density. As a first try, I took the 11677 names in the geonames GB database which are tagged PPL (populated place). I computed the distance of each to the nearest Roman road, obtaining the distribution given by the green curve on the second graph below.
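The cumulative-distribution step in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the graphs: the region is a toy 100 km square rather than mainland England, the two "roads" are invented straight north-south lines, and the names `empirical_cdf`, `ROAD_X_KM` and `distance_to_nearest_road` are hypothetical.

```python
import random
from bisect import bisect_right
from statistics import median


def empirical_cdf(distances):
    """Return a function F, where F(d) is the fraction of samples <= d."""
    ordered = sorted(distances)
    n = len(ordered)
    return lambda d: bisect_right(ordered, d) / n


# Toy setup (an assumption, not the real data): a 100 km x 100 km square
# crossed by two straight north-south "roads" at x = 20 km and x = 70 km.
ROAD_X_KM = [20.0, 70.0]


def distance_to_nearest_road(x, y):
    """Distance in km from (x, y) to the nearest toy road (y is irrelevant here)."""
    return min(abs(x - road_x) for road_x in ROAD_X_KM)


# Control sample: points uniformly distributed over the square.
rng = random.Random(0)  # fixed seed so the run is repeatable
uniform_points = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(20000)]
distances = [distance_to_nearest_road(x, y) for x, y in uniform_points]

cdf = empirical_cdf(distances)
print("median distance (km):", round(median(distances), 2))
print("fraction within 5 km:", round(cdf(5.0), 3))
```

The same `empirical_cdf` function could be applied to the distances of the populated places, or of any placename type, so that the curves can be compared on one graph.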


We conclude that the median distance is about 3.5 km. In other words, half of all points in England are within 3.5 km of a Roman road, and so being 3.5 km or less from a Roman road should not be considered unusual. Of course, if points are independently distributed, the probability that several of them all lie close to a road is smaller than the probability for a single point. I am in the process of computing more detailed data for Mills' ham names.
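To make the independence remark concrete, here is a small sketch of a one-sided binomial calculation. It assumes that each name independently falls within the threshold distance with probability p taken from the control distribution (p = 0.5 at the median); the counts used are invented for illustration and are not results, and `binomial_upper_tail` is a hypothetical helper name.

```python
from math import comb


def binomial_upper_tail(n, m, p):
    """Probability that at least m of n independent points fall within the
    threshold distance, when each does so with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


# Hypothetical example: 20 names of some type, 15 of them within the median
# distance of a road. Under the null hypothesis p = 0.5.
tail = binomial_upper_tail(20, 15, 0.5)
print("P(at least 15 of 20 within the median distance):", round(tail, 4))
```

A small tail probability would suggest that the names lie closer to roads than the control group does, but only if the control group is an appropriate one, which is the point made above about settlements versus uniformly distributed points.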
Incidentally, these calculations involve some interesting computational geometry: good, fast algorithms are required for the point-in-polygon test, and for distance to the nearest of a given finite set of straight line segments.
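For readers unfamiliar with these two problems, the following is a minimal sketch of the textbook approaches: the ray-casting (even-odd) point-in-polygon test, and a brute-force search over all segments using the clamped projection onto each one. It is not necessarily the method used for the results here, and it is not fast; for data on the scale of a national road network one would probably want a spatial index as well. All function names are hypothetical, and coordinates are assumed to be planar (for example National Grid eastings and northings).

```python
import math


def point_in_polygon(x, y, polygon):
    """Ray-casting (even-odd) test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through (x, y)?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # one more crossing to the right
    return inside


def distance_to_segment(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to the segment AB."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment: A == B
    # Parameter of the projection of P onto line AB, clamped to the segment
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_nearest_segment(px, py, segments):
    """Brute-force minimum over segments given as (ax, ay, bx, by) tuples."""
    return min(distance_to_segment(px, py, *seg) for seg in segments)


# Toy data (invented): a square region and two straight "roads".
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
roads = [(0, 0, 10, 0), (5, 5, 5, 20)]
print(point_in_polygon(3, 4, square))
print(distance_to_nearest_segment(3, 4, roads))
```

The brute-force search costs time proportional to the number of segments for every query point, which is why a faster algorithm matters when many points are tested.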
England: distance map