Keith Briggs

This page was last modified 2024-01-21  

.
.


home 
·publications 
·thesis 
·talks 
·meetings 
·records 
·maths notes 
·software 
·languages 
·music 
·travel 
·cv 
·memberships 
·students 
·maps 
·place-names « 
·people 
·photos 
·links 
·ex libris 
·site map 


.

Obsolete

Please see instead the revised 2008 version.

Bayesian analysis of UK place-names

or, where is Ambridge, anyway?

The idea

Ambridge_tiny.png

This is an idea inspired by these initially disconnected thoughts:

  • Drawing maps of the distribution of place-name types is an old game - see e.g. Smith's English place-name elements or Watts' Cambridge dictionary of English place-names (pages li-lxii). I have even automated the process myself with my maps here. But this process is deterministic.
  • By contrast, we all have a feeling about different place-name types in the various regions. For example, Buckfastleigh could only be in Devon, Giggleswick in Yorkshire, and Flegg in Norfolk. But is there any science in this? Could we capture feelings such as these in a mathematical model? If we could, it would have to be a probabilistic model.
  • In the last few years big advances in Bayesian text classification (both in theory and in practice) have been stimulated by the requirements of spam filtering. Can we exploit these advances?
  • Made-up place-names in novels and films always sound wrong, even though they are made up of the right elements. Why is this?

The method

So this is what I have done:

  • I built Bayesian models for each square in the National Grid, using all village and town names in each square (29326 names in total).
  • For each non-existent name below, I computed the posterior probability for it to be in each square according to the model.
  • If the probability was above 1%, I coloured the square with an approximately spectral colour proportional to the probability. Dark blue indicates a very small probability, and bright red the maximum probability, with the levels in between being light blue, green, yellow, and orange, except that in the maps, the largest probability is scaled to 1 to determine the colour. See the key lower down on this page.
  • Thus, the reddest square would be the likeliest place for the non-existent name to be located, if it really existed.
  • In essence, I am doing text classification with 49 categories, rather than the three (spam, possible spam, legitimate email) as is done in spam filtering.

Some initial checks

Just to check that the whole idea is sensible, we'll do some real names first. Note that the model does not know the location of any place-name. But if this idea is at all reasonable, it should get real English place-names in roughly the right places, and it should be able to locate Welsh and Gaelic names. Here are some checks of this type. (Note that there really is a Cambridge in Gloucestershire.)

Cambridge_small.png Buckfastleigh_small.png

Holby_small.png Melbourn_small.png

Newton_small.png Staithes_small.png

Penmaenmawr_small.png Cullenach_small.png

Made-up names from radio, television, films, and comics

Here, the lower the peak probability, the less the name resembles a real name.

Ambridge_small.png Ashfordleigh_small.png

Bigglesthorpe_small.png Fulchester_small.png

Graystone_small.png Holby City_small.png

Royston Vasey_small.png Skelthwaite_small.png

Stambridge_small.png Utterley_small.png

Trumpton_small.png Chigley_small.png

Camberwick Green_small.png Hicksville_small.png

Names made up by Hardy

Alderworth_small.png Anglebury_small.png

Budmouth-Regis_small.png Cam Beak_small.png

Casterbridge_small.png Caxbury_small.png

Charmley_small.png Egdon Heath_small.png

Egdon Bottom_small.png Endelstow_small.png

Kingsbere_small.png Lewgate_small.png

Longpuddle_small.png Mellstock_small.png

Rainbarrow_small.png St. Launce_small.png

Stratleigh_small.png Wadcombe_small.png

Weatherbury_small.png Yalbury_small.png

Names made up by the Brontës

Millcote_small.png Gimmerton_small.png

Gimmerden_small.png Whitcross_small.png

Made-up names from other novels

Queen's Crawley_small.png Crawley with Snailby_small.png

Names made up by me

Botford_small.png Bramthwaite_small.png

Chipton_small.png Doverwood_small.png

Eastwold_small.png Farlingham_small.png

Gritton_small.png Hanton_small.png

Ipsdon_small.png Kettlewick_small.png

Lackbury_small.png Marsley_small.png

Nowborough_small.png Overhythe_small.png

Purbury_small.png Rimsborough_small.png

Sedgeley_small.png Tamridge_small.png

Underston_small.png Walmford_small.png

Technical details

This is the colour coding used in the maps:

colour_key.png

For computing the Bayesian models, I used the dbacl software by Laird Breyer.

The next table gives the number of names per square used to build the Bayesian models, and the easting, northing, latitude and longitude of its bottom left-hand corner.

squarenumbereastingnorthinglatitudelongitude
NA 53 0 900 57.810 -8.739
NB 355 100 900 57.889 -7.063
NC 768 200 900 57.945 -5.380
ND 430 300 900 57.978 -3.691
NF 267 0 800 56.918 -8.577
NG 799 100 800 56.994 -6.941
NH 995 200 800 57.048 -5.298
NJ 677 300 800 57.080 -3.650
NK 113 400 800 57.091 -2.000
NM 820 100 700 56.098 -6.825
NN 967 200 700 56.151 -5.220
NO 887 300 700 56.182 -3.611
NR 621 100 600 55.203 -6.716
NS 956 200 600 55.253 -5.147
NT 966 300 600 55.284 -3.575
NU 218 400 600 55.294 -2.000
NV 556 0 500 54.239 -8.142
NW 710 100 500 54.307 -6.613
NX 639 200 500 54.356 -5.078
NY 723 300 500 54.386 -3.540
NZ 717 400 500 54.395 -2.000
SB 194 100 400 53.411 -6.515
SC 7 200 400 53.458 -5.013
SD 716 300 400 53.487 -3.507
SE 1022 400 400 53.496 -2.000
SH 644 200 300 52.561 -4.951
SJ 1061 300 300 52.588 -3.476
SK 1201 400 300 52.597 -2.000
SM 145 100 200 51.618 -6.335
SN 655 200 200 51.663 -4.892
SO 988 300 200 51.689 -3.447
SP 1170 400 200 51.698 -2.000
SR 26 100 100 50.721 -6.251
SS 474 200 100 50.764 -4.836
ST 983 300 100 50.790 -3.419
SU 1046 400 100 50.799 -2.000
SW 336 100 0 49.824 -6.172
SX 547 200 0 49.866 -4.783
SY 249 300 0 49.891 -3.392
SZ 212 400 0 49.900 -2.000
TA 272 500 400 53.487 -0.493
TF 803 500 300 52.588 -0.524
TG 296 600 300 52.561 0.951
TL 1006 500 200 51.689 -0.553
TM 473 600 200 51.663 0.892
TQ 1236 500 100 50.790 -0.581
TR 306 600 100 50.764 0.836
TV 20 500 0 49.891 -0.608
This website uses no cookies. This page was last modified 2024-01-21 10:57 by Keith Briggs private email address.