Keith Briggs

This page was last modified 2024-01-21  


search_gui


This is a collection of software for searching for arbitrary text patterns in three databases:

  1. The Old English charter corpus (sawyer.py; database included).
  2. The complete Old English literary and charter corpus (COE.py; free database available).
  3. The Ordnance Survey 50k gazetteer (OS_50k.py; requires purchase of gazetteer).

Such searching is useful in linguistics and place-name research, especially when searching in old documents where spellings may vary. See also my Anglo-Saxon charters with Old English bounds - pdf edition, which has a complete index with clickable links.

NB: It is assumed that the user has proper reference books to hand to verify the information. The output of this program should never be considered a reliable source; the program is intended only as a help to speed up searches.

This software has a graphical user interface (GUI). Normally on linux this is inferior to a good command-line tool, but I have had requests from Windows users for a GUI program, so here it is. I nevertheless recommend defenestrating your computer and installing linux as soon as possible. The software has been developed on linux but also works on Microsoft Windows, and probably on Mac too.

Installation

Do steps 1 and 2 and one or more of steps 3a, 3b, 3c.

  1. Install python. If you use linux or Mac OS X you almost certainly have it already. For Microsoft Windows get the most recent Windows installer (2.7.x, not python 3.0) from here and run it. Python is free software and very easy to install.
  2. Make a folder search_gui to put my software in. All "save" instructions below mean to save in this folder. Save tkinter_app_00.py.
    • 3a. To search the Old English charter corpus: save sawyer.py, and sawyer.pkl.
    • 3b. To search the complete Old English literary and charter corpus: (i) save COE.py; (ii) either get the free (but old) corpus from OTA or purchase the latest version from DOE, and unpack the zipfile in search_gui.
    • 3c. To search the Ordnance Survey 50k gazetteer: (i) save OS_50k.py; (ii) purchase (some EPNL members already have it) the 50k gazetteer from OS and save a copy of the gazetteer in search_gui. Make sure this file name is correctly specified at the top of OS_50k.py - open it with the python IDLE program, or any text editor, and if necessary edit this value.

Running the software

The programs will be slow to start on the first run, as they convert their databases to a form allowing quick searching, but will be fast on subsequent runs.

  • On linux: run the desired program from the command line.
  • On Windows: run the desired program by double-clicking its python icon in the file manager.
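
The slow first run and fast later runs described above are typical of a build-once cache. A minimal sketch of that idea follows, assuming a plain-text database with one record per line and a pickle file as the cache. The function name `load_records` and both file paths are hypothetical, and this is not necessarily how the programs here do it.

```python
import os
import pickle

def load_records(text_path, cache_path):
    """Return the database records, building a pickle cache on first use."""
    if os.path.exists(cache_path):
        # Later runs: load the preprocessed records directly (fast).
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # First run: parse the raw text, skipping blank lines (slow if large) ...
    with open(text_path, "r") as f:
        records = [line.rstrip("\n") for line in f if line.strip()]
    # ... and save the parsed form for next time.
    with open(cache_path, "wb") as f:
        pickle.dump(records, f)
    return records
```

Deleting the cache file forces the conversion to be redone, which would be needed whenever the underlying database file is replaced.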

Regular expressions

Because this software is intended for more general and powerful searching than is normally available, some understanding of regular expressions is needed. This is a standard topic in computer science, and essentially allows the specification of an arbitrary text pattern. The full rules used in this software are here. However, for most purposes this summary should suffice:

  • ordinary word characters such as a..z, A..Z, 0..9 match themselves.
  • . matches any single character
  • [...] matches any one of the enclosed characters
  • ? means the preceding element is optional (0 or 1 occurrences)
  • * means zero or more of the preceding element
  • + means one or more of the preceding element
  • \w means any ordinary word character
  • \s means any space character
  • ^ means start-of-word
  • $ means end-of-word
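
The summary above can be tried out directly, assuming the patterns follow the syntax of Python's `re` module (likely for Python programs, but an assumption here). The sample strings are made up for illustration. Each row records whether the pattern should be found in the text, and the loop checks it.

```python
import re

# Each entry: (pattern, sample text, whether the pattern is found in it).
checks = [
    (r"br.d",    "brad",    True),   # . matches any single character
    (r"br[ao]d", "brod",    True),   # [...] matches one enclosed character
    (r"bradan?", "brada",   True),   # ? makes the preceding n optional
    (r"bra*d",   "brd",     True),   # * allows zero occurrences of a
    (r"bra+d",   "brd",     False),  # + needs at least one a
    (r"\w\w\w",  "a b",     False),  # \w does not match the space
    (r"a\sb",    "a b",     True),   # \s matches the space
    (r"^ley",    "bradley", False),  # ^ anchors at the start of the text
    (r"ley$",    "bradley", True),   # $ anchors at the end of the text
]

for pattern, text, expected in checks:
    found = re.search(pattern, text) is not None
    print("%-10s %-8s %s" % (pattern, text, found))
    assert found == expected
```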

Examples for sawyer.py - OE charter search

Enter þ as {th}, Þ as {TH}, ð as {dh}, Ð as {DH}, æ as {ae}.

  • s[ey]o?[fu][aeiou]n: finds different spellings of "seven".
  • bradan? le: matches references to names ancestral to modern Bradley.
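
The brace notation above exists because þ, ð and æ are awkward to type on most keyboards. One way such input could be translated before searching is sketched here. The `expand_braces` helper and its table are hypothetical and not taken from sawyer.py.

```python
import re

# Brace codes from the list above, mapped to the Old English letters.
CODES = {
    "{th}": u"\u00fe",  # thorn, lower case
    "{TH}": u"\u00de",  # thorn, upper case
    "{dh}": u"\u00f0",  # eth, lower case
    "{DH}": u"\u00d0",  # eth, upper case
    "{ae}": u"\u00e6",  # ash, lower case
}

def expand_braces(pattern):
    """Replace each known brace code in a search pattern with its letter."""
    for code, letter in CODES.items():
        pattern = pattern.replace(code, letter)
    return pattern

# Hypothetical usage: look for "thorn" spelled with the letter thorn
# in a made-up sample string.
regex = re.compile(expand_braces(u"{th}orn"))
print(regex.search(u"on \u00feorn") is not None)
```

Doing the substitution before compiling the pattern means the rest of the regular expression syntax is unaffected.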


    Examples for OS_50k.py - OS gazetteer search

    1. Bradley: matches names starting with "Bradley"
    2. ^C[ao]ld?e?cot[te]?: start with C, followed by either a or o, followed by l, followed by optional d, followed by optional e, followed by cot, optionally followed by either t or e ([te]?). Typical matches: Caldecott, Caldecote, Calcot, Calcott, Colcott.
    3. ^Ch?[aei]+st\wr means: start-of-word, C, optional h, one or more of a, e or i, st, any word character, r. Typical matches: Castor, Chesters, Chesterford Park, Chesterhill Ho, Chesterblade, Chesterbank, Casterton Fell, Chesters Burn, etc.
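
The two anchored patterns above can be checked against the typical matches listed. This sketch assumes Python's `re` syntax and one gazetteer name per string. How OS_50k.py actually stores and scans the names is not shown on this page, so the code only tests the patterns themselves.

```python
import re

caldecote = re.compile(r"^C[ao]ld?e?cot[te]?")
chester = re.compile(r"^Ch?[aei]+st\wr")

caldecote_names = ["Caldecott", "Caldecote", "Calcot", "Calcott", "Colcott"]
chester_names = ["Castor", "Chesters", "Chesterford Park", "Chesterhill Ho",
                 "Chesterblade", "Chesterbank", "Casterton Fell",
                 "Chesters Burn"]

# Every name from the lists above should match its pattern.
assert all(caldecote.search(n) for n in caldecote_names)
assert all(chester.search(n) for n in chester_names)

# The ^ anchor rejects strings where the pattern occurs only later on.
assert caldecote.search("Upper Calcot") is None  # made-up example string
assert chester.search("Winchester") is None      # does not start with C
```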

    Examples for COE.py - Corpus of Old English search

    Enter þ as {th}, Þ as {TH}, ð as {dh}, Ð as {DH}, æ as {ae}.

    • l[iyue]tlan?\s+dic: "little dykes"
    • tur\s+on\s means: tur, one or more spaces, on, space

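
A search of this kind amounts to testing each line of text against the pattern. A minimal sketch follows, assuming the corpus is available as a list of text lines. The `grep_lines` helper and the sample lines are invented for illustration and are not quotations from the corpus or code from COE.py.

```python
import re

def grep_lines(pattern, lines):
    """Yield (line number, line) for every line containing a match."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, 1):
        if regex.search(line):
            yield number, line

# Invented sample lines, not real corpus text.
sample = [
    "andlang lytlan dic",
    "on litlan   dic",   # several spaces still satisfy \s+
    "lytlandic",         # no space at all, so no match
    "tur on dune",
]

for number, line in grep_lines(r"l[iyue]tlan?\s+dic", sample):
    print(number, line)
```

Using `\s+` rather than a literal space makes the search tolerant of irregular spacing in the source text.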

