The Gentleman's Magazine
Thoughts about Indexes and
Searching
These thoughts about indexes and searching were prompted
by using the Gentleman's Magazine, published in London 1731
to 1907, or 1922.
source type: Gents Mag
Using indexes and using search tools are quite different
ways of exploring a text source like the Gentleman's
Magazine. Searching has recently become popular with the use
of websites. What is offered by some websites, but not
explained, is a search tool that apparently works on .pdf
page images using Optical Character Recognition (OCR)
software somewhere in the process. Other tools are based on
more reliable machine readable text, but there are still
difficulties. A major failing of searching is that the text
being searched is old: it has odd letter forms, odd
spellings, odd hyphenations, and so on. Looking for
Windermere you will not find Wynandermere, Winder Meer, and
other spellings, or even the current spelling if it is split
over a line or a page. It is not reasonable to expect the
searcher to know all possible variants, some of which might
be one-off errors, which have to be searched for. It gets
worse if the target of the search has changed its name at
one time or another: Thirlmere was known as Leathes Water
and Wythburn Water amongst other things as well as having
variant spellings. If the target of the search is not narrow
but a wide interest then things get completely impossible:
would you know to look for William Gibson when looking for
people of interest to The Lakes? The user can't be expected
to know what to look for. Powerful and apparently successful
search engines like Google leave you in awe at their power:
BUT you don't know what they have missed.
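The failing described above can be sketched in a few lines of Python. This is only an illustration: the sample text is invented, not quoted from the magazine, and the function names are hypothetical. It shows a literal search for the modern spelling finding nothing, while a search that rejoins line-broken words and is given a list of known variants finds three mentions. The catch is that someone has to know the variants in advance.

```python
import re

# Invented sample text, for illustration only; not a quotation from the source.
SAMPLE = (
    "The lake of Wynandermere is the largest. "
    "Winder Meer abounds with char. "
    "We came at last to Winder-\nmere by evening."
)

def exact_hits(text, term):
    """Count literal occurrences of term, as a simple search tool would."""
    return text.count(term)

def variant_hits(text, variants):
    """Count occurrences of any listed variant, after rejoining words
    that were hyphenated across a line break."""
    joined = re.sub(r"-\n", "", text)
    return sum(joined.count(v) for v in variants)

# The variant list must be supplied by someone who already knows the source.
VARIANTS = ["Windermere", "Wynandermere", "Winder Meer"]

print(exact_hits(SAMPLE, "Windermere"))   # the literal search
print(variant_hits(SAMPLE, VARIANTS))     # the variant-aware search
```

Rejoining every hyphen at a line end is itself a crude assumption: it would also close up words that are genuinely hyphenated.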
Using an index is a different process. The user does the
searching by scanning down a list of what is available
presented in a useable order. This, too, has failings. Did
the indexer include keys to everything in which you have an
interest, and did he key oddly spelt variants under both
the source spelling and a standardised modern form? Good
indexing depends on common sense, appreciation of the
source, and of people's interests, supported by strong
terminology rules, awareness of the value of variant forms,
and so on. It's an art: it's never perfect, but it should link
you to the target of your search through all sorts of
spellings and historical versions of names of people,
places, etc. In the arrangement of the index keys a name
like William Gibson will be presented in a list that makes
his relevance to The Lakes apparent. Beware that although
book indexing is often done very well, indexing to books in
a collection, especially, for example, a public library, is
usually extremely poor.
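Keying a variant under both the source spelling and a standardised modern form can be sketched as below. This is a minimal illustration, not the software used for these pages: the function name, the tuple layout and the record labels are all assumptions. The place names are the ones mentioned above.

```python
from collections import defaultdict

def build_index(entries):
    """entries: iterable of (standard_form, source_spelling, record) tuples.
    Every entry is filed under BOTH its standard form and its source spelling.
    Returns a dict, in sorted key order, mapping each key to a sorted list of
    (standard_form, record) pairs."""
    index = defaultdict(set)
    for standard, source, record in entries:
        index[standard].add((standard, record))
        index[source].add((standard, record))
    return {key: sorted(refs) for key, refs in sorted(index.items())}

# Hypothetical entries; the record labels are made up for the example.
ENTRIES = [
    ("Windermere", "Wynandermere", "record-A"),
    ("Windermere", "Winder Meer", "record-B"),
    ("Thirlmere", "Leathes Water", "record-C"),
    ("Thirlmere", "Wythburn Water", "record-D"),
]

INDEX = build_index(ENTRIES)
for key, refs in INDEX.items():
    print(key, "->", refs)
```

Scanning the printed list, a reader who has never heard of Leathes Water still meets it, with its link to Thirlmere alongside.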
What To Index and How
Indexing the transcribed pages, stuff relevant to
Cumbria, in a reasonably thorough way is not a small task.
Using a purist approach is laborious. One example should
demonstrate the work needed: the Content group for the
Gentleman's Magazine 1745 p.604, record G7450604.txt,
could be (I have added an author for the purpose of
demonstration):-
CONTENT
PERSON author: Smith, George & GS
PERSON soldier: Wade, George, Marshall
PERSON : Charles, Prince
PERSON soldier: Perth, Duke of
PERSON soldier: Ogilvy, Lord
PERSON soldier: Gordon, Lord
PERSON soldier: Pattenson, Thomas
PERSON unit: Murray's Regiment
PLACE Great Corby & Wetheral & Cumbria
(Cumberland) & England
PLACE Carlisle & Cumbria (Cumberland) &
England
PLACE Penrith & Cumbria (Cumberland) &
England
PLACE Warwick Bridge & Wetheral & Cumbria
(Cumberland) & England
PLACE Stanwix Bank & Carlisle & Cumbria
(Cumberland) & England
PLACE Brampton & Cumbria (Cumberland) &
England
PLACE Rickerby & Stanwix Rural & Cumbria
(Cumberland) & England
PLACE Warwick & Wetheral & Cumbria (Cumberland)
& England
PLACE Blackhall & St Cuthbert Without & Cumbria
(Cumberland) & England
PLACE Rockcliffe & Cumbria (Cumberland) &
England
DATE 1745
PERIOD 18th century, early & 1740s
EVENT rebellion: 1745 Rebellion
EVENT siege: siege, Carlisle
OBJECT_NAME magazine & Gentleman's Magazine
OBJECT_NAME Rowcliff (Rockcliffe) & Rickarby
(Rickerby)
The time taken to record all this is not small.
Thinking about what is wanted from the indexing makes a
less purist approach attractive.
Not all the keywords in the above analysis are wanted,
'Cumbria' and 'England' for example.
The planned indexes for many of the keywords do not
require them to be in separately identifiable concepts; they
will just be entries in a general index.
But I do want to be able to index, in a controlled
manner, the magazine date and author.
So a simplified approach is:-
CONTENT
PERSON author: Smith, George & GS
DATE 1745
PERIOD 18th century, early & 1740s
OBJECT_NAME magazine & Gentleman's Magazine
TEXT_SECTION
KEYWORD Wade, George, Marshall & Charles, Bonnie
Prince & Perth, Duke of & Ogilvy, Lord & Gordon,
Lord & Pattenson, Thomas & Murray's Regiment &
Great Corby, Wetheral & Carlisle & Penrith &
Warwick Bridge, Wetheral & Stanwix Bank, Carlisle &
Brampton & Rickarby (Rickerby) & Rickerby, Stanwix
Rural & Warwick, Wetheral & Blackhall, St Cuthbert
Without & Rowcliff (Rockcliffe) & Rockcliffe &
rebellion, 1745 & 1745 Rebellion & Carlisle, siege
& siege, Carlisle
This pattern is similar to the indexing approach already
used, successfully, for guide book transcriptions.
Note that I am indexing for Cumbria interest; and I am
using terms to match the Old Cumbria Gazetteer.
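How a simplified record of this kind might be read by a program is sketched below. The layout rules are assumptions drawn only from the example above: a line that is just CONTENT or TEXT_SECTION starts a group, a line beginning with a known field name starts a field, any other line continues the previous field, and terms are separated by '&'. The real record files may differ, and the sample here is shortened.

```python
import re

# Field and group names as they appear in the example; assumed to be complete.
FIELDS = {"PERSON", "PLACE", "DATE", "PERIOD", "EVENT", "OBJECT_NAME", "KEYWORD"}
GROUPS = {"CONTENT", "TEXT_SECTION"}

def parse_record(text):
    """Return a list of (group, field, [terms]) tuples from a record."""
    rows = []
    group = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        first, _, rest = line.partition(" ")
        if first in GROUPS and not rest:
            group = first                    # start of a new group
        elif first in FIELDS:
            rows.append([group, first, rest])
        elif rows:
            rows[-1][2] += " " + line        # continuation of the previous field
    return [
        (g, f, [t for t in re.split(r"\s*&\s*", v) if t])
        for g, f, v in rows
    ]

# A shortened version of the simplified record shown above.
SAMPLE = """CONTENT
PERSON author: Smith, George & GS
DATE 1745
PERIOD 18th century, early & 1740s
OBJECT_NAME magazine & Gentleman's Magazine
TEXT_SECTION
KEYWORD Wade, George, Marshall & Charles, Bonnie
Prince & Perth, Duke of & Carlisle, siege
& siege, Carlisle
"""

ROWS = parse_record(SAMPLE)
for group, field, terms in ROWS:
    print(group, field, terms)
```

The controlled fields (author, date, period) stay separately identifiable, while everything under KEYWORD becomes a flat list of terms for a general index.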
Article or Page?
Indexing in other elements of the Lakes project
transcriptions has always been done a record, ie a page, at a
time. The idea of indexing by article rather than page for
the Gents Mag was considered and rejected.