Using Excel to Format FamilySearch Data

It has been quite some time since I last posted a blog; this is mainly due to my TIMMINS One Name Study taking up more time than expected. I am still learning about surname studies, so I have been reading up on the subject. I have just finished The Surname Detective by Colin D. Rogers. This book has proved to be a very useful introduction as well as an interesting read, and I can recommend it if you are in any way interested in surnames. Next on my reading list is a book referred to many times by Rogers: The Origin of English Surnames by P.H. Reaney.

So what else have I been doing over the past few weeks? One thing that cropped up was a need to investigate a surname in my wife’s family, prompted by the discovery of a photograph with a list of names on it. The family name was WARNER, and they had lived in India in the 19th and early 20th centuries.

FamilySearch has pretty good coverage of India, so some family reconstruction could be carried out to determine the family groups. Searching India Marriages for WARNER produced some 253 matches, that is 13 pages of links at 20 links per page. Each marriage record has 24 items, so copying and pasting all this into an Excel spreadsheet could take a long time; but…

by using my favourite data capture program, Outwit Hub, I devised a really simple scraper and saved myself hours.

The method for using this scraper is the same as detailed in my previous post, Extracting Marriage Data Made Easy.

Once I had caught the data in the Catch area, I exported it into Excel and then made a copy of the worksheet (so that I can work on the data but retain the original, just in case!). There are a number of columns of data that I don’t need, so all those are deleted, which leaves just the field name in column A and the data in column B.

I was now faced with a vertically tabulated column of data stretching over 6,072 rows (253 x 24). What I really needed was 24 columns of data over 253 rows. I have used Excel for many years, but my knowledge of Excel functions was not up to sorting this one out! I did know, however, that a VBA macro would be my best bet, so I searched the usual forums and found a solution.

To make this macro work I needed an end-of-record identifier for each of the 253 records. The last field in each record set was “Reference Number”, which held no useful data, so I filtered column A on this field name and filled all 253 corresponding data cells in column B with an “@” symbol; this is the end-of-record delimiter for the macro. Column A is now of no further use, so it is deleted, and all the useful data should now be in column A (with the filter removed). Run the macro and the data is in a usable format. Insert a row at the top and name the columns as required. Rather than fill this post with screenshots of Excel showing the process, I have made the Excel spreadsheet available for download from my Google Documents HERE (under File – Download).
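For anyone who would rather not use VBA, the same idea (split one long column into records at each “@” marker) can be sketched in Python. This is only an illustration under assumptions, not the macro in the workbook: it assumes the data column has already been read into a list of cell values, and the function names are hypothetical.

```python
# A minimal sketch of the reshaping idea, NOT the VBA macro from the workbook.
# Assumes the single data column has been read into a list of cell values in
# which every record ends with an "@" marker cell.

def unstack_records(values, delimiter="@"):
    """Turn one long vertical column into one list (row) per record.

    A record ends at each delimiter cell; the delimiter itself is dropped.
    Values after the last delimiter become a final record so nothing is lost.
    """
    records = []
    current = []
    for value in values:
        if value == delimiter:
            records.append(current)
            current = []
        else:
            current.append(value)
    if current:
        records.append(current)
    return records


def record_widths(records):
    """Return the set of record lengths; a single value means every row lines up."""
    return {len(record) for record in records}


if __name__ == "__main__":
    # Placeholder values standing in for two short records.
    column = ["A1", "B1", "C1", "@", "A2", "B2", "C2", "@"]
    rows = unstack_records(column)
    print(rows)                 # [['A1', 'B1', 'C1'], ['A2', 'B2', 'C2']]
    print(record_widths(rows))  # {3}
```

Checking that every record has the same number of fields is a cheap way to spot a record where a field went missing before the rows go back into Excel.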

There are 4 tabs in the workbook, with instructions on how to use it in the first tab. If you want to see the code behind the macro, go to Tools – Macro – Visual Basic Editor (or press Alt+F11); if it is not already visible, double-click Module 1.

Well, there it is: with this macro you should be able to take any vertically tabulated column of data that has a consistent end-of-record marker and turn it into a usable database.

Before I sign off, thanks go to Jerry Beaucaire on the Excel Forum for the neat piece of code. Jerry also has his own Excel Assistant web site, where you can leave a donation if you find this code useful.

FamilySearch for TIMMINS in 1881 using Outwit Hub

In my last post on the TIMMINS surname I was left with an action to check whether I could glean more information on birth locations from the FamilySearch web site, namely from the 1881 census of England and Wales. There were two options on FamilySearch: either use the old site, which displays 200 entries at a time and gives a total of 2331 hits for the surname, or use the new site, which gives 2337 hits but displays only 20 entries at a time! I decided to use the old site, as it needs far fewer mouse clicks and so less risk of RSI.

The following two images show, first, the data as presented by FamilySearch in the Outwit Hub add-on for Firefox (before using the extraction options) and, second, the extracted data after it had been exported in Excel format and manipulated in an Excel spreadsheet.
The search was TIMMINS with exact match ticked.
Final Excel spreadsheet after a lot of data manipulation
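The difference in effort between the two FamilySearch sites is easy to quantify: the number of result pages to click through is the number of hits divided by the page size, rounded up. A trivial Python sketch using the hit counts quoted above (the function name is my own):

```python
import math


def pages_needed(total_hits, per_page):
    """Result pages to click through: hits divided by page size, rounded up."""
    return math.ceil(total_hits / per_page)


if __name__ == "__main__":
    print(pages_needed(2331, 200))  # old site: 12 pages
    print(pages_needed(2337, 20))   # new site: 117 pages
```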

One thing you will immediately notice is that Outwit Hub has extracted data that is not visible on screen! Cool! Before I move on to analyse the results, let’s just see how I extracted the data. The following screen image shows OH (Outwit Hub) before the export.

In OH I moved to Tables under the Data option in the left-hand panel. I filtered with Select Row if Col3 Contains timmins, and unticked the Clean Text option as we want all the data. Under On Page Load I selected Catch Selection and unticked Empty. Columns 2 and 3 contain all the data that ends up in the final Excel spreadsheet. Next, move back to the web page by selecting Page in the left-hand panel. Go to the bottom of each page and select Next until you reach the end of the data (2331 entries in this case); OH will catch all of it. Go back to the Tables page and select Export Excel in the On Page Load panel at the far right. You can then load the exported file into Excel and manipulate it as you see fit.
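If you export everything and filter afterwards instead, the same “Select Row if Col3 Contains timmins” rule is easy to reproduce. A rough Python sketch, assuming the export has been saved as CSV text; the function name and the sample rows are made up for illustration, and the case-insensitive match is an assumption about how the OH filter behaves.

```python
import csv
import io


def rows_containing(csv_text, column_index, term):
    """Keep CSV rows whose column at column_index contains term.

    column_index is 0-based, so Outwit Hub's "Col3" is index 2. The match
    ignores case (an assumption about the OH filter, not documented here).
    """
    needle = term.lower()
    reader = csv.reader(io.StringIO(csv_text))
    return [row for row in reader
            if len(row) > column_index and needle in row[column_index].lower()]


if __name__ == "__main__":
    sample = "1,x,TIMMINS A\n2,y,SMITH B\n3,z,Timmins C\n"
    for row in rows_containing(sample, 2, "timmins"):
        print(row)  # rows 1 and 3 only
```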

I won’t go into the ways to manipulate the data, as that could easily fill another lengthy blog post and there are hundreds of different ways to do it! Ideally, though, you want to get the data in each cell into comma-separated format. Once you have this, copy all the data into Notepad and save it as a text file, then open the text file in Excel with the Delimited option selected. If you are an Excel guru, there are much more sophisticated ways to extract what you want using functions and Visual Basic.
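If the cells have already been split into separate values, the Notepad step can be replaced by a short script that writes a .csv file Excel opens directly. A minimal Python sketch under that assumption; the sample values and any file path you pass in are hypothetical.

```python
import csv
import io


def to_csv_text(rows):
    """Render rows (lists of cell values) as CSV text.

    csv.writer wraps any field containing a comma in quotes, so a value such
    as "Dudley, Staffordshire" stays in one column when Excel opens the file.
    """
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def save_csv(rows, path):
    """Write rows to a .csv file; utf-8-sig adds a BOM so Excel detects UTF-8."""
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        handle.write(to_csv_text(rows))


if __name__ == "__main__":
    rows = [["Name", "Birth Place"], ["TIMMINS A", "Dudley, Staffordshire"]]
    print(to_csv_text(rows), end="")
```

The automatic quoting is the main reason to prefer the csv module over joining values with commas by hand: a comma inside a birth place would otherwise split it across two columns.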

Before you start catching data, it is a good idea to play around with Outwit Hub to see whether more data is available under the other options in the left-hand panel. Look at the Source option and search through the page using the Find input field: just type in what you are looking for, e.g. Timmins. Here this showed no results, but when I tried it on an FMP (Find My Past) page there was additional information that did not show on screen.
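The same check can be made on a saved copy of a page’s source. A minimal Python sketch, assuming the page has been saved as HTML text; the function name is hypothetical and the sample markup is invented, not taken from FamilySearch or Find My Past.

```python
def find_in_source(html, term):
    """Return (line number, line) pairs of page source that mention term.

    The match ignores case. Searching the raw HTML, rather than the page as
    displayed, also picks up text the browser never shows, such as hidden
    form fields and attribute values.
    """
    needle = term.lower()
    return [(number, line.strip())
            for number, line in enumerate(html.splitlines(), start=1)
            if needle in line.lower()]


if __name__ == "__main__":
    page = ('<p>Search results</p>\n'
            '<input type="hidden" name="surname" value="TIMMINS">\n'
            '<p>End</p>')
    print(find_in_source(page, "Timmins"))
```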

Outwit Hub: if you want to try this program, there is a free Light version.

OutWit Hub breaks down Web pages into their different constituents. Navigating from page to page automatically, it extracts information elements and organizes them into usable collections.

OutWit Hub Light is free and fully operational, but doesn’t include the automation features and limits the extraction to one or a few hundred rows, depending on the extractor.

There are lots of online tutorials by the makers and users; to get the most out of the program, I would recommend you give them a go.

Now back to my TIMMINS surname investigations.

Having all the data in an Excel spreadsheet has opened the door to statistics heaven!! But it has also highlighted lots of errors in my original investigation using Find My Past, which goes to show that you can’t beat working with the original secondary source material; even so, the LDS transcript has its anomalies. For instance, West Bromwich has been spelt 10 different ways! The transcribers have been true to the original text, but this does not help when you want to filter certain data in or out.
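One way round the spelling problem is to map the variants to a single standard name before filtering. A rough Python sketch; the rules are hypothetical, written only for the variants that appear in the birth place table below, so a new misspelling will need a new rule.

```python
import re


def normalise_birthplace(raw):
    """Map known spelling variants to one standard name (hypothetical rules).

    Case, spaces, full stops and the truncation mark are ignored by keeping
    only the letters a-z before comparing.
    """
    key = re.sub(r"[^a-z]", "", raw.lower())
    if key == "wb" or key.startswith(("westbrom", "wbrom")):
        return "West Bromwich"
    if key in ("brierleyhill", "brierlyhill"):
        return "Brierley Hill"
    if key in ("dudleyport", "dudlyport"):
        return "Dudley Port"
    return raw.strip()


if __name__ == "__main__":
    variants = ["West Bromwich", "W Bromwich", "Westbromwich", "West Bromch",
                "W.B.", "W Brom", "West Bromh", "West Brom…", "West Broml…",
                "Westbromwichh"]
    print({normalise_birthplace(v) for v in variants})  # {'West Bromwich'}
```

It is worth listing any values the rules leave unchanged after each run, since those are where new variants will show up.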

I noted that none of the commercial sites I tried appeared to have suitable filtering to give the results I wanted!

The overall statistics, albeit more accurate than my first pass on FMP, still tell the same story.

Dudley Parish still appears to be the most likely place of origin of the surname.

Here is my data relating to the parishes within the Poor Law Union boundaries. It is interesting to note that there were 83 entries without a precise birth place location; 74 of these at least gave a county or country, and only 9 entries had no location whatsoever. All percentages in the tables are calculated against the 2248 entries with an identified birth place.

Birth Place                      No       %
Dudley                          269   11.97
Sedgley                         157    6.98
West Bromwich                   138    6.14
Tipton                          113    5.03
Stourbridge                      71    3.16
W Bromwich                       59    2.62
Wednesbury                       47    2.09
Oldbury                          40    1.78
Halesowen                        30    1.33
Kingswinford                     29    1.29
Rowley Regis                     26    1.16
Brierley Hill                    17    0.76
Westbromwich                     10    0.44
West Bromch                       7    0.31
Brierly Hill                      6    0.27
W.B.                              6    0.27
Cradley                           4    0.18
Oldswinford                       3    0.13
W Brom                            3    0.13
West Bromh                        3    0.13
Lye                               2    0.09
Amblecote                         1    0.04
Dudley Port                       1    0.04
Dudly Port                        1    0.04
Quarry Bank                       1    0.04
West Brom…                        1    0.04
West Broml…                       1    0.04
Westbromwichh                     1    0.04
Sub-total                      1047   46.57

Total Birth Places Identified  2248
Total TIMMINS entries          2331
Blank Birth Place                83

(A blank is an entry where no precise birth place is given.)

Blank Birth Place                No       %
Cheshire                          4    0.18
Cornwall                          2    0.09
Cumberland                        2    0.09
Shropshire                        4    0.18
Other England                     3    0.13
Ireland                          48    2.14
Scotland                          7    0.31
United States                     3    0.13
Malta                             1    0.04
No birth location entered         9    0.40
Total                            83    3.69

Poor Law Unions                  No       %
Stourbridge                     164    7.30
Dudley                          567   25.22
West Bromwich                   316   14.06

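The Poor Law Union figures can be reproduced from the parish counts. In the Python sketch below, the counts are copied from the birth place table; the grouping of places into unions is an assumption inferred from the table (it is the grouping that reproduces the 164, 567 and 316 totals), not something taken from an official gazetteer.

```python
# Counts from the birth place table; the union grouping is an inferred assumption.
UNIONS = {
    "Stourbridge": {"Stourbridge": 71, "Halesowen": 30, "Kingswinford": 29,
                    "Brierley Hill": 17, "Brierly Hill": 6, "Cradley": 4,
                    "Oldswinford": 3, "Lye": 2, "Amblecote": 1,
                    "Quarry Bank": 1},
    "Dudley": {"Dudley": 269, "Sedgley": 157, "Tipton": 113,
               "Rowley Regis": 26, "Dudley Port": 1, "Dudly Port": 1},
    "West Bromwich": {"West Bromwich": 138, "W Bromwich": 59,
                      "Wednesbury": 47, "Oldbury": 40, "Westbromwich": 10,
                      "West Bromch": 7, "W.B.": 6, "W Brom": 3,
                      "West Bromh": 3, "West Brom…": 1, "West Broml…": 1,
                      "Westbromwichh": 1},
}
IDENTIFIED = 2248  # entries with an identified birth place (2331 - 83 blanks)


def union_summary(unions, base):
    """Return {union: (count, percentage of base as a 2-decimal string)}."""
    summary = {}
    for union, places in unions.items():
        count = sum(places.values())
        summary[union] = (count, f"{100 * count / base:.2f}")
    return summary


if __name__ == "__main__":
    summary = union_summary(UNIONS, IDENTIFIED)
    for union, (count, pct) in summary.items():
        print(f"{union:<15}{count:>5}{pct:>8}")
    subtotal = sum(count for count, _ in summary.values())
    print(f"{'Sub-total':<15}{subtotal:>5}{100 * subtotal / IDENTIFIED:>8.2f}")
```

Running it prints Stourbridge 164 (7.30), Dudley 567 (25.22), West Bromwich 316 (14.06) and a sub-total of 1047 (46.57), matching the tables.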
Conclusions & Observations so far:

– It is a shame that you can no longer download GEDCOM data from the old FamilySearch site.
– Why does the new FamilySearch not have a download facility?
– New FamilySearch needs 25, 50, 100 and 200 items-per-page options (a bit like eBay).
– We need to keep the old FamilySearch 1881 Census live, as it has many advantages; could they provide a new front end that enables more complex searches?
– Commercial genealogy web sites need a form of “fuzzy search” capability on some of the fields.
– Does anyone know if it is possible to buy the raw 1881 data set from the LDS, in a form that will load into Excel or Access?
– Outwit Hub is a great tool for data extraction on the web.
– TIMMINS surname origins: the investigation continues in Dudley Parish.

Finally, if anyone wants a copy of my Excel spreadsheets, either the original data or the final cleaned and edited version, give me your email address and I will gladly send you a copy.