Using Excel to Format FamilySearch Data

It has been quite some time since I last posted a blog, mainly because my TIMMINS One Name Study has taken up more time than expected.  I am still learning about surname studies, so I have been reading up on the subject.  I have just finished The Surname Detective by Colin D. Rogers; the book has proved to be a very useful introduction as well as interesting, and I can recommend it if you are in any way interested in surnames.  Next on my reading list is a book referred to many times by Rogers – The Origin of English Surnames by P. H. Reaney.

So what else have I been doing over the past few weeks?  One thing that cropped up was a need to investigate a surname in my wife’s family, following the discovery of a photograph that had a list of names on it.  The family name was WARNER; they had lived in India in the 19th and early 20th centuries.

FamilySearch has pretty good coverage of India, so some family reconstruction could be carried out to determine the family groups.  Searching on India Marriages for WARNER produced some 253 matches; that is 13 pages of links at 20 links per page.  Each marriage record has 24 items, so copying and pasting all this into an Excel spreadsheet could take a long time; but…

by using my favourite data capture program Outwit Hub I devised a really simple scraper and saved myself hours.

The methodology of using this scraper is the same as detailed in my previous post, Extracting Marriage Data Made Easy.

Once I had caught the data in the Catch area, it was exported into Excel.  I then made a copy of the worksheet (so that I can work on the data but retain the original – just in case!).  There are a number of columns of data that I don’t need, so all those are deleted; that just leaves the field name in column A and the data in column B.

I was now faced with a vertically tabulated column of data stretching over 6,072 rows (253 x 24).  What I really need is 24 columns of data over 253 rows.  I have used Excel for many years, but my expertise in Excel functions would not enable me to sort this one out!  I did know, however, that a macro in VBA would be my best bet, so I searched the usual forums and found a solution.

To make this macro work I needed an end of record identifier for each of the 253 records.  The last field in each record set was “Reference Number”, and this field had no useful data in it, so I filtered column A on this field and filled all 253 instances with an “@” symbol; this is the end of record delimiter for the macro.  Column A is now no use, so it is deleted.  All the useful data should now be in Column A (unfiltered).  Run the macro and you now have the data in a usable format.  Insert a row at the top and name the columns as required.  Rather than have a load of screen shots of Excel showing the process, you can download the Excel Spreadsheet from my Google Documents HERE (under File – Download).
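For anyone who would rather see the idea outside Excel, here is a minimal Python sketch of the same reshaping step.  It is not the VBA macro from the forum, just an illustration of how an end of record delimiter turns one long column into rows.  The function name and the sample values are made up for the example.

```python
def records_from_column(values, delimiter="@"):
    """Group a flat list of cell values into records.

    Each record ends where the delimiter value appears; the delimiter
    itself is dropped from the output.
    """
    records = []
    current = []
    for value in values:
        if value == delimiter:
            records.append(current)
            current = []
        else:
            current.append(value)
    if current:  # keep trailing values that have no closing delimiter
        records.append(current)
    return records


# Invented sample data: two short records, each closed by "@".
column_a = ["John Warner", "1875", "Madras", "@",
            "Mary Warner", "1880", "Calcutta", "@"]
rows = records_from_column(column_a)
for row in rows:
    print(row)
```

With the real data, each record would come out as one row holding the fields that preceded its “@”, since the delimiter cell itself carries no data.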

There are 4 tabs in the workbook with the instructions on how to use it in the first tab.  If you want to see the code behind the macro then go to Tools – Macro – Visual Basic Editor – if it is not already visible then double click Module 1.

Well there it is, with this macro you should be able to tackle any vertically tabulated column of data and manipulate it into a useable database.

Before I sign off, thanks go to Jerry Beaucaire on the Excel Forum for the neat piece of code.  Jerry also has his own Excel Assistant web site where you can leave a donation if you found this code useful.

10 Ways to Write West Bromwich!

In my last post on the TIMMINS surname I was left with an action to check if I could glean more information on birth locations from the FamilySearch web site, namely the 1881 census of England and Wales.  There are two possibilities for FamilySearch, either I use the old site which displays 200 entries at a time and gives a total hit of 2331 entries of the surname, or I use the new site which gives 2337 hits but only displays 20 entries at a time!

I decided to use the old site as it prevented RSI by requiring fewer mouse actions.  I used my old friend “Outwit Hub” to extract the data; this is now available as a standalone programme so no more dependency on the Firefox browser.  See below for 10% discount offer.

The following two images show the data as presented firstly by FamilySearch into the Outwit Hub add-on for Firefox (this is prior to using the extraction options); and secondly the extracted data, exported in Excel format, then manipulated in an Excel spreadsheet.

Outwit 1: The search was for TIMMINS with exact match ticked

Outwit 2: Final Excel spreadsheet after a lot of data manipulation

For a better view, if you click on the images they should open in a larger window.

One thing you will immediately notice is that Outwit Hub has extracted data that is not visible on screen!  An excellent bonus.  Before I move on to analyse the results let’s just see how I extracted the data.  The following screen image is OH (Outwit Hub) before the export.

Outwit 3

In OH I have moved to Tables under the Data option in the left hand panel.  I have filtered by Select Row if Col3 Contains timmins.  I have unticked the Clean Text option as we want all the data.  On Page Load I have selected Catch Selection and unticked Empty.  Columns 2 and 3 contain all the data that is in the final Excel spreadsheet.

Next move back to the web page by selecting Page in the left hand panel.  Go to the bottom of each page and select Next until you reach the bottom of the data, 2331 in this case.  OH will catch all the data.  Go back to the Tables page and select Export Excel in the On Page Load panel at the far right.  You can load the exported file into Excel and manipulate it as you see fit.
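The row filter used in OH (Select Row if Col3 Contains timmins) can also be applied after the export if you end up with extra rows.  This is a small Python sketch of that kind of filter, assuming each row is a list of text cells and that the match should ignore upper and lower case (I have not checked how OH itself treats case).  The function name and sample rows are invented.

```python
def rows_where_col_contains(rows, col_index, text):
    """Keep only the rows whose cell at col_index contains text.

    The comparison ignores case; rows too short to have that
    column are skipped rather than raising an error.
    """
    needle = text.lower()
    return [
        row for row in rows
        if len(row) > col_index and needle in row[col_index].lower()
    ]


# Invented sample rows; column 3 is index 2 because counting starts at 0.
sample = [
    ["1", "Household", "Joseph TIMMINS"],
    ["2", "Household", "Search results"],
    ["3", "Household", "Sarah Timmins"],
    ["4", "Short row"],
]
kept = rows_where_col_contains(sample, 2, "timmins")
for row in kept:
    print(row)
```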

I won’t go into the ways to manipulate the data as it could easily fill another lengthy blog post, and there are hundreds of different ways to do it!  Ideally, though, you want to get the data in each cell into comma separated format.  Once you have this, copy all the data into Notepad and save it as a text file, then open the text file in Excel with the delimited option selected.  If you are an Excel guru then there are much more sophisticated ways to extract what you want using Functions and Visual Basic.
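If you are comfortable with a little scripting, the comma separated step can be done without Notepad.  The sketch below uses Python’s standard csv module to split each comma separated cell into fields and to produce delimited text that Excel can open.  The sample cells and function names are made up; real cells will need tidying first.

```python
import csv
import io


def split_cells(cells):
    """Split each comma separated cell into a list of fields."""
    # skipinitialspace drops the blank that usually follows each comma.
    return list(csv.reader(cells, skipinitialspace=True))


def to_delimited_text(rows):
    """Return the rows as comma delimited text, one record per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Invented sample cells, already in comma separated form.
cells = ["Joseph Timmins, Head, 45, Dudley",
         "Sarah Timmins, Wife, 43, Tipton"]
rows = split_cells(cells)
text = to_delimited_text(rows)
print(text)
```

Writing the returned text to a file with a .csv or .txt extension gives you something Excel will open with the delimited option.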

Before you start catching data it is a good idea to play around with Outwit Hub to see if more data is available by selecting other options in the left hand panel.  Look at the source option and check through the page using the “Find:” input field; just type in what you are looking for, e.g. Timmins, which in this case shows no results.  I have previously tried this on a Find My Past page and discovered that there was additional information that did not show up on the screen.

Outwit Hub – If you want to try this program, there is a free light version.

OutWit Hub breaks down Web pages into their different constituents. Navigating from page to page automatically, it extracts information elements and organizes them into usable collections.

 OutWit Hub Light is free and fully operational, but doesn’t include the automation features and limits the extraction to one or few hundred rows, depending on the extractor.

There are lots of on-line tutorials by the makers and users; some new tutorials now cover the more advanced features like Macros.

Now back to my TIMMINS surname investigations.

Having all the data in an Excel spreadsheet has enabled statistics heaven!!  But it has also highlighted lots of errors in my original investigation using Find My Past, which goes to show that you can’t beat working with the original secondary source material, but even the LDS transcript has its anomalies.  For instance, West Bromwich has been spelt 10 different ways!  The transcribers have been true to the original text, but this does not help when you want to filter in/out certain data.
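One way to cope with spelling variants before filtering is to map each transcribed place name onto a short list of standard names.  The following is a rough Python sketch using the standard difflib module.  The list of standard names, the cutoff value and the misspellings in the example are my own assumptions, not the ten spellings found in the transcript, and fuzzy matching can merge places that are genuinely different, so the results need checking by eye.

```python
import difflib

# Assumed list of standard names; extend it to suit your own data.
CANONICAL = ["West Bromwich", "Dudley", "Stourbridge"]


def normalise_place(name, canonical=CANONICAL, cutoff=0.8):
    """Return the closest standard place name, or None if nothing is close."""
    lookup = {place.lower(): place for place in canonical}
    # Lower-case the name and collapse runs of spaces before comparing.
    cleaned = " ".join(name.lower().split())
    matches = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


# Invented misspellings for illustration only.
for raw in ["West Bromwhich", "Westbromwich", "London"]:
    print(raw, "->", normalise_place(raw))
```

Names that come back as None are left for manual review rather than being forced onto the nearest match.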

I noted that none of the commercial sites I tried appeared to have suitable filtering available to enable the results I wanted!

The statistics overall, albeit more accurate than my first pass on FMP, still tell the same story.

Dudley Parish still appears to be the most likely place of the surname origin.

The 1881 census gives three fields for birth location – Birth Place i.e. Parish; Birth County and Birth Country.  It is interesting to note that there were 83 census entries without a precise Birth Place; 74 of these did at least give a county or country, with only 9 census entries having no birth location whatsoever.

Here is the data relating to the number of Timmins in each of the parishes within the three Poor Law Union boundaries of Dudley, Stourbridge and West Bromwich.  See previous blog post for a map of the boundaries, and for lists of parishes and townships in the PLUs.  (Note – Poor Law Union boundary is the same as the Registration District).  The % figure is the percentage of the total Timmins in England and Wales.

data-1

data-2
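For anyone wanting to reproduce this kind of table from their own spreadsheet, the counting and percentage step can be sketched in a few lines of Python.  The function name and the sample list are invented, and the last line assumes the 2331 hits from the old FamilySearch site is the total being used for the percentages.

```python
from collections import Counter


def parish_percentages(birth_places, total):
    """Count entries per birth place and express each count as a % of total.

    Blank birth places are ignored in the counts but should still be
    included in the total passed in.
    """
    counts = Counter(place for place in birth_places if place)
    return {
        place: (count, round(100 * count / total, 2))
        for place, count in counts.items()
    }


# Invented sample: five entries, one with no birth place recorded.
sample = ["Dudley", "Dudley", "Tipton", "", "Dudley"]
table = parish_percentages(sample, total=len(sample))
print(table)

# The 83 entries without a precise Birth Place, as a share of 2331 hits.
print(round(100 * 83 / 2331, 2))
```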

Conclusions:

There are 10 ways to spell West Bromwich!!

Despite the errors associated with birth place spellings, and missing birth place parishes on some census returns, the TIMMINS surname still appears to have its origins in the Dudley Parish.

Coming Next:
I am currently in the process of data extraction and formulating how to store it.
a) Getting the Timmins birth, marriage and death info from FreeBMD, from 1837 to end 1841.
b) Extracting all the Timmins’ from the 1841 census returns.
c) Finding all the Timmins (and possible variants) from the Dudley parish records, starting with IGI data.
d) Trial various programmes for suitability of storing and reporting on my One Name Study data.