|The search was TIMMINS with exact match ticked.|
|Final Excel spreadsheet after a lot of data manipulation|
One thing you will immediately notice is that Outwit Hub has extracted data that is not visible on screen! Cool! Before I move on to analyse the results let’s just see how I extracted the data. The following screen image is OH (Outwit Hub) before the export.
In OH I have moved to Tables under the Data option in the left-hand panel. I have filtered by Select Row if Col3 Contains timmins. I have unticked the Clean Text option as we want all the data. On Page Load I have selected Catch Selection and unticked Empty. Columns 2 and 3 contain all the data that is in the final Excel spreadsheet. Next, move back to the web page by selecting Page in the left-hand panel. Go to the bottom of each page and select Next until you reach the end of the data (2331 entries in this case). OH will catch all the data. Go back to the Tables page and select Export Excel in the On Page Load panel at the far right. You can then load the exported file into Excel and manipulate it as you see fit.
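For anyone who would rather script the same kind of filter outside OH, here is a minimal sketch in Python. It is not how OH works internally; it only mimics "Select Row if Col3 Contains timmins" on rows that have already been extracted. The function name and the sample rows are made up for illustration.

```python
def select_rows(rows, column_index, needle):
    """Keep rows whose cell at column_index contains needle (case-insensitive)."""
    needle = needle.lower()
    return [
        row for row in rows
        if len(row) > column_index and needle in row[column_index].lower()
    ]

# Hypothetical sample rows; real rows would come from the exported file.
sample = [
    ["1", "Household", "John TIMMINS, Head"],
    ["2", "Household", "Mary SMITH, Wife"],
    ["3", "Household", "Ann Timmins, Daur"],
]

# Col3 is index 2 because Python counts from zero.
matches = select_rows(sample, 2, "timmins")
```

The match ignores case, so TIMMINS and Timmins are both caught.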
I won’t go into the ways to manipulate the data, as it could easily fill another lengthy blog post and there are hundreds of different ways to do it! Ideally, though, you want to get the data in each cell into comma-separated format. Once you have this, copy all the data into Notepad and save it as a text file, then open the text file in Excel with the Delimited option selected. If you are an Excel guru, there are much more sophisticated ways to extract what you want using functions and Visual Basic.
Before you start catching data, it is a good idea to play around with Outwit Hub to see whether more data is available by selecting the other options in the left-hand panel. Look at the Source option and search through the page using the find input field: just type in what you are looking for, e.g. Timmins. Here this shows no results. I tried this on an FMP page and there was additional information that did not show on screen.
Outwit Hub – If you want to try this program there is a free light version.
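As one of those hundreds of ways, the comma-separated step can also be scripted. The following is a rough sketch in Python, assuming each extracted cell is already a single comma-separated string; the sample cells and function names are invented for illustration and are not taken from my spreadsheet.

```python
import csv
import io

def cells_to_rows(cells):
    """Split each comma-separated cell into a list of trimmed fields."""
    reader = csv.reader(cells, skipinitialspace=True)
    return [[field.strip() for field in row] for row in reader]

def rows_to_csv_text(rows):
    """Write the rows back out as delimited text that Excel can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

# Hypothetical cell contents, one string per extracted cell.
cells = [
    "John TIMMINS, Head, 45, Dudley",
    "Ann TIMMINS, Wife, 43, West Bromwich",
]

rows = cells_to_rows(cells)
csv_text = rows_to_csv_text(rows)
```

Saving `csv_text` to a file gives the same kind of delimited text file as the Notepad route.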
OutWit Hub breaks down Web pages into their different constituents. Navigating from page to page automatically, it extracts information elements and organizes them into usable collections.
OutWit Hub Light is free and fully operational, but doesn’t include the automation features and limits the extraction to one or few hundred rows, depending on the extractor.
There are lots of online tutorials by the makers and users; to get the most out of the program I would recommend you give them a go.
Having all the data in an Excel spreadsheet has enabled statistics heaven!! But it has also highlighted lots of errors in my original investigation using Find My Past, which goes to show that you can’t beat working with the original secondary source material. Even so, the LDS transcript has its anomalies. For instance, West Bromwich has been spelt 10 different ways! The transcribers have been true to the original text, but this does not help when you want to filter certain data in or out.
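One way to tame spelling variants like these is to map each transcribed place name onto a short list of preferred spellings. The sketch below uses Python's standard `difflib` module. The list of preferred names, the cutoff of 0.8 and the example misspellings are all assumptions for illustration, not the variants actually found in the transcript.

```python
import difflib

# Hypothetical list of preferred spellings; extend it to suit your own data.
CANONICAL = ["West Bromwich", "Dudley"]

def normalise_place(name, canonical=CANONICAL, cutoff=0.8):
    """Return the closest preferred spelling, or the name unchanged if none is close."""
    lookup = {place.lower(): place for place in canonical}
    hits = difflib.get_close_matches(
        name.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else name

# Invented variants, for illustration only.
variants = ["West Bromwick", "Westbromwich", "London"]
cleaned = [normalise_place(v) for v in variants]
```

Names that are not close to anything on the list are left alone, so the original transcription is never silently lost. It is worth checking the results by eye, as a cutoff that is too low will merge places that really are different.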
I noted that none of the commercial sites I tried appeared to have suitable filtering available to produce the results I wanted!
The statistics overall, albeit more accurate than my first pass on FMP, still tell the same story.
Dudley Parish still appears to be the most likely place of the surname origin.
Here is my data relating to the parishes within the Poor Law Union boundaries. It is interesting to note that there were 83 entries without a precise birth place location; 74 of these at least gave a county or country; only 9 entries had no location whatsoever.
|Total Birth Places Identified||2248|
|Blank Birth Place||83|
|(Blanks is where no precise birth place is given)|
|Blank Birth Place||No||%|
|No birth location entered||9||0.40|
|Poor Law Unions||No||%|
– It is a shame that you can no longer download GEDCOM data from the old FamilySearch site.
– Why does the new FamilySearch not have a download facility?
– New FamilySearch needs to have 25, 50, 100, 200 items per page options (a bit like eBay).
– We need to keep the old FamilySearch 1881 Census live as it has many advantages; could they provide a new front end that enables more complex searches?
– Commercial genealogy web sites need a form of “fuzzy search” capability on some of the fields.
– Does anyone know if it is possible to buy the raw 1881 data set from LDS; one that will load into Excel or Access?
– Outwit Hub is a great tool for data extraction on the web.
– TIMMINS surname origins to continue in the Dudley Parish.
Finally – If anyone wants a copy of my Excel spreadsheets, either the original data or the final cleaned and edited version, give me your email address and I will gladly send you a copy.
Tony, great post. I had not heard of Outwit Hub. I'm fascinated. Have downloaded the free version and will give it a try. Thanks for your detailed description. I have an outpatient surgical procedure next week that will slow me down for a couple of days. That will be good time to give Outwit Hub a thorough test drive. Thanks.
Thanks Bart, let me know how you get on. I am still at the lower end of the learning curve, struggling to apply Regular Expressions to extract the data in a more useable form. I know Excel quite well so am able to manipulate the data extracted outside Outwit Hub. Cheers Tony
Thanks for this post, and for arranging the discount. I hadn't heard of Outwit Hub either. I've now got my first scraper working, turning a series of pages where the information is in a set of labelled rows into a spreadsheet with the info laid out in columns. Much quicker to do, and no (new) typos to worry about! I like it! Thanks again.
You can download data from new.FamilySearch.org, although I don't know if they have a link on the website. Rather, use a genealogy program that can link to nFS and download it through that. I've used RootsMagic (which has a free version) and GetMyAncestors, which is free but requires registration. RootsMagic is a full genealogy program while GetMyAncestors will just download your data from nFS and gives you a GEDCOM file of it.
Hope that helps!