While AI and OCR have been used for quite some time, it's only been in recent months that I've seen an increase in inaccurate hints on Ancestry. It seems to correspond with the widespread increase in the use of AI across the board. I absolutely appreciate more possible records for my relatives and ancestors (you never know when an "ah ha!" hint will appear), but I can't help noticing how many new hints are not relevant. AI considerably reduces the need for human intervention and indexing. Combined with the magical algorithms, one would certainly expect many new results as this technology takes hold. But it's far from perfect.
In the past, I'd estimate 95% of the hinting was pretty accurate for my tree (not including user-uploaded media potentially appearing more than once). There were always a few scattered incorrect records suggested, but by and large, I was quite happy with the results I received. I suspect many of those incorrect hints were the result of the algorithm picking up on records attached in other trees. Now, I'm finding more inaccurate records than ever before, especially in the hints pulled from newspapers and yearbooks. It seems the combination of AI and algorithms may not be filtering as well as I've been used to. My "reject rate" is now closer to 15%.
I've found numerous newspaper hints where the person in my tree would have been a youngster, far too young to be the subject of a newspaper article. Many appear to be picking up on a spouse (showing a Mrs. [fill in the blank]) when the person being researched is not female (and most often has no tie to the article referenced). I've also noticed the algorithm seems to combine names (for example: "Mr. Bill Jones went to the home of Mr. Wayne Smith" might produce a hint for Bill Smith in my tree). When it comes to yearbooks, the algorithms are often pulling women's married surnames instead of their maiden surnames (and while it's possible these women were married while still in school, it's unlikely).
I'm a big fan of AI overall, and I use it daily as my "assistant," helping me format, translate/transcribe, and extract data. I absolutely believe it has a valid use when it comes to scanning large data sets to provide us with new hints. I just wish they'd tighten up some of the parameters, or perhaps run a second pass over the potential records to weed out some of the inaccuracies. I fear click-happy new researchers (I was one of them back when I started!) will attach anything and everything without proper vetting, causing inaccuracies to propagate even faster.
Of course, it is (and always has been) the responsibility of each researcher to determine whether records apply. This is not new! Only you can determine if you believe a hint is accurate or not. Over time, I suspect inaccurate results will decrease as people reject irrelevant hints and algorithms are tweaked; I believe we'll ultimately see overall improvements in hinting. But for the short term, we may need to be a little more careful evaluating new hints.
One final thought... it's important to note that correlation is not necessarily an indication of causation. I could be totally wrong in my observations. Perhaps it's simply the addition of new newspapers and yearbooks. Perhaps I've been underestimating my previous "reject rate." Have you seen any trends in Ancestry hinting since AI has become more widespread? I'd love to hear your comments and thoughts.