Saturday, January 23, 2016

Learning: The Trouble with Indexes

I am a huge proponent of indexing, especially as a volunteer. I believe that the volume of records left to be made available to the public is so enormous that volunteer indexing is the only way many records will ever see the light of day. There simply aren't enough resources--in money or time--for any one organization to pay to have every existing record indexed.

So it falls to every genealogist, the users of these records, to seek volunteer opportunities to index records wherever possible. Everyone who cares about these records, and the people listed on them, must become a well-trained custodian of the past. How we index records is a part of that stewardship.

So when we consider what makes up a good index of a record set, I think we can agree on some core elements.

Accuracy


The index that is created must be an accurate representation of what is written on the records. Some errors will always present themselves due to transcription, and these are especially understandable with handwritten records. But an index loses a great deal of value when it omits names and details that appear on the records, or misrepresents any data points.

Searchable


The index should be searchable by all relevant data points. And I'm consistently in the camp that says that if it appears on the record, it's relevant information. While it may not be practical to make every record searchable by every data point, an index's search functionality should be as inclusive as possible of every name that appears on the record.

One significant example of a commonly missing search parameter is race. At the time of this writing, the only website of the Big Four (Ancestry.com, My Heritage, FindMyPast, and FamilySearch) that allows you to search universally by race is Ancestry.com. This creates a significant impediment to doing research for anyone of African descent.

The problem is compounded when the black individuals for whom I'm searching are not African American, and have no connection to the United States. But the example I'll be addressing below applies specifically to the challenges of indexing African American records.

Availability of Original Images


Not every index can provide images for original records. To do so is often cost prohibitive, especially for smaller organizations that are new to indexing.

However, every index should communicate the origins for the information being indexed. Date ranges, specific locations, repository, and all other information necessary to obtain copies of the original images should be present in the index description and item descriptions.

Not only does this aid in the crafting of quality source citations, but it also makes it possible for interested parties to request original copies.

I think we can all agree that the organization that makes it easiest for genealogists to index records on a large scale is FamilySearch. Their interface has provided the standard for any organization wanting to engage their communities in indexing efforts. What they deliver, especially since it relies almost entirely on volunteers, is impressive.

But sometimes even they can miss the mark.

Pittsylvania County, Virginia Death Index on FamilySearch: A Bad Example


What happens when an index is created, but many of the above points are ignored? How does it affect a person's ability to find desired records?

After the Ben Affleck/Finding Your Roots controversy hit the fan, it really got me thinking about what the best approach is to document slavery. And while I think everyone has a right to privacy, to share with or withhold from an international audience whatever they want about their family, I think slavery demands more of genealogists and family historians. Because I have slave holders and slaves throughout various lines of my family, I've decided to document both groups with total openness and objectivity.

Pittsylvania County, Virginia has a treasure trove of records in comparison to other communities in the South, including for African Americans. I decided to start there, which led me to Virginia Deaths and Burials, 1853-1912 on FamilySearch.

When I first sat down to do this project in September of last year, there was no way to search this collection by race. As of this writing, that has since changed. Because I was specifically interested in slaves held by the Keatts family in Pittsylvania County, my search parameters included the years 1853-1865.

The following are examples of records I found.




One issue I noticed, having seen both the originals and another index of these records, is the placement of the slave holder's name. As you can see here, it has been placed in the father's name position. While I cannot comment on the paternity of any of the slaves owned by my family, I can vouch for the fact that the original records made no such attempt. On the original records as provided by the other index, Richard Keatts is labelled in a column specifically designated for the owner of deceased slaves. He is also listed as the Consort, or informant.





The practice of putting the slave owner's name in the father's name field is consistent throughout the collection, regardless of the gender or relationship of the owner. Aletha "Letty" Keatts is female, the sister of Richard Keatts, and her name also appears in the father's field.

Upon closer inspection of the FamilySearch collection, giving the owner's name, followed by "(Owner)," and providing that information as the father's name appears to be the convention for all records related to slaves in Pittsylvania County. In order to have that degree of compliance with this many records, this has to be what the indexers of the collection were instructed to do. However, without the original images, I cannot say whether every slave holder in the FamilySearch collection has such a designation. Additionally, that convention is not disclosed in any description of the collection, or in the Known Issues page of the collection that I could find.

Additionally, the emancipation status of every African American was stated plainly on the original records. Whether the deceased was white was answered with a Yes or No. In a second column, labeled "Colored," their emancipation status was listed as either "Slave" or "Free." However, that information was not indexed in the FamilySearch collection.

While it may be possible to isolate all of the enslaved African Americans by using the race search box and searching for "Owner" in the father's name field, there are some issues with this approach. The first is that I cannot determine if every slave holder has such a designation. The second issue is that every search result with that designation appears twice--once as a result for the deceased individual, and once for the so-called father/owner.
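For readers who think in code, here is a minimal sketch of the duplicate problem described above. It assumes each search hit carries an identifier for the underlying record and a note of which field matched. The field names (`record_id`, `role`) and the sample rows are made up for illustration and are not FamilySearch's actual data model.

```python
def dedupe_by_record(hits):
    """Keep one hit per underlying record, preferring the deceased's own hit."""
    best = {}
    for hit in hits:
        rid = hit["record_id"]
        # Keep the first hit seen, but let a "deceased" hit replace a "father" hit.
        if rid not in best or (
            hit["role"] == "deceased" and best[rid]["role"] != "deceased"
        ):
            best[rid] = hit
    return list(best.values())


# Hypothetical results: record 1 shows up twice, once per matched field.
sample_hits = [
    {"record_id": 1, "role": "father", "name": "Richard Keatts (Owner)"},
    {"record_id": 1, "role": "deceased", "name": "Example Person 1"},
    {"record_id": 2, "role": "deceased", "name": "Example Person 2"},
]

unique_hits = dedupe_by_record(sample_hits)
```

A researcher working from an exported results list could apply the same idea by hand: group the rows by record, then keep only the row for the deceased.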

As a result, anyone trying to find enslaved ancestors in Pittsylvania County for this time period has to comb through a results list full of duplicates. Anyone trying to find emancipated ancestors for the same time period is unable to isolate these results from everything else. Given that this information is so clearly stated on the records, the real issue here is the way the records were indexed. The indexing program simply didn't have fields to index the names of slave holders.
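To make the missing-fields point concrete, here is a rough sketch of what an index entry with dedicated fields might look like, plus a helper that separates the "(Owner)" convention out of the father's name field. Everything here is an assumption for illustration: the class, the field names, and the helper are hypothetical, and I have no knowledge of how the FamilySearch indexing program is actually built.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

OWNER_SUFFIX = "(Owner)"  # the convention observed in the father's name field


@dataclass
class DeathIndexEntry:
    deceased_name: str
    father_name: Optional[str] = None
    owner_name: Optional[str] = None   # hypothetical dedicated field
    status: Optional[str] = None       # "Slave" or "Free", as on the originals


def split_father_field(raw: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (father_name, owner_name) from a legacy father's name value."""
    text = raw.strip()
    if text.endswith(OWNER_SUFFIX):
        return None, text[: -len(OWNER_SUFFIX)].strip()
    return (text or None), None


def from_legacy(deceased_name: str, father_field: str,
                status: Optional[str] = None) -> DeathIndexEntry:
    """Build an entry, moving any '(Owner)' value into its own field."""
    father, owner = split_father_field(father_field)
    return DeathIndexEntry(deceased_name, father, owner, status)


# "Example Person" is a placeholder, not a name from the collection.
entry = from_legacy("Example Person", "Aletha Keatts (Owner)", status="Slave")
```

With fields like these, a search for enslaved or free individuals would not need to lean on a text convention in the wrong column. Note that the status value still has to be transcribed from the original record, since the sketch does not try to guess it.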

But if real efforts are going to be made to index records pertaining to African Americans and their ancestors, these improvements to the indexing program need to be made. And the fastest, simplest way to correct all of the issues related to these records is to re-index the collection.

While it is possible to rely on user submitted corrections to individual records, which FamilySearch stated to me as their proposed solution, the lack of thorough correction to all affected records shows a real lack of accountability for the situation they've created. Some might also say that an inaccurate or "quirky" index is better than no index at all. But for errors that so disproportionately affect the African American community, such a glib response is unconscionable. It leaves us to wonder how many other collections related to slavery have similarly botched indexes, and what FamilySearch is doing to identify and correct these issues.

A Lesson Learned


The most important lesson to take from this example is that indexing efforts should be well-planned if we expect them to be well-executed. Taking shortcuts, or trying to avoid proper adaptation of current resources to create a true derivative, ultimately creates more work than it alleviates.

Because of the way computer databases are constructed on the back end, the only chance there is to address multiple records at once is when the collection is indexed. After that, it is a one-by-one, tedious effort to do any corrections.

Indexes can be incredibly useful. They serve a necessary, low-cost function in providing free access to records. But unfortunately, there is a dark and messy underbelly to them of which every genealogist should be aware before using them.



The issues that come with them sometimes make them about as useful as another brick in the wall.

Thursday, October 8, 2015

AncestryDNA: A Year Later

As some of you may recall, I did a post about my initial experience with the AncestryDNA test. That post is more than a year old now, and AncestryDNA has undergone two major changes since then. There are new features to consider, along with how they have fundamentally changed my experience with their DNA test.

Like last time, I wanted to give myself ample opportunity to use these new tools before doing a follow-up review. And unlike last time, I have something else to which I can compare my experience. Not only have I been using GEDmatch.com, I’ve also uploaded and unlocked my free trial matches at Family Tree DNA. While my experience with these sites has informed my perspective, I will try to save my comments on each of these sites for their own respective posts.

I won’t be reviewing the Ethnicity Estimates again, because my opinion of them has not changed.

Cousin Matching: C-


My experience with cousin matching has improved significantly. The first impact my DNA test has had on my tree came from using the tools at AncestryDNA. I began the process using the surname search, which is one of the best tools on AncestryDNA. It allows me to search through my cousin matches’ trees for a surname, a location, or both at the same time.


An example of the surname search, using the surname Halsey


I reached out to one of my cousins, then decided to compensate for her lack of response by researching her family tree for her. Thanks to what little information she provided on her parents, I was able to use obituaries and newspapers to trace her family until I arrived at our common ancestors. I never knew where they went after the 1920 census, and the answer was with her line of the family. They moved to Somerset County, Maryland. Her ancestor was the youngest sibling in a family I’d never realized had more children—the only ones still living with them ten years later at the time the 1930 census was taken.

The names and new census records were added to my tree—and my cousin is none the wiser. Which is probably for the best, because I don’t know how to explain to her what I did without using the words, “Don’t freak out, but I stalked you a little bit.”

A general lack of communication is still one of the predominant issues with DNA testing. This was my chief complaint in my previous review, and over time I've come to understand that this isn't a problem unique to AncestryDNA. With every DNA testing service to which I've been exposed, responses to inquiries are rare and wait times are long. It's the human element of the equation that no DNA testing company can control.

The surname search, plus some extra elbow grease, was enough to find the match between us. AncestryDNA deserves credit for that--and the maps, surname lists, the search functionality, and all of the other tools they've come up with to analyze your cousin match. But the set of tools AncestryDNA provides is still incomplete. The single greatest thing they can do to improve the cousin matching experience would be to have a chromosome browser. I still believe it's unadulterated stubbornness that perpetuates their refusal to build one. A chromosome browser, together with the other tools they provide, would make their DNA test a tour de force of unstoppable discovery.

I understand that people take DNA tests for different reasons. Based on my experience with reaching out, I'd say that more than half of the people with any genetic connection to me have no interest in collaboration. That means that more than half of the messages I send will never amount to anything. This makes me think that some people come into this relationship already knowing they don't wish to contribute. But rather than wasting my time lamenting about it, I'd rather we simply created a way to be upfront with each other.

What reaching out to DNA cousin matches feels like

In my mind, this situation could be handled with a single check box--either as part of the registration process, or a prompt to every person who is part of the AncestryDNA system. “I am currently interested in collaborating with other researchers for the purpose of finding our common ancestors.” Check yes or no. I envision this as a status update type of feature, where we all can act like grown-ups and communicate our intentions from the outset. I'm even envisioning that after a person hasn't been active on AncestryDNA for more than 3 months, that status is automatically changed to "No."

Imagine being able to filter your cousin matches by the people who are actively using their DNA tests. No more wasting time sending messages to people who never had any intentions of responding to them. If we can't change other people's behavior, we can at least communicate the behavior we all intend to exhibit.
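Here is a minimal sketch of how the check box and the automatic switch to "No" could work together. This is purely my own illustration of the idea: the class, the function names, and the choice to treat "3 months" as 90 days are assumptions, and nothing here reflects an actual AncestryDNA feature.

```python
from dataclasses import dataclass
from datetime import date, timedelta

INACTIVITY_LIMIT = timedelta(days=90)  # assumption: "3 months" is about 90 days


@dataclass
class CousinMatch:
    name: str
    wants_collaboration: bool  # the check box answer
    last_active: date          # last time this person used the site


def is_open_to_collaboration(match: CousinMatch, today: date) -> bool:
    """Treat the status as "No" once the inactivity limit has passed."""
    if today - match.last_active > INACTIVITY_LIMIT:
        return False
    return match.wants_collaboration


def active_collaborators(matches, today: date):
    """Filter a match list down to the people worth messaging."""
    return [m for m in matches if is_open_to_collaboration(m, today)]


# Made-up matches for illustration only.
today = date(2015, 10, 8)
matches = [
    CousinMatch("Cousin A", True, date(2015, 9, 20)),   # checked yes, recently active
    CousinMatch("Cousin B", True, date(2015, 1, 1)),    # checked yes, long inactive
    CousinMatch("Cousin C", False, date(2015, 10, 1)),  # checked no
]
worth_messaging = active_collaborators(matches, today)
```

In this sketch the stored answer is never overwritten. The "No" is computed from the last activity date, so a person who comes back after a long break would show their original answer again without having to re-check the box.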

DNA Circles: C

This was the first of the two newest features to the AncestryDNA test since my last review. A DNA Circle is where AncestryDNA points out the people who share DNA with you, as well as a common person in your trees.





I questioned this feature when it first launched, because all it takes to throw it off is for several cousins to have the same wrong information in their trees. While the DNA Circle links people together with shared DNA, the DNA Circle does no good if the ancestor it claims to represent is wrong.

However, this is not entirely AncestryDNA’s fault. Relying on member trees as part of this process is necessary. Research will always be a part of genealogy, including genetic genealogy. It’s on us to do a better job with our research, so the matching algorithms can do a better job of connecting us together. Being more exact is a necessary part of that process.

Moving forward after my DNA test, I made a lot of changes to the way I used my Ancestry member tree. I created a second tree in which I placed biological relationships only. I removed all extraneous information, including photos, to streamline my work with this DNA-only tree. I expanded the scope of my research for this tree to include all descendants, all siblings and half siblings, second marriages--anyone with a biological link to my direct line ancestors. At the same time, I cleaned up the dates and places in the Facts section, since these drastically improve the Map tool for the cousin match tree comparison. If we want better quality DNA Circles, we each need to participate in some aggressive housecleaning.

What I dislike is how the DNA Circles come with a page for the common ancestor, and that page is a random assortment of stuff from the trees of everyone in the Circle. Photos, Stories, Facts, dates, and names become an unattractive, oftentimes inaccurate jumble of ugliness.




There are no source citations, no criteria for anything that is placed automatically on that page. Being able to clean up and correct these DNA Circle pages is a much needed feature. Unless we're trying to create the world's largest (and worst) Ancestry member tree.

Rather than seeing an assemblage of what everyone has collected on the DNA Circle, I’d rather start with a blank slate, to which my cousins and I may add information. Provide us with the ability to collaborate, allowing us to choose what to add to this ancestor's page. Make valid source citations a requirement for submitting anything to an Ancestor's DNA Circle page. Otherwise, it becomes a compounded source of ignorance instead of providing genuine insight.

In fact, increasing the quality of the DNA Circle ancestor pages and Ancestry member trees could go hand-in-hand. Ancestry.com currently provides shaky leaf hints to member trees, which have a certain reputation for being garbage. These hints and copying data from other member trees are how errors spread and become entrenched in the family consciousness. Instead, why not hint everyone to the DNA Circle page? Let it become the single, authoritative source for researchers as they assemble their trees together--whether they've taken a DNA test or not. I'd much rather be introduced to cousins who haven't tested yet this way. If/when they do take an AncestryDNA test, I'll already know who they are!

I'd also like to see some better communication tools for the purposes of DNA collaboration. With each DNA Circle page, I envision a Google Hangouts-style interface which would foster online meet-ups/family reunions, group research discussions, and individual conversations between descendants. These meetings could be private, or publicly stored as part of the DNA Circle page.

A DNA Circle as it stands now seeks to reconstruct the identity of the dead. In order to do the greatest good, it should foster communication and a sense of kinship among the living.

Ancestor Discoveries: C+


Of all the new features on AncestryDNA, this one has me the most excited. This feature has done great things for me already, despite the accuracy shortcomings of the DNA Circles. Over time, I imagine this being one of AncestryDNA’s biggest assets—the thing that sets them apart from other testing services and websites.





So imagine a DNA Circle has been formed for an ancestor. It’s well established, and there are plenty of cousins all matched together. The only thing missing is you. You share the same DNA as everyone else in the Circle, but the matching algorithm hasn’t matched you to the Circle, because you don’t have the ancestor in your tree yet.

Bummer, right?




Not anymore!

Ancestor Discoveries is intended to do exactly that. It has already done this for me. My Greene family is a hot mess. That’s what happens when the courthouse that services your ancestors burns down… Twice. I was stuck on Henry Greene for ages, until the Ancestor Discovery for his grandparents came along. I did the research to back up the information, because I know better than to believe people on the Internet. I had to go into some unusual places to find the evidence I needed, but finding it was a direct consequence of my Ancestor Discoveries. In terms of results, it really has delivered.

Part of why I like the direction AncestryDNA is going with Ancestor Discoveries is because the lovely so-and-so's with private trees are included. If they fit into a DNA Circle, they become a part of my potential Ancestor Discoveries. Everyone else with a private tree that isn't connected to a DNA Circle can be triangulated via the Shared Matches tab on their cousin match page. I now expend less effort on figuring out where these people fit into the puzzle, and move on to other research problems. AncestryDNA is figuring out ways to avoid giving me an inferior product because of someone else's privacy settings. The privacy settings of other users were one of my chief complaints in my first review, and this is one of AncestryDNA's areas of greatest improvement.

My only complaint regarding the Ancestor Discoveries is one specific place I've seen it fall apart. To put it delicately, I come from Southern communities in which endogamy was a common practice. I'm one of the lucky ones whose ancestors moved away before the family tree got too tangled, and our current generation is far removed from it. But some of my cousins who are still living in these communities haven't been so fortunate. I connect to them in a multitude of places. We have multiple sets of common ancestors. How well do the Ancestor Discoveries reflect situations like these? Because I know just enough about the science of how the relationship estimates are calculated to know this effectively hoses the entire thing. And some of the Ancestor Discoveries I'm getting suggest the matching algorithms are struggling.




And don't you all go making fun of my endogamy. There are two types of people in this world: people who are inbred, and people who don't know it yet.

In situations like these, having segment data matters. I need to see the exact length of the DNA segment. Comparing it to standard genetic inheritance estimates is crucial to properly calculating my relationships to my cousins. I can't judge how skewed my inheritance is without the numbers--data that AncestryDNA does not display as part of its test. While I'm able to use GEDmatch.com to get this information, I would love so much more to have it as part of my Ancestor Discoveries. Localizing these connections, as well as analyzing them for accuracy, would be so much simpler with the segment data than it is without it.
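To show why multiple sets of common ancestors throw off the estimates, here is a small worked sketch. It uses the standard coefficient of relationship, where each common ancestor contributes one half raised to the number of parent-child links on the path between the two cousins, and the contributions are added up. The function names are mine, the sketch assumes the common ancestors are not themselves inbred, and real shared DNA varies randomly around these averages, so treat it as a back-of-the-envelope illustration and not as how AncestryDNA calculates anything.

```python
from fractions import Fraction


def path_contribution(links: int) -> Fraction:
    """Expected shared fraction through one common ancestor.

    `links` is the total number of parent-child steps on the path from
    one cousin up to the common ancestor and back down to the other.
    """
    return Fraction(1, 2) ** links


def expected_shared_fraction(paths) -> Fraction:
    """Add up the contributions of every common ancestor."""
    return sum((path_contribution(n) for n in paths), Fraction(0))


# First cousins: two shared grandparents, each path is 4 links long.
first_cousins = expected_shared_fraction([4, 4])    # 1/8

# Second cousins: two shared great-grandparents, 6 links each.
second_cousins = expected_shared_fraction([6, 6])   # 1/32

# Second cousins on one line who are ALSO third cousins on another line.
tangled = expected_shared_fraction([6, 6, 8, 8])    # 5/128

# How much more DNA the tangled pair is expected to share than plain second cousins.
inflation = tangled / second_cousins                # 5/4
```

In this example the doubly related pair is expected to share a quarter more DNA than ordinary second cousins, which is the kind of skew that makes a relationship estimate look closer than any single line of descent would suggest.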

Final Grade: C

AncestryDNA has made promising progress. I no longer consider it the worst $99 I ever spent. I still encourage anyone who is planning to take a DNA test to consider all of their options before purchasing one from AncestryDNA. Understand that you are making sacrifices of functionality no matter which testing company you choose, so be sure you choose the one that aligns with your reasons for testing.

Regardless of which testing service you use, your plans should also include uploading your results to GEDmatch.com. As a more open source option, it provides many of the analysis tools and data AncestryDNA is currently lacking. While there's a bit of a learning curve to using GEDmatch, it's time and effort well spent. If you need a beginner's guide, be sure to also check out our Genetic Genealogy for Beginners video series.

Good luck, and happy testing!

Tuesday, September 1, 2015

The Historical and Genealogical Society of Tomorrow

As a child, I grew up watching old school cartoons--especially those by Tex Avery. I remember sitting on the floor in my grandparents' second-story apartment in rural Maryland, eating carrot sticks and watching the bizarre antics of politically incorrect animals. Among my favorites was the World of Tomorrow, the satirical look into the new century through the lens of the 1950s.

It's in that same tongue-in-cheek, yet curious spirit that I find myself asking what the historical and genealogical societies of tomorrow will look like. This question is largely inspired by my interactions with many different genealogical and historical societies over the past few months. I've had experiences both good and bad--both of which indicate where these societies will strive and struggle to find their place in the future.


With that, I present to you... The Historical and Genealogical Society of Tomorrow!

Updates

If a genealogical society is still spending money on sending paper newsletters through the mail, their organization is trapped in 1998. And if their website hasn't had any sort of major overhaul since then, I rest my case.

Social media, blogging, and email will take the place of paper newsletters in the genealogical society of the future. There are too many other important, meaningful ways their financial resources could be used besides sending out paper. Because paper newsletters are usually disseminated monthly or quarterly, being heard from so infrequently is a losing battle for relevance. And as conserving natural resources grows in importance, unnecessary uses for paper will become increasingly unconscionable.

Throughout the years, many societies have tried to cut costs with low budget websites, and have avoided making real investments in their web presence. But it isn't enough to stick a Facebook badge on the old website and to call this the future. The HTML relics of yesteryear, complete with technicolor Comic Sans font and Clip Art bouquets, need to be given a proper burial. Today and tomorrow these websites need to be replaced by smarter solutions, especially for storage and security.

Because genealogical and historical societies of the future will take their place on the front lines of digitization, their websites need to become robust repositories of information. Becoming an online community trust means providing original records, transcribed indexes, photos, maps, better catalogs and directories for newspapers, books, periodicals, and vast collections of other records. Becoming the first providers for all legally available records is a market just waiting to be created.

If historical and genealogical societies want to participate in that market, they need to prepare themselves by stepping firmly into the future with their technology.

Collaboration

Preserving local history is a community affair. It requires interaction between organizations of all kinds, at every level. The historical and genealogical society of the future knows how to be the bridge between these organizations. Schools, colleges and universities, libraries and archives, courthouses and public offices, civic organizations, businesses, and government offices of every kind each play a role in this mission. Finding, protecting, digitizing, and sharing a community's history is a shared responsibility. Anyone can play a part, and successful societies recognize they can reach out to anyone.

Military participating in cemetery cleanup in Hawaii

Historical and genealogical societies of the future know how to create volunteer opportunities, both online and offline. They identify and exercise every resource at their disposal. If creating a new index means paying for scanning services, they're the ones to create and promote the GoFundMe campaign. Then they reach out online for volunteer indexers. When it finally comes time to build or expand the website for a new collection, they find the college students in web design who need an internship to graduate. These societies understand that when they unite diverse groups in a common love of family and history, they make their communities better places to live.

Collaboration in historical and genealogical societies of the future also means looking beyond immediate geography. Various historical records are no longer kept in the places that created them. Some of the most passionate historians do not live anywhere near the places they study. Societies will expand their reach to these places and people. Because these societies are looking to adapt, they will find ways to expand their membership offerings to those outside their communities, both online and offline.

Meetings are Old News

Gone will be the days where the only way to attend meetings of these organizations is to actually live nearby. The genealogical societies of tomorrow will accept that the newest generation, in order to adapt to an ever-changing economy, has become one of the most transient in history. Their first cross country move is a rite of passage, their first experience living abroad a must-have. Especially for the minimalist urban living which defines the Millennial generation, the thought of a meeting that cannot be attended remotely is incomprehensible. Yes, including for genealogy, because hardly any of us live in the communities where our ancestors lived.




Webinars, Google Hangouts, and live YouTube events are the meetings of the future. It's what the new generation expects from any organization to which it gives its paying patronage. Attendance is not limited by geography, time zone, or day of the week. The most experienced researchers for a community may not actually live there, but they can be engaged and participating with the genealogical community that does. Because all that is required to create a YouTube channel is a computer, an internet connection, and a device that records video, anyone can do it. Google and YouTube have made all of the investment to make the software, the interface, and hosting the video available for free.

The only limitations for historical and genealogical society meetings of the future are a lack of imagination, and willingness to learn.

Generational Culture Clash

Historical and genealogical societies of the future understand that reaching my generation is crucial to their survival. Embracing new technology means bringing us into their organization by default. The environment the society creates by the activities they engage in will determine if we will choose to stay.

Reaching and retaining our generation is summarized in one word--inclusion. We want to feel included in every part of the society--decision making, leading projects, organizing events, spending funds, all of it. Our voices need to be heard, and have an impact. At the same time, we need to feel everyone else is included, too.

The most compelling way to attract our crowd in the future will be by preserving a more inclusive history. As the genealogical and historical societies of the future become the force behind creating new record collections, they need to include all types of people in these collections. Millennials are interested in minorities, the underdogs, the "forgotten" history not included in the history books. In many communities, the history of African Americans, Latinos, the LGBT community, and even women has received almost no attention from their local historical and genealogical societies. By collecting and preserving the records from these populations of their community, these societies choose to be inclusive. They become inviting places for my generation and our values.

Paywalls

The place where inclusiveness will fall apart most often for historical and genealogical societies of the future is the Paywall. Paywalls have made their way into the genealogical community, and their place has been unquestioningly embraced by many historical and genealogical societies already. 

But my generation hates paywalls. We hate them because they are not inclusive--they exclude someone from information, services, and a community based on their ability to pay. Because Millennials are the greatest consumers of digital media, we're the ones most affected by Paywalls. In staggering economies where we're also the ones most affected, we're the ones with the least disposable income. We resent paywalls both on principle, and out of self-preservation. 

But that doesn't mean our generation isn't willing to part with money. We prefer to donate and give based on the value of what we feel we have received. We embrace payment options that allow us to give according to what we have. Where we can't give money, we're often willing and able to work, trade, or barter. 

More than anything else, we delight in proving that you can accomplish more by being less concerned with money. In order to appeal to the Millennial generation, embracing this philosophy will be a necessary part of organizational growth and transition.





As a matter of demographic disclosure, I am 25 years old. I have been actively researching my genealogy for ten years. I consider myself an advanced non-professional. I am a paperless genealogist, and I do the vast majority of my research online. As part of the first generation to grow up in the Digital Revolution, there was never a time when I had to do genealogy without the Internet. To put it bluntly, I am incurably hardwired to share because to me, that is what genealogy has always been.

I have also never joined a historical or genealogical society. I have nothing against them. But I have also never come across one that was interested in the communities I research, and that also has as much to offer as I have to give.

My most recent experience with a genealogical society demonstrates how much adapting there is to do--both for these groups, and for me and the denizens of tomorrow. I contacted a genealogical society, in search of plot information for a cemetery which has not been well digitized. It will take years to identify all of the people, especially those of African descent, who are buried there without headstones. This society's database is the most comprehensive one that exists online for that cemetery. However, it is also behind a paywall.

I attempted to negotiate, offering to trade information with them. If they had no information about my family's exact location in this cemetery, that confirmation alone would be helpful. At which point, I would gladly give their names, death and burial dates, and my original sources--to add to the database. My instinct is to share.

The person I spoke to insisted at first that I buy a membership in order to access the cemetery collection on my own. The society only offers an annual membership, priced at $30. Their website has no other collections pertinent to my research. I live hundreds of miles away, and cannot attend any of their meetings. The bulk of that expense is to create and send a paper newsletter that I don't want, and that is not relevant to my research. But this is the way things have always been done.

We spend all of this time trying to figure out how to tear down our brick walls, and now we're finding better ways to build them between each other. 

And maybe it was foolishness, maybe it was desperation, but I asked the person on the other side of the wall if perhaps there wasn't a better way.

I didn't get an answer right away. I didn't expect to get one at all. But the person--a woman, come to find out--took a brick out of the paywall, and passed me a name for a missing daughter I had never seen before. She even threw in some contact information for the caretakers of the cemetery and its records--a contact I never would have found on my own. And true to my word, I sent the names, dates, and sources for the rest of my family members buried in that cemetery.

I tried to be an example of the change and collaboration--the future--I believe in. Part of envisioning the future in genealogy is being part of the changes you hope to see. And my greatest hope is that this type of common sense cooperation becomes the rule of the future, not the exception.