Goal Oriented Research Process (a term I made up)
It's not my intent for the comments on this page to distract from the important work Geir, Adrian and Testuser are doing to document E&C.
Update (May 7 2011)
I wanted this page to be on topic/relevant for those who take the more scientific approach/E&C. I chose to use Tom's original presentation as a backdrop for my comments. A current E&C overview might have led to more relevant content.
I want to thank those who took the time to read and exchange notes or thoughts. Special thanks to Adrian who took the time to ask, "Are you sure you don't push paper?" Some of the banter, I hope, led to a better understanding of the vital record, baptismal and census record circumstances here in the States. I created the page Ancestry for Sale ... Vital Records Not Included to show how my ancestors migrated westward as states and counties developed--always landing at a place about 50 years in advance of the time when births and deaths would be recorded.
I do not have a working description of the research process that current E&C presumes to be good practice. However, Tom's "Automatic Combining and Linking of Genealogical Records," indicates that "available evidence that may apply to the individuals is gathered, and then the evidence is used [to] infer them. A researcher makes conclusions by laying out person records extracted from the evidence, and then grouping the records into sets he believes represent individuals from the past. The process is based on good practices and experience."
That does not describe my process, so I created this page to try to describe what I do.
Just as Adrian did, I want to understand why E&C seems to compare so unfavorably to my process. I KNOW there are important differences between E&C and how I work. As I try to explain the differences, I might not use the terms you use or identify the steps as you see them in E&C. Please appreciate that I seem to be on one end or the other of an information vacuum.
I WANT the benefits of technology to move genealogy forward. I just don't see why we have to give up the genealogy or a real evidence record to get there.
Initial Summary: (WIP)
I don't really think E&C was developed to support a process, but the description given in Automatic Combining, as above, I would label a "document collector" process. I don't want to be a document collector. I set out to find a specific document that should contain specific information about a specific individual or family (=information that should be evidence for me). The difference is not semantics. As I hope to describe, the elaborate system you set out to build in support of your process (or for your purpose) renders tools I need unusable.
I know you are brilliant data scientists, and want things from me to make the 'puter smart so you can realize the possibilities you see. BUT, BUT--the things you want to take are the very things that enable me to be a genealogist.
I want to write biographies or genealogies in actively managed narrative forms (my goal). I could say it more precisely, but to that end, I research to fully and correctly identify each family and each individual in the family, and to place all in historical context. I want each relationship correctly identified. The various "data requirements" are pretty well spelled out by the narrative form, and yes, the record of my evidence prints as reference notes.
To reach my goal, I research individuals at the family group level, in the body of evidence. I want to conduct an exhaustive search, working from the known to the unknown.
As a genealogist, I know at the start what my data requirements will be when I finish. The journey is how I get there. Because I'm not an expert on every town, culture, era, etc., I study to determine where information I'm seeking should be found. I'm not a document collector--I make broad searches in targeted record groups to find specific evidence rather than random searches in databases online (not that I ignore them).
Simply the Best (err... GeneJ wants to debunk the Conclusion-Only Myth)
Just as I hold my nose when we refer to facts, I hold my nose when we talk about conclusion-only. I make findings of "best evidence." It's not a best record or document or "preferred"/"primary" evidence--it's the best evidence. As the term implies, some evidence is better than other evidence, and I want to be able to look at (and report about) all the evidence in making my finding.
As I find more evidence, I will again and again review the evidence. Likewise, when I send off a biography or genealogy, I want to report about all the evidence. If it's not obvious, I will explain the logic and reasoning behind a finding of the "best evidence." See - Examples from my working file.
I've asked, but am still not sure where all the evidence is stored in E&C. I know where to find the record of evidence in my working file--reference notes. My software allows reference notes to be moved up or down, so I can arrange to have the "best" evidence listed first, but still report about all the evidence. I easily *add* comments about conflicts and negative evidence.
My reference notes are the "evidence architecture" in the software I use today. Could we build a more flexible system--yes, but dismantling it hardly makes it more flexible.
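As a rough illustration only, the arrangement described above (an ordered list of reference notes under one finding, with the "best" evidence moved to the top and every other note still kept and reported) could be sketched like this. The names ReferenceNote, Finding, move_up and best_evidence are hypothetical, chosen for this sketch; they are not taken from any particular genealogy program or from E&C.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReferenceNote:
    """One narrated reference note (hypothetical structure)."""
    citation: str
    summary: str
    comments: List[str] = field(default_factory=list)  # conflicts, negative evidence


@dataclass
class Finding:
    """A 'best evidence' finding that keeps ALL of its reference notes."""
    person: str
    event: str
    value: str
    notes: List[ReferenceNote] = field(default_factory=list)

    def move_up(self, index: int) -> None:
        """Swap the note at `index` with the one above it; nothing is dropped."""
        if 0 < index < len(self.notes):
            self.notes[index - 1], self.notes[index] = (
                self.notes[index],
                self.notes[index - 1],
            )

    def best_evidence(self) -> ReferenceNote:
        """By convention in this sketch, the first note is the 'best' evidence."""
        return self.notes[0]

    def report(self) -> List[str]:
        """Report every note, best first, with its comments."""
        lines = []
        for note in self.notes:
            lines.append(f"{note.citation}: {note.summary}")
            lines.extend(f"  comment: {c}" for c in note.comments)
        return lines
```

In this sketch, reordering changes only which note prints first; the full list, including comments about conflicts and negative evidence, stays attached to the finding.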
Genealogy is identity and identity is evidence: Researching at the family group level
I research at the family group level, in the body of evidence, working from the known to the unknown.
The "family group" is well defined by the narrative style; it's not too different from the individuals reported on a family group sheet. The key data requirements by which individuals become identified are defined by the style. For the purpose of my working file, the family group details represent my best evidence--some individuals will probably have been well identified (=enough evidence to identify them well) while others might not be well identified (=not enough evidence to identify them well).
The "body of evidence" is all the evidence from all the sources relevant to the family group. By the design of my genealogical software, this body of evidence is identified, summarized and evaluated in the collective group of reference notes associated with the individuals in the family group.
I work from the known to the unknown. I identify research objectives from the evidence (=from my reference notes) about those I can identify, and set out to learn more about them, expecting that research to lead me to other family members not yet well identified. I research my own family; when I research to "link from," I'll learn more about family members and expect to learn about those I haven't yet identified well. Too often when I research to "link to," I learn a lot about another person's family.
I believe those who research at the family group level come to rely on the additional checks, balances and additional logic it provides for evidence located about individuals. This is a process--the more I learn about the family and each family member, the more I can learn. I presume later research will shed better light on earlier research.
Knowing you may believe E&C will ultimately research my family for me, better than I can, perhaps you can also see how dismantling my evidence record (=reference notes=evidence architecture), creating personas and entering E&C logic and reasoning ("I think this is the same person because ...") is counterproductive to my effort. Not being able to quickly find my evidence (=my reference notes) makes me unable to effectively conduct routine searches, interview town clerks, reference librarians, archival specialists and descendants, etc. Basically, my reference notes are an integral part of my research. It probably goes without saying, but for the same reason, my research process does not benefit from having a family member split into two (or 152) personas any more than it does from having two conflated into one.
The evidence record I create just seems easier and better than the evidence record in E&C. I think arriving at my "logic and reasoning" is also easier and better, because I can more quickly consider all the evidence.
Genealogical reference notes are narrated for a reason--too much about evidence doesn't readily conform to today's machine readable form. The evidence we want is often torn, smudged, contains strikeouts or information that is overwritten. Most genealogical software allows the user to free-form reference note entries or use some stylized approach.
Conversely, it seemed "Automatic Combining and Linking ..." (p. 2, "Preparing Person Records") intended to work with "indexed databases." In that context, what E&C sets out to do makes more sense, but then that is only a fraction of the evidence market and probably only a fraction of the user market.
It's when E&C ventures beyond "indexed entries" to other data that I feel you are actually undermining the use of the word "evidence." (See handling of evidence.)
I don't have a full appreciation for how E&C stores logic and reasoning, but I know how I correlate evidence on my system and comment in my reference notes. Based on my experience, the necessary logic process usually runs like this. Initially I just don't know enough to be able to clearly identify someone. After I've researched a few more "knowns," I am able to identify evidence. Until I reach a critical mass, I'll have conflicts and some negative evidence; I likely need to consult additional authorities to understand the material. Between critical mass (my model term) and reasonably exhaustive, the evidence starts to fall in place. It explains earlier questions and evidence. Conflicts are resolved, etc.
I know that throughout the research process, in findings of "best evidence," my logic and reasoning covers all the evidence. I'm not sure it does for E&C. In E&C, you seem to be joining two personas. Let's say you have four birth "records" entered in the main persona, and you are joining another. When you make that join for the fifth record, do we assume the person is correlating all five records? I assume the fourth is represented on the face of the persona, but where are the other three records?
(In truth, I'm not sure logic and reason is even important in the scheme of "Automatic Combining ..." but a genealogist needs ready access to coherent, fully correlated summaries throughout the research process and will eventually need to present the same in a biography or family group sheet.)
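The five-record question above can be phrased as a small sketch. This is not E&C's actual data model, which this page does not describe; Persona, join and all_records are hypothetical names used only to show what "correlating all five records" would require of any structure that joins personas.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Persona:
    """Hypothetical persona: records attached directly, plus any personas joined into it."""
    label: str
    records: List[str] = field(default_factory=list)
    joined_from: List["Persona"] = field(default_factory=list)
    reasoning: str = ""  # e.g. "I think this is the same person because ..."

    def all_records(self) -> List[str]:
        """Collect every record, including those of every joined persona."""
        collected = list(self.records)
        for persona in self.joined_from:
            collected.extend(persona.all_records())
        return collected


def join(a: Persona, b: Persona, reasoning: str) -> Persona:
    """Join two personas without discarding either one's records."""
    return Persona(
        label=f"{a.label}+{b.label}",
        joined_from=[a, b],
        reasoning=reasoning,
    )
```

Under these assumptions, a researcher joining a fifth birth record to a persona that already holds four would call all_records() on the result and expect to see five entries, not just the most recent one.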
In software's existing evidence architecture, I'm able to record a summary of all my evidence, including all conflicts and all negative evidence. Throughout the research process, I'm able to see all the evidence and reasoning about all the evidence as necessary. I'm not a fan of requiring family historians to enter reference notes, but if BetterGEDCOM intends to establish an evidence-based standard, I think it is a mistake to breach software's existing source systems to do so.
Add - Evidence Dimension [wip] ....
 Note: I'm going to assume we all agree that "evidence" from original documents often doesn't (thus just doesn't) conform to requirements of a database. Please let me know if that isn't a given or if we need some examples.
Handling of Evidence
I handle evidence differently than the way I see E&C proposing it be handled. I wouldn't consider taking a "snippet" out of context during the research phase, much less quoted material or abstracts from authorities. I expressly report about my translations--so that when I send a census reference note to Geir, he can say something like, "you didn't quite catch the essence on this one..." I include comments about indirect evidence. I actively manage my reference notes, adding comments if I later discover evidence or negative evidence.
(a) Aside from my problem with the process and not wanting evidence to be so darn hard to find, I have a problem when E&C "evidence" is in reality something inferred from the evidence. That is different than a user pulling direct evidence from an indexed entry (see About Evidence)--which is how I think Tom originally described it. The user pulls the indexed birth data and cites the birth index (not the original record)--the user didn't have to translate anything or interpret anything. We both see that evidence as evidence. I have a problem when BetterGEDCOM suggests E&C entries should apply beyond the indexed entry to the original documents.
(b) Somewhat akin to (a), much of what E&C wants to call "evidence" is actually material someone wrote in a letter, authored in a book or even spoke into a tape recorder. Am I the only one who sees a problem with a standard separating quotes from attribution? Ditto, entering "snippets" in a location separate from references the original author may have given (=source of the source)?
Those who practice computer science/data science are professionals who apply skills and techniques; they look for opportunities and want to see their discipline grow. They provide an important service, and see genealogy as a data rich field.
Those who practice genealogy are professionals who apply skills and techniques--they too want to see their discipline grow and believe they do important work. Genealogists eagerly greet new technologies.
We all stand at the same place in history, looking to the future, but see different possibilities.
I'm not trying to address the commercial viability of E&C, nor am I questioning a possible relationship between that viability and BetterGEDCOM. I DO want us to look at E&C from the standpoint of trends in practices and methodologies and record status. I also want to develop a crisp understanding of the conflicts between this possible standard and my process and research requirements.
From what I can tell, there are several "process" issues.
(1) Conflicts abound. It's hard for me to imagine anything about "evidence" being destructive, but E&C seems to do anything but support my research process needs and goals. I'm hoping for those working on E&C to provide a working description and examples of how reasonably complex evidence is actually entered and processed through to a final conclusion of "best evidence." Ditto, explain where all the evidence (including conflicting and negative evidence) is actually stored and how it is accessed. Finally, is E&C able to produce biographies, genealogies and family group sheets that report evidence fully and faithfully? The conflicts aren't minor. I'm not nitpicking and it's not that I don't "get it."
(2) Under one master, BetterGEDCOM wants to support research activities by accessing the indexes of large record providers--not unlike the way FamilyTree Maker works with Ancestry.com. I enter birth data for someone, and Ancestry scours its mega indexes and returns a little green leaf if it finds something "interesting." I can click on a link and go see that information. Am I not correct that BetterGEDCOM could support such activity with "best evidence" findings and reference notes intact? (On a little test of 20 records, Ancestry's little green leaf actually returned a correct entry for me just under 20% of the time. Most of the time, it didn't return an entry.)
(3) The second master seems to be that of "Automatic Combining ..." Looking at those features from the standpoint of a genealogist, I think it's a mistake to try to render such a feature/E&C for other than third party indexed records--but how large a base is that? And what lies ahead for those users who catch the bug and want to play with the real historical documents?
E&C is complex and invasive. In order for BetterGEDCOM to have the automatic combination of genealogical records, it seems to want to write off my genealogy. I have 6000 sources and 40000 citations. It is the most valuable part of my database; I will never use a software program that doesn't recognize and support my process.
Thank you for your time. --GJ
Perhaps a bit more tomorrow (saving so I don't lose this)