Goal Oriented Research Process (a term I made up)
It's not my intent for the comments on this page to distract from the important work Geir, Adrian and Testuser are doing to document E&C.
Update (May 7 2011)
I wanted this page to be on topic and relevant for those who take the more scientific approach/E&C. I chose to use Tom's original presentation as a backdrop for my comments. A current E&C overview might have led to more relevant content.
I want to thank those who took the time to read and exchange notes or thoughts. Special thanks to Adrian, who took the time to ask, "Are you sure you don't push paper?" Some of the banter, I hope, led to a better understanding of the circumstances of vital, baptismal and census records here in the States. I created the page Ancestry for Sale ... Vital Records Not Included to show how my ancestors migrated westward as states and counties developed, always landing at a place about 50 years in advance of the time when births and deaths would be recorded.
I do not have a working description of the research process that current E&C presumes to be good practice; however, Tom's "Automatic Combining and Linking of Genealogical Records" indicates "available evidence that may apply to the individuals is gathered, and then the evidence is used [to] infer them. A researcher makes conclusions by laying out person records extracted from the evidence, and then grouping the records into sets he believes represent individuals from the past. The process is based on good practices and experience."
That does not describe my process, so I created this page to try to describe what I do.
Just as Adrian did, I want to understand why E&C seems to compare so unfavorably to my process. I KNOW there are important differences between E&C and how I work. As I try to explain the differences, I might not use the terms you use or identify the steps as you see them in E&C. Please appreciate that I seem to be at one end or the other of an information vacuum.
I WANT the benefits of technology to move genealogy forward. I just don't see why we have to give up the genealogy or a real evidence record to get there.
Initial Summary: (WIP)
I don't really think E&C was developed to support a process, but the description given in Automatic Combining, as above, I would label a "document collector" process. I don't want to be a document collector. I set out to find a specific document that should contain specific information about a specific individual or family (=information that should be evidence for me). The difference is not semantics. As I hope to describe, the elaborate system you set out to build in support of your process (or for your purpose) renders tools I need unusable.
I know you are brilliant data scientists, and want things from me to make the 'puter smart so you can realize the possibilities you see. BUT, BUT--the things you want to take are the very things that enable me to be a genealogist.
I want to write biographies or genealogies in actively managed narrative forms (my goal). I could say it more precisely, but to that end, I research to fully and correctly identify each family and each individual in the family, and to place all in historical context. I want each relationship correctly identified. The various "data requirements" are pretty well spelled out by the narrative form, and yes, the record of my evidence prints as reference notes.
To reach my goal, I research individuals at the family group level, in the body of evidence. I want to conduct an exhaustive search, working from the known to the unknown.
As a genealogist, I know at the start what my data requirements will be when I finish. The journey is how I get there. Because I'm not an expert on every town, culture, era, etc., I study to determine where information I'm seeking should be found. I'm not a document collector--I make broad searches in targeted record groups to find specific evidence rather than random searches in databases online (not that I ignore them).
Simply the Best (err... GeneJ wants to debunk the Conclusion-Only Myth)
Just as I hold my nose when we refer to facts, I hold my nose when we talk about conclusion-only. I make findings of "best evidence." It's not a best record or document or "preferred"/"primary" evidence; it's the best evidence. As the term implies, some evidence is better than other evidence, and I want to be able to look at (and report about) all the evidence in making my finding.
As I find more evidence, I will review the evidence again and again. Likewise, when I send off a biography or genealogy, I want to report about all the evidence. If it's not obvious, I will explain the logic and reasoning behind a finding of the "best evidence." See Examples from my working file.
I've asked, but am still not sure where all the evidence is stored in E&C. I know where to find the record of evidence in my working file: the reference notes. My software allows reference notes to be moved up or down, so I can arrange to have the "best" evidence listed first but still report about all the evidence. I easily *add* comments about conflicts and negative evidence.
My reference notes are the "evidence architecture" in the software I use today. Could we build a more flexible system? Yes, but dismantling it hardly makes it more flexible.
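To make the reference-note arrangement concrete, here is a minimal sketch assuming a hypothetical in-memory model; the names `ReferenceNote`, `FactEvidence`, `move_up` and `report` are invented for illustration and are not taken from TMG, GenBox, GEDCOM or BetterGEDCOM. Each note keeps its source, its extract or abstract, and any comment about conflicts or negative evidence. List order stands in for the ranking, so the best evidence sits first while nothing is discarded.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReferenceNote:
    """One item of evidence, modeled as a hypothetical reference note."""
    source: str        # citation for the source consulted
    abstract: str      # extract/abstract kept with the note, not split off
    comment: str = ""  # conflicts, negative evidence, reasoning


@dataclass
class FactEvidence:
    """All reference notes attached to one fact; list order is the ranking."""
    fact: str
    notes: list[ReferenceNote] = field(default_factory=list)

    def add(self, note: ReferenceNote) -> None:
        self.notes.append(note)

    def move_up(self, index: int) -> None:
        """Swap a note with the one above it; position 0 is the best evidence."""
        if 0 < index < len(self.notes):
            self.notes[index - 1], self.notes[index] = (
                self.notes[index],
                self.notes[index - 1],
            )

    def best(self) -> ReferenceNote:
        """Return the note currently ranked first (raises IndexError if empty)."""
        return self.notes[0]

    def report(self) -> list[str]:
        """Every note in ranked order; nothing is dropped from the report."""
        lines = []
        for rank, note in enumerate(self.notes, start=1):
            line = f"{rank}. {note.source}: {note.abstract}"
            if note.comment:
                line += f" [{note.comment}]"
            lines.append(line)
        return lines


if __name__ == "__main__":
    # Hypothetical example data, for illustration only.
    birth = FactEvidence("Birth of a hypothetical John Doe")
    birth.add(ReferenceNote("Family Bible entry", "born 1820"))
    birth.add(ReferenceNote("Town birth register", "born 3 Mar 1821",
                            "conflicts with Bible year"))
    birth.move_up(1)  # the register is judged the best evidence
    print("\n".join(birth.report()))
```

Moving a note up only changes which evidence is reported first; the conflicting Bible entry and its comment still appear in the report.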
Genealogy is identity and identity is evidence: Researching at the family group level
I research at the family group level, in the body of evidence, working from the known to the unknown.
The "family group" is well defined by the narrative style; it's not too different from the individuals reported on a family group sheet. The key data requirements by which individuals become identified are defined by the style. For the purpose of my working file, the family group details represent my best evidence--some individuals will probably have been well identified (=enough evidence to identify them well) while others might not be well identified (=not enough evidence to identify them well).
The "body of evidence" is all the evidence from all the sources relevant to the family group. By the design of my genealogical software, this body of evidence is identified, summarized and evaluated in the collective group of reference notes associated with the individuals in the family group.
I work from the known to the unknown. I identify research objectives from the evidence (=from my reference notes) about those I can identify, and set out to learn more about them, expecting that research to lead me to other family members not yet well identified. I research my own family; when I research to "link from," I'll learn more about family members and expect to learn about those I haven't yet identified well. Too often when I research to "link to," I learn a lot about another person's family.
I believe those who research at the family group level come to rely on the additional checks, balances and additional logic it provides for evidence located about individuals. This is a process--the more I learn about the family and each family member, the more I can learn. I presume later research will shed better light on earlier research.
Knowing you may believe E&C will ultimately research my family for me, better than I can, perhaps you can also see how dismantling my evidence record (=reference notes=evidence architecture), creating personas and entering E&C logic and reasoning ("I think this is the same person because ...") is counterproductive to my effort. Not being able to quickly find my evidence (=my reference notes) makes me unable to effectively conduct routine searches or interview town clerks, reference librarians, archival specialists, descendants, etc. Basically, my reference notes are an integral part of my research. It probably goes without saying, but for the same reason, my research process does not benefit from having a family member split into two (or 152) personas any more than it does from having two conflated into one.
The evidence record I create just seems easier and better than the evidence record in E&C. I think arriving at my "logic and reasoning" is also easier and better, because I can more quickly consider all the evidence.
Genealogical reference notes are narrated for a reason: too much about evidence doesn't readily conform to today's machine-readable forms. The evidence we want is often torn or smudged, or contains strikeouts or information that is overwritten. Most genealogical software allows the user to enter free-form reference notes or use some stylized approach.
Conversely, it seemed "Automatic Combining and Linking ..." (p. 2, "Preparing Person Records") was intended to work with "indexed databases." In that context, what E&C sets out to do makes more sense, but then that is only a fraction of the evidence market and probably only a fraction of the user market.
It's when E&C ventures beyond "indexed entries" to other data that I feel you are actually undermining the use of the word "evidence." (See handling of evidence.)
I don't have a full appreciation for how E&C stores logic and reasoning, but I know how I correlate evidence on my system and comment in my reference notes. Based on my experience, the necessary logic process usually runs like this. Initially I just don't know enough to be able to clearly identify someone. After I've researched a few more "knowns," I am able to identify more evidence. Until I reach a critical mass (my model term), I'll have conflicts and some negative evidence, and I likely need to consult additional authorities to understand the material. Between critical mass and reasonably exhaustive, the evidence starts to fall into place. It explains earlier questions and evidence. Conflicts are resolved, etc.
I know that throughout the research process, in findings of "best evidence," my logic and reasoning covers all the evidence. I'm not sure it does for E&C. In E&C, you seem to be joining two personas. Let's say you have four birth "records" entered in the main persona, and you are joining another. When you make that join for the fifth record, do we assume the person is correlating all five records? I assume the fourth is represented on the face of the persona, but where are the other three records?
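One way a data model *could* answer "where are the other three records?" is to make a join point at the personas it combines rather than copy their records onto a new face. The sketch below assumes that design; `Persona`, `join` and `all_records` are hypothetical names, and this is not a description of how E&C actually stores joins.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Persona:
    """Hypothetical persona: records drawn from evidence plus any joined personas."""
    label: str
    records: list[str] = field(default_factory=list)      # evidence entered directly
    members: list[Persona] = field(default_factory=list)  # personas combined under this one
    reason: str = ""  # "I think this is the same person because ..."


def join(label: str, reason: str, *personas: Persona) -> Persona:
    """Combine personas by reference; no record is copied or discarded."""
    return Persona(label=label, members=list(personas), reason=reason)


def all_records(persona: Persona) -> list[str]:
    """Collect every record reachable from a persona, depth first."""
    found = list(persona.records)
    for member in persona.members:
        found.extend(all_records(member))
    return found


if __name__ == "__main__":
    # Hypothetical data mirroring the four-plus-one example above.
    main = Persona("main", records=["birth record 1", "birth record 2",
                                    "birth record 3", "birth record 4"])
    other = Persona("other", records=["birth record 5"])
    combined = join("main + other", "same parents and town (hypothetical)",
                    main, other)
    for rec in all_records(combined):
        print(rec)
```

Under this assumption all five records stay reachable from the combined persona. The structure only guarantees that nothing is lost; whether all five have actually been correlated is still a judgment the researcher has to record, here only in the single `reason` string.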
(In truth, I'm not sure logic and reasoning are even important in the scheme of "Automatic Combining ...," but a genealogist needs ready access to coherent, fully correlated summaries throughout the research process and will eventually need to present the same in a biography or family group sheet.)
In software's existing evidence architecture, I'm able to record a summary of all my evidence, including all conflicts and all negative evidence. Throughout the research process, I'm able to see all the evidence and reasoning about all the evidence as necessary. I'm not a fan of requiring family historians to enter reference notes, but if BetterGEDCOM intends to establish an evidence-based standard, I think it is a mistake to breach software's existing source systems to do so.
Add - Evidence Dimension [wip] ....
 Note: I'm going to assume we all agree that "evidence" from original documents often just doesn't conform to the requirements of a database. Please let me know if that isn't a given or if we need some examples.
Handling of Evidence
I handle evidence differently from the way I see E&C proposing it be handled. I wouldn't consider taking a "snippet" out of context during the research phase, much less quoted material or abstracts from authorities. I expressly report about my translations, so that when I send a census reference note to Geir, he can say something like, "you didn't quite catch the essence on this one..." I include comments about indirect evidence. I actively manage my reference notes, adding comments if I later discover evidence or negative evidence.
(a) Aside from my problem with the process and not wanting evidence to be so darn hard to find, I have a problem when E&C "evidence" is in reality something inferred from the evidence. That is different from a user pulling direct evidence from an indexed entry (see About Evidence), which is how I think Tom originally described it. The user pulls the indexed birth data and cites the birth index (not the original record); the user didn't have to translate or interpret anything. We both see that evidence as evidence. I have a problem when BetterGEDCOM suggests E&C entries should apply beyond the indexed entry to the original documents.
(b) Somewhat akin to (a) much of what E&C wants to call "evidence" is actually material someone wrote in a letter, authored in a book or even spoke into a tape recorder. Am I the only one who sees a problem with a standard separating quotes from attribution? Ditto, entering "snippets" in a location separate from references the original author may have given (=source of the source)? 
Those who practice computer science/data science are professionals who apply skills and techniques; they look for opportunities and want to see their discipline grow. They provide an important service, and see genealogy as a data rich field.
Those who practice genealogy are professionals who apply skills and techniques--they too want to see their discipline grow and believe they do important work. Genealogists eagerly greet new technologies.
We all stand at the same place in history, looking to the future, but see different possibilities.
I'm not trying to address the commercial viability of E&C, nor am I questioning a possible relationship between that viability and BetterGEDCOM. I DO want us to look at E&C from the standpoint of trends in practices and methodologies and record status. I also want to develop a crisp understanding of the conflicts between this possible standard and my process and research requirements.
From what I can tell, there are several "process" issues.
(1) Conflicts abound. It's hard for me to imagine anything about "evidence" being destructive, but E&C seems to do anything but support my research process needs and goals. I'm hoping those working on E&C will provide a working description and examples of how reasonably complex evidence is actually entered and processed through to a final conclusion of "best evidence." Ditto, explain where all the evidence (including conflicting and negative evidence) is actually stored and how it is accessed. Finally, is E&C able to produce biographies, genealogies and family group sheets that report evidence fully and faithfully? The conflicts aren't minor. I'm not nitpicking, and it's not that I don't "get it."
(2) Under one master, BetterGEDCOM wants to support research activities by accessing the indexes of large record providers, not unlike the way FamilyTree Maker works with Ancestry.com. I enter birth data for someone, and Ancestry scours its mega indexes and returns a little green leaf if it finds something "interesting." I can click on a link and go see that information. Am I not correct that BetterGEDCOM could support such activity with "best evidence" findings and reference notes intact? (On a little test of 20 records, Ancestry's little green leaf actually returned a correct entry for me just under 20% of the time. Most of the time, it didn't return an entry.)
(3) The second master seems to be that of "Automatic Combining ..." Looking at those features from the standpoint of a genealogist, I think it's a mistake to try to render such a feature/E&C for anything other than third-party indexed records--but how large a base is that? And what lies ahead for those users who catch the bug and want to play with the real historical documents?
E&C is complex and invasive. In order for BetterGEDCOM to have the automatic combination of genealogical records, it seems to want to write off my genealogy. I have 6000 sources and 40000 citations. It is the most valuable part of my database; I will never use a software program that doesn't recognize and support my process.
Thank you for your time. --GJ
Perhaps a bit more tomorrow (saving so I don't lose this)
Thanks for your thoughts. I scanned them quickly a few days ago, and now again. It's a lot to parse :)
I think we should go through your Research Process and through your concerns about an E&C Model, and I'm optimistic that we'll see that your Process is not being excluded by anything that has been brought up.
Without going into details yet, here are some thoughts:
I feel you're a bit afraid of the E&C model. But you're actually doing good work with current software. E&C is not going to force future software to do less than what current software does.
BG is going to add the things that GEDCOM misses now. It's also going to add the possibility of using an E&C model in the software. It's not going to force software to change to that model.
But - even if your software were to change its model to E&C: You probably wouldn't notice! There might be a few more options and possible input screens. But it could also look exactly the same.
So you could keep working in exactly the same way.
A multi-level data model can store anything a single-level model (like old GEDCOM) can, and then some.
We've been focussing on the "Person Records in a tree" part, because it's new and powerful, and we want to see how to do it right.
We've not yet focussed much on Research Notes / Reference Notes and the whole "administrative" area. This we will need to do.
But these parts are really independent of how the other part works.
It will be a good challenge to demonstrate and explain the details!
To really understand your process, it might be helpful to see exactly what you're doing, when, and how:
I too, would like to see a GEDCOM file from your software.
Have been busy off wiki.
Quick post here, better after I have a chance to review the comments more.
Golly though, I've worked pretty hard to share real world evidence and research on the _Build a BetterGEDCOM_ blog and on my personal blog, _They Came Before_.
You can see selected bits from what was my 2007 (work from c1995-2007) file on WorldConnect. http://worldconnect.rootsweb.ancestry.com/cgi-bin/igm.cgi?op=GET&db=genejunky&id=I360
Thanks for asking, but I'm not sure why you think I should share my current working file with BetterGEDCOM. I really do want to publish particular biographies or genealogies. Actually, I wanted to share one biography, but with all that's happened, I couldn't get it to an editor and get their okay on how to share/what to share.
How can you say it's not going to change how current software operates?
Does your current software ask you to answer, "I think this is the same person because...?"
I'm not sure why you feel I'm afraid of it.
You wrote, "What software .... " I assume you mean as far as genealogy software is concerned. I use TMG and have used GenBox.
You wrote, "Are there things that happen on paper only (no software)?" No.
You wrote, "How do you get to the final essays and reports?"
Maybe you could be more specific? Relative to the record-based work you are doing: I enter a lot of evidence, but don't enter as many tags as I used to.
Along the lines of the comment on the page ... We stand at the same place in history and see a different future. You write, "you could keep working the same way ..." Hope you can appreciate how that comes across to the genealogist in me. Sort of like, "Hey, we're laying the groundwork for the next generation of software, and the good news is, we hope you won't lose anything."
It can be stored in exactly the same place as where you store your evidence now. It may offer you the chance to break down your evidence further, but you don't have to take that opportunity. Not sure what else I can say given that none of us write the software.
Automatic Data Collection might very well put evidence elsewhere - because it's automatic. But ADC is not E&CM.
E&CM wants to divide stuff into more detail - but not to break apart stuff so that connections can no longer be made. It's a legitimate question whether the advances are quite as smart as we would hope for (e.g. the question of how much interpretation goes into a persona) - but none of the current practices are removed.
It's also quite legitimate to wonder if the response time is slower if you take "advantage" of the new connections.
"is E&C able to produce biographies, genealogies and family group sheets that report evidence fully and faithfully?"
E&CM is not a methodology, it is not an application. It does not even dictate a methodology or application. Therefore it cannot produce biographies, genealogies and family group sheets. >>>>> But GEDCOM cannot do these things either. <<<<<<<<<
"Under one master, BetterGEDCOM wants to support research activities by accessing the indexes of large record providers"
What does this mean? What 'master'? We both agree that sucking in indices without assessing their quality is a road to ruin. I see no reason why E&CM does not support the Genealogical Proof Standard.
"Am I not correct that BetterGEDCOM could support such activity [scouring such indices] with "best evidence" findings and reference notes intact?" Of course you are correct. The point is that no one has suggested dismantling "best evidence" findings and reference notes.
"The second master seems that of 'Automatic Combining' ..." As we have said, ADC is not E&CM.
"In order for BetterGEDCOM to have the automatic combination of genealogical records, it seems to want to write off my genealogy" Given that neither BG nor E&CM want to mandate auto combination, your concern is misplaced.
Thanks for your additional comments.
I had asked for a more current overview of E&C before posting the page. I was directed back to the wiki to read all of Tom's posts. I worked with Tom's original overview and my understanding of Tom's most recent thinking.
--Didn't want to disrupt process of getting current logic/E&C documented.
I appreciate the logic in Tom's posting E&C/ACD (Automatic Combining ...). The "E" in "E&C" is prepackaged machine readable index data (created by third parties). The 'puter is going to work to infer relationships from the four corners of that "evidence" on a grand scale ... etc., etc. I still would have captured the full source, including extract/abstract in the reference note, but (to me) that seems less about E&C and more about "efficient" data. As I wrote on the page, there are side effects from trying to commingle Tom's original logic with user-driven genealogical processes and real-world historical documents/evidence.
While this part of the topic is out of date now, in the earlier banter with Tom, I don't think I had misunderstood his view of limiting the reference note to a source locator.
A little off topic ... I know some users now don't abstract or extract bits into their working file reference notes. When they find another item of evidence, they either enter another source to the same tag (again without the abstract/extract) or as a source to a new tag. Over time, because they haven't kept that "snippet" in the reference note, it's pretty easy to lose track of the different information from the different sources.
Gene - exactly. Been there, done that, got the T-shirt of confusion.
And that is precisely one of the issues that the E&C Model attempts to deal with.
It actually doesn't quite mention that aspect but I realise now that is a side effect of the requirement. Or is it a side-effect of the solution?
Either way, it would be possible for the user to drill down from the list of sources cited against a PFACT or Event. Because the record of previous values is permanent, at each level down you see the previous values and the previous cited sources. At some point in your descent you see the bits change and it should show therefore which cited source triggered the change.
It's probably not simple to design a decent GUI for, but the E&C data model enables it.
I should probably update the proposed requirement to add this as a possible reason for doing it, so thanks for that contribution to the model.
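The drill-down described above can be sketched as an append-only history of one value, assuming each revision carries the citation entered with it. `Revision`, `ValueHistory`, `drill_down` and `changes` are hypothetical names for illustration, not part of any published E&C or BetterGEDCOM specification. As the sketch makes plain, it records *where* each value came from, not *why* one value is preferred.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    value: str   # the fact's value at this point, e.g. a birth date
    source: str  # the citation entered along with this value


class ValueHistory:
    """Hypothetical append-only history for one PFACT or Event value."""

    def __init__(self) -> None:
        self._revisions: list[Revision] = []

    def record(self, value: str, source: str) -> None:
        # Earlier revisions are never overwritten or deleted.
        self._revisions.append(Revision(value, source))

    def current(self) -> Revision:
        """Latest revision (raises IndexError if nothing has been recorded)."""
        return self._revisions[-1]

    def drill_down(self) -> list[Revision]:
        """Newest first, as a user would descend through previous values."""
        return list(reversed(self._revisions))

    def changes(self) -> list[Revision]:
        """The first revision plus every revision whose value differs from the one before it."""
        triggered: list[Revision] = []
        previous: str | None = None
        for rev in self._revisions:
            if rev.value != previous:
                triggered.append(rev)
            previous = rev.value
        return triggered


if __name__ == "__main__":
    # Hypothetical values and sources, for illustration only.
    history = ValueHistory()
    history.record("abt 1820", "Census A")
    history.record("abt 1820", "Census B")
    history.record("12 Mar 1821", "Baptism register")
    for rev in history.changes():
        print(f"{rev.value!r} introduced by {rev.source}")
```

In this sketch the second census citation confirms the value without changing it, so `changes()` reports only the first census and the baptism register as the sources that introduced a value.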
"All the evidence and the best evidence" doesn’t seem the same as drilling down to the previous values.
I think it's more about how one arrives at the "primary"/"preferred" value (and wanting that value to report all the evidence and the best evidence).
Well - possibly, but I was answering "it's pretty easy to lose track of the different information from the different sources" and drilling down to previous values WITH their citations seems to give one the ability to track WHERE a bit of information comes from.
So, when one gets lost because something's not been written down, E&C Model can help with WHERE something comes from, but it can never help with WHY it's the preferred value. Seems to me that something is better than nothing.