
Goal Oriented Research Process (a term I made up)

It's not my intent for the comments on this page to distract from the important work Geir, Adrian and Testuser are doing to document E&C.

Update (May 7 2011)
I wanted this page to be on topic/relevant for those who take the more scientific approach/E&C. I chose to use Tom's original presentation as a backdrop for my comments. A current E&C overview might have led to more relevant content.

I want to thank those who took the time to read and exchange notes or thoughts. Special thanks to Adrian who took the time to ask, "Are you sure you don't push paper?" Some of the banter, I hope, led to a better understanding of the vital record, baptismal, probate and census record circumstance here in the states. I created the page Ancestry for Sale ... Vital Records Not Included to show how my ancestors migrated westward as states and counties developed--always landing at a place about 50 years in advance of the time when births and deaths would be recorded.

Original comments:
I do not have a working description of the research process that current E&C presumes to be good practice. However, Tom's "Automatic Combining and Linking of Genealogical Records" indicates that "available evidence that may apply to the individuals is gathered, and then the evidence is used [to] infer them. A researcher makes conclusions by laying out person records extracted from the evidence, and then grouping the records into sets he believes represent individuals from the past. The process is based on good practices and experience."

That does not describe my process, so I created this page to try to describe what I do.

Just as Adrian did, I want to understand why E&C seems to compare so unfavorably to my process. I KNOW there are important differences between E&C and how I work. As I try to explain the differences, I might not use the terms you use or identify the steps as you see them in E&C. Please appreciate, I seem to be on one end or the other of an information vacuum.

I WANT the benefits of technology to move genealogy forward. I just don't see why we have to give up the genealogy or a real evidence record to get there.

Initial Summary: (WIP)

I don't really think E&C was developed to support a process, but the description given in Automatic Combining, as above, I would label a "document collector" process. I don't want to be a document collector. I set out to find a specific document that should contain specific information about a specific individual or family (=information that should be evidence for me). The difference is not semantics. As I hope to describe, the elaborate system you set out to build in support of your process (or for your purpose) renders tools I need unusable.

I know you are brilliant data scientists, and want things from me to make the 'puter smart so you can realize the possibilities you see. BUT, BUT--the things you want to take are the very things that enable me to be a genealogist.


I want to write biographies or genealogies in actively managed narrative forms (my goal). I could say it more precisely, but to that end, I research to fully and correctly identify each family and each individual in the family, and to place all in historical context. I want each relationship correctly identified. The various "data requirements" are pretty well spelled out by the narrative form, and yes, the record of my evidence prints as reference notes.

To reach my goal, I research individuals at the family group level, in the body of evidence. I want to conduct an exhaustive search, working from the known to the unknown.

As a genealogist, I know at the start what my data requirements will be when I finish. The journey is how I get there. Because I'm not an expert on every town, culture, era, etc., I study to determine where information I'm seeking should be found. I'm not a document collector--I make broad searches in targeted record groups to find specific evidence rather than random searches in databases online (not that I ignore them).

Simply the Best (err... GeneJ wants to debunk the Conclusion-Only Myth)

Just as I hold my nose when we refer to facts, I hold my nose when we talk about conclusion-only. I make findings of "best evidence." It's not a best record or document or "preferred"/"primary" evidence--it's the best evidence. As the term implies, some evidence is better than other evidence, and I want to be able to look at (and report about) all the evidence in making my finding.

As I find more evidence, I will again and again review the evidence. Likewise, when I send off a biography or genealogy, I want to report about all the evidence. If it's not obvious, I will explain the logic and reasoning behind a finding of the "best evidence." See - Examples from my working file.

I've asked, but am still not sure where all the evidence becomes stored in E&C. I know where to find the record of evidence in my working file--reference notes. My software allows reference notes to be moved up or down, so I can arrange to have the "best" evidence listed first, but still report about all the evidence. I easily *add* comments about conflicts and negative evidence.

My reference notes are the "evidence architecture" in the software I use today. Could we build a more flexible system--yes, but dismantling it hardly makes it more flexible.
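
As a rough illustration only, the arrangement described above (every reference note kept, with the "best" one moved to the top) might be modeled as below. This is a sketch in Python, not how TMG or any other program actually stores notes; the names ReferenceNote, Finding, move_up and report, and the sample citations, are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceNote:
    """One narrated note: a citation plus a free-form comment (hypothetical shape)."""
    citation: str
    comment: str = ""


@dataclass
class Finding:
    """A finding of 'best evidence' backed by an ordered list of reference notes."""
    value: str
    notes: list = field(default_factory=list)

    def move_up(self, index: int) -> None:
        """Swap a note with the one above it; the first note is treated as 'best'."""
        if 0 < index < len(self.notes):
            self.notes[index - 1], self.notes[index] = (
                self.notes[index],
                self.notes[index - 1],
            )

    def report(self) -> list:
        """Return every note in order, so conflicts and negative evidence still print."""
        return [f"{n.citation}. {n.comment}".strip() for n in self.notes]


# Hypothetical sample data, for illustration only.
finding = Finding(
    value="birth date (hypothetical)",
    notes=[
        ReferenceNote("Compiled genealogy", "gives a different year; no source cited"),
        ReferenceNote("Census entry", "age is consistent with the town record"),
        ReferenceNote("Town vital record", "original entry; contains a strikeout"),
    ],
)
finding.move_up(2)  # town record moves above the census entry
finding.move_up(1)  # ...and then to the top, as the best evidence
for line in finding.report():
    print(line)
```

Reordering changes only which note prints first; nothing is removed, which is the point being made about reporting all the evidence.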

Genealogy is identity and identity is evidence: Researching at the family group level

I research at the family group level, in the body of evidence, working from the known to the unknown.

The "family group" is well defined by the narrative style; it's not too different from the individuals reported on a family group sheet. The key data requirements by which individuals become identified are defined by the style. For the purpose of my working file, the family group details represent my best evidence--some individuals will probably have been well identified (=enough evidence to identify them well) while others might not be well identified (=not enough evidence to identify them well).

The "body of evidence" is all the evidence from all the sources relevant to the family group. By the design of my genealogical software, this body of evidence is identified, summarized and evaluated in the collective group of reference notes associated with the individuals in the family group.

I work from the known to the unknown. I identify research objectives from the evidence (=from my reference notes) about those I can identify, and set out to learn more about them, expecting that research to lead me to other family members not yet well identified. I research my own family; when I research to "link from," I'll learn more about family members and expect to learn about those I haven't yet identified well. Too often when I research to "link to," I learn a lot about another person's family.

I believe those who research at the family group level come to rely on the additional checks, balances and additional logic it provides for evidence located about individuals. This is a process--the more I learn about the family and each family member, the more I can learn. I presume later research will shed better light on earlier research.

Knowing you may believe E&C will ultimately research my family for me, better than I can, perhaps you can also see how dismantling my evidence record (=reference notes=evidence architecture), creating personas and entering E&C logic and reasoning ("I think this is the same person because ...") is counterproductive to my effort. Not being able to quickly find my evidence (=my reference notes) makes me unable to effectively conduct routine searches, interview town clerks, reference librarians, archival specialists and descendants, etc. Basically, my reference notes are an integral part of my research. It probably goes without saying, but for the same reason, my research process does not benefit from having a family member split into two (or 152) personas any more than it does from having two conflated into one.

About Evidence

The evidence record I create just seems easier and better than the evidence record in E&C. I think arriving at my "logic and reasoning" is also easier and better, because I can more quickly consider all the evidence.

Genealogical reference notes are narrated for a reason--too much about evidence doesn't readily conform to today's machine readable form. The evidence we want is often torn, smudged, contains strikeouts or information that is overwritten.[1] Most genealogical software allows the user to free-form reference note entries or use some stylized approach.

Conversely, it seemed "Automatic Combining and Linking ..." (p. 2, "Preparing Person Records") intended to work with "indexed databases." In that context, what E&C sets out to do makes more sense, but then that is only a fraction of the evidence market and probably only a fraction of the user market.

It's when E&C ventures beyond "indexed entries" to other data that I feel you are actually undermining the use of the word "evidence." (See handling of evidence.)

I don't have a full appreciation for how E&C stores logic and reasoning, but I know how I correlate evidence on my system and comment in my reference notes. Based on my experience, the necessary logic process usually runs like this. Initially I just don't know enough to be able to clearly identify someone. After I've researched a few more "knowns," I am able to identify evidence. Until I reach a critical mass, I'll have conflicts and some negative evidence; I likely need to consult additional authorities to understand the material. Between critical mass (my model term) and reasonably exhaustive, the evidence starts to fall in place. It explains earlier questions and evidence. Conflicts are resolved, etc.

I know that throughout the research process, in findings of "best evidence," my logic and reasoning covers all the evidence. I'm not sure it does for E&C. In E&C, you seem to be joining two personas. Let's say you have four birth "records" entered in the main persona, and you are joining another. When you make that join for the fifth record, do we assume the person is correlating all five records? I assume the fourth is represented on the face of the persona, but where are the other three records?

(In truth, I'm not sure logic and reason is even important in the scheme of "Automatic Combining ..." but a genealogist needs ready access to coherent, fully correlated summaries throughout the research process and will eventually need to present the same in a biography or family group sheet.)

In software's existing evidence architecture, I'm able to record a summary of all my evidence, including all conflicts and all negative evidence. Throughout the research process, I'm able to see all the evidence and reasoning about all the evidence as necessary. I'm not a fan of requiring family historians to enter reference notes, but if BetterGEDCOM intends to establish an evidence based standard, I think it is a mistake to breach software's existing source systems to do so.

Add - Evidence Dimension [wip] ....

[1] Note: I'm going to assume we all agree that "evidence" from original documents often doesn't (thus just doesn't) conform to requirements of a database. Please let me know if that isn't a given or if we need some examples.

Handling of Evidence

I handle evidence differently than the way I see E&C proposing it be handled. I wouldn't consider taking a "snippet" out of context during the research phase, much less quoted material or abstracts from authorities. I expressly report about my translations--so that when I send a census reference note to Geir, he can say something like, "you didn't quite catch the essence on this one..." I include comments about indirect evidence. I actively manage my reference notes, adding comments if I later discover evidence or negative evidence.

(a) Aside from my problem with the process and not wanting evidence to be so darn hard to find, I have a problem when E&C "evidence" is in reality something inferred from the evidence. That is different than a user pulling direct evidence from an indexed entry (see About Evidence)--which is how I think Tom originally described it. The user pulls the indexed birth data and cites the birth index (not the original record)--the user didn't have to translate anything or interpret anything. We both see that evidence as evidence. I have a problem when BetterGEDCOM suggests E&C entries should apply beyond the indexed entry to the original documents. [1]

(b) Somewhat akin to (a) much of what E&C wants to call "evidence" is actually material someone wrote in a letter, authored in a book or even spoke into a tape recorder. Am I the only one who sees a problem with a standard separating quotes from attribution? Ditto, entering "snippets" in a location separate from references the original author may have given (=source of the source)? [1]

Final thoughts/comments

Those who practice computer science/data science are professionals who apply skills and techniques; they look for opportunities and want to see their discipline grow. They provide an important service, and see genealogy as a data rich field.

Those who practice genealogy are professionals who apply skills and techniques--they too want to see their discipline grow and believe they do important work. Genealogists eagerly greet new technologies.

We all stand at the same place in history, looking to the future, but see different possibilities.

I'm not trying to address the commercial viability of E&C, nor am I questioning a possible relationship between that viability and BetterGEDCOM. I DO want us to look at E&C from the standpoint of trends in practices and methodologies and record status. I also want to develop a crisp understanding of the conflicts between this possible standard and my process and research requirements.

From what I can tell, there are several "process" issues.
(1) Conflicts abound. It's hard for me to imagine anything about "evidence" being destructive, but E&C seems to do anything but support my research process needs and goals. I'm hoping for those working on E&C to provide a working description and examples of how reasonably complex evidence is actually entered and processed through to a final conclusion of "best evidence." Ditto, explain where all the evidence (including conflicting and negative evidence) is actually stored and how it is accessed. Finally, is E&C able to produce biographies, genealogies and family group sheets that report evidence fully and faithfully? The conflicts aren't minor. I'm not nitpicking and it's not that I don't "get it."

(2) Under one master, BetterGEDCOM wants to support research activities by accessing the indexes of large record providers--not unlike the way Family Tree Maker works with Ancestry.com. I enter birth data for someone, and Ancestry scours its mega indexes and returns a little green leaf if it finds something "interesting." I can click on a link and go see that information. Am I not correct that BetterGEDCOM could support such activity with "best evidence" findings and reference notes intact? (On a little test of 20 records, Ancestry's little green leaf actually returned a correct entry for me just under 20% of the time. Most of the time, it didn't return an entry.)

(3) The second master seems to be that of "Automatic Combining ..." Looking at those features from the standpoint of a genealogist, I think it's a mistake to try to render such a feature/E&C for other than third party indexed records [1] --but how large a base is that? And what lies ahead for those users who catch the bug and want to play with the real historical documents?

E&C is complex and invasive. In order for BetterGEDCOM to have the automatic combination of genealogical records, it seems to want to write off my genealogy. I have 6000 sources and 40000 citations. It is the most valuable part of my database; I will never use a software program that doesn't recognize and support my process.

Thank you for your time. --GJ

Perhaps a bit more tomorrow (saving so I don't lose this)


testuser42 2011-05-03T16:51:58-07:00
Some responses...
Hi Gene,

thanks for your thoughts. I scanned them quickly a few days ago, and now again. It's a lot to parse :)

I think we should go through your Research Process and through your concerns about an E&C Model and I'm optimistic that we'll see that your Process is not being excluded by anything that has been brought up.

Without going into details yet, here are some thoughts:

I feel you're a bit afraid of the E&C model. But you're actually doing good work with current software. E&C is not going to force future software to do less than what current software does.
BG is going to add the things that GEDCOM misses now. It's also going to add the possibility of using an E&C model in the software. It's not going to force a software to change to that model.
But - even if your software were to change its model to E&C: You probably wouldn't notice! There might be a few more options and possible input screens. But it could also look exactly the same.
So you could keep working in exactly the same way.

A multi-level data model can store anything a single-level model (like old GEDCOM) can, and then some.
We've been focussing on the "Person Records in a tree" part, because it's new and powerful, and we want to see how to do it right.
We've not yet focussed much on Research Notes / Reference Notes and the whole "administrative" area. This we will need to do.
But these parts are really independent of how the other part works.

It will be a good challenge to demonstrate and explain the details!

To really understand your process, it might be helpful to see exactly what you're doing, when, and how:
mmartineau 2011-05-03T17:02:08-07:00
I agree. E&C does not change how current software operates. It's just a better underlying model that will allow more than can be done with current models. It doesn't take anything away.

I too, would like to see a GEDCOM file from your software.
GeneJ 2011-05-04T12:39:52-07:00
Hi gang.

Have been busy off wiki.

Quick post here, better after I have a chance to review the comments more.

Golly though, I've worked pretty hard to share real world evidence and research on the _Build a BetterGEDCOM_ blog and on my personal blog, _They Came Before_.

You can see selected bits from what was my 2007 (work from c1995-2007) file on WorldConnect. http://worldconnect.rootsweb.ancestry.com/cgi-bin/igm.cgi?op=GET&db=genejunky&id=I360

Thanks for asking, but I'm not sure why you think I should share my current working file with BetterGEDCOM. I really do want to publish particular biographies or genealogies. Actually, I wanted to share one biography, but with all that's happened, I couldn't get it to an editor and get their okay on how to share/what to share.
GeneJ 2011-05-05T14:21:51-07:00
More questions to answer, gang, but I'm curious.

How can you say it's not going to change how current software operates?

Does your current software ask you to answer, "I think this is the same person because...?"
GeneJ 2011-05-05T20:12:04-07:00
@ Testuser:

I'm not sure why you feel I'm afraid of it?

You wrote, "What software .... " I assume you mean as far as genealogy software is concerned. I use TMG and have used GenBox.
You wrote, "Are there things that happen on paper only (no software)?" No.
You wrote, "How do you get to the final essays and reports?"
Maybe you could give more specifics? Relative to the record based work you are doing--I enter a lot of evidence, but don't enter as many tags as I used to.

Along the lines of the comment on the page ... We stand at the same place in history and see a different future. You write, "you could keep working the same way ..." Hope you can appreciate how that comes across to the genealogist in me. Sort of like, "Hey, we're laying the groundwork for the next generation of software, and the good news is, we hope you won't lose anything."
AdrianB38 2011-05-06T13:38:13-07:00
"explain where all the evidence (including conflicting and negative evidence) is actually stored and how it is accessed."

It can be stored in exactly the same place as where you store your evidence now. It may offer you the chance to break down your evidence further, but you don't have to take that opportunity. Not sure what else I can say given that none of us write the software.

Automatic Data Collection might very well put evidence elsewhere - because it's automatic. But ADC is not E&CM.

E&CM wants to divide stuff into more detail - but not to break apart stuff so that connections can no longer be made. It's a legitimate question whether the advances are quite as smart as we would hope for (e.g. the question of how much interpretation goes into a persona) - but none of the current practices are removed.

It's also quite legitimate to wonder if the response time is slower if you take "advantage" of the new connections.

"is E&C able to produce biographies, genealogies and family group sheets that report evidence fully and faithfully?"
E&CM is not a methodology, it is not an application. It does not even dictate a methodology or application. Therefore it cannot produce biographies, genealogies and family group sheets. >>>>> But GEDCOM cannot do these things either. <<<<<<<<<

"Under one master, BetterGEDCOM wants to support research activities by accessing the indexes of large record providers"
What does this mean? What 'master'? We both of us agree that sucking in indices without assessing their quality is a road to ruin. I see no reason why E&CM does not support Genealogical Proof Standard.

"Am I not correct that BetterGEDCOM could support such activity [scouring such indices] with "best evidence" findings and reference notes intact?" Of course you are correct. The point is that no-one has suggested dismantling "best evidence" findings and reference notes.

"The second master seems that of 'Automatic Combining' ..." As we have said, ADC is not E&CM.

"In order for BetterGEDCOM to have the automatic combination of genealogical records, it seems to want to write off my genealogy" Given that neither BG nor E&CM want to mandate auto combination, your concern is misplaced.
GeneJ 2011-05-08T13:39:17-07:00
Hi Adrian
Thanks for your additional comments.

I had asked for a more current overview of E&C before posting the page. I was directed back to the wiki to read all of Tom's posts. I worked with Tom's original overview and my understanding of Tom's most recent thinking.

--Didn't want to disrupt process of getting current logic/E&C documented.

I appreciate the logic in Tom's posting E&C/ACD (Automatic Combining ...). The "E" in "E&C" is prepackaged machine readable index data (created by third parties). The 'puter is going to work to infer relationships from the four corners of that "evidence" on a grand scale ... etc., etc. I still would have captured the full source, including extract/abstract in the reference note, but (to me) that seems less about E&C and more about "efficient" data. As I wrote on the page, there are side effects from trying to commingle Tom's original logic with user-driven genealogical processes and real-world historical documents/evidence.

While this part of the topic is out of date now, in the earlier banter with Tom, I don't think I had misunderstood his view of limiting the reference note to a source locator.

A little off topic ... I know some users now don't abstract or extract bits into their working file reference notes. When they find another item of evidence, they either enter another source to the same tag (again without the abstract/extract) or as a source to a new tag. Over time, because they haven't kept that "snippet" in the reference note, it's pretty easy to lose track of the different information from the different sources.
AdrianB38 2011-05-09T05:11:10-07:00
"I know some users now don't abstract or extract bits into their working file reference notes. When they find another item of evidence, they either enter another source to the same tag (again without the abstract/extract) or as a source to a new tag. Over time, because they haven't kept that "snippet" in the reference note, it's pretty easy to lose track of the different information from the different sources."

Gene - exactly. Been there, done that, got the T-shirt of confusion.

And that is precisely one of the issues that the E&C Model attempts to deal with.

See http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/38620132

It actually doesn't quite mention that aspect but I realise now that is a side effect of the requirement. Or is it a side-effect of the solution?

Either way, it would be possible for the user to drill down from the list of sources cited against a PFACT or Event. Because the record of previous values is permanent, at each level down you see the previous values and the previous cited sources. At some point in your descent you see the bits change and it should show therefore which cited source triggered the change.

It's probably not simple to design a decent GUI for, but the E&C data model enables it.
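
A minimal sketch of the drill-down idea, in Python, assuming a fact keeps a permanent list of revisions and each revision records its value plus all sources cited up to that point. The names Revision and changes, and the sample values and sources, are hypothetical; nothing here is taken from an agreed BetterGEDCOM structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    """One permanent entry in a fact's history (hypothetical shape)."""
    value: str
    sources: tuple  # every source cited for the fact as of this revision


def changes(history):
    """Walk from the newest revision down to the oldest and list each point
    where the value changed, with the source(s) newly cited at that point."""
    found = []
    for i in range(len(history) - 1, 0, -1):
        newer, older = history[i], history[i - 1]
        if newer.value != older.value:
            added = tuple(s for s in newer.sources if s not in older.sources)
            found.append((older.value, newer.value, added))
    return found


# Hypothetical history of a single birth fact, oldest revision first.
history = [
    Revision("abt 1760", ("census entry",)),
    Revision("abt 1760", ("census entry", "family letter")),
    Revision("1760", ("census entry", "family letter", "town vital record")),
]

print(changes(history))
# [('abt 1760', '1760', ('town vital record',))]
```

This only shows WHERE a value came from. It records nothing about WHY the value is preferred; that reasoning would still live in the reference notes.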

I should probably update the proposed requirement to add this as a possible reason for doing it, so thanks for that contribution to the model.
GeneJ 2011-05-09T09:04:59-07:00
I took a look at "Reqts for E&C 2 - Permanent Record of Evidence and Conclusions" (link in your message). You also wrote, "at each level down you see the previous values and the previous cited sources."

"All the evidence and the best evidence" doesn’t seem the same as drilling down to the previous values.

I think it's more about how one arrives at the "primary"/"preferred" value (and wanting that value to report all the evidence and the best evidence).

AdrianB38 2011-05-09T09:25:03-07:00
"it's more about how one arrives at the "primary"/"preferred" value"

Well - possibly, but I was answering "it's pretty easy to lose track of the different information from the different sources" and drilling down to previous values WITH their citations seems to give one the ability to track WHERE a bit of information comes from.

So, when one gets lost because something's not been written down, E&C Model can help with WHERE something comes from, but it can never help with WHY it's the preferred value. Seems to me that something is better than nothing.
GeneJ 2011-05-27T07:58:35-07:00
Link to blog entry about a vital record containing a strikethrough
This week I blogged about a New Hampshire vital record about my ancestor, the wife of Maj. William Preston, married 1779 at New England.

See, "Beyond the Little Green Leaf: Proving the maiden name of Elizabeth (Clark) Preston (1760-1807)"

Unlike the places to which my ancestors from New England migrated (see "Ancestors for sale ... cheap ...,"), early vital record collections exist for many towns in New England. The blog posting above provides an example of one such vital record.

In this case, the vital record contains a strikethrough, and an alternative name has been written faintly below. From the information available, I can only guess about the who-what-why behind the alteration. The blog posting goes on to explain how I researched "the known" and indeed learned what became my "best" evidence about this woman's name.

Hope this helps. --GJ
GeneJ 2011-07-07T11:57:31-07:00
Combining evidence personas versus conclusions
Ref: http://bettergedcom.wikispaces.com/message/view/home/40495335#40528573

The genealogy software I use is referred to as "conclusion based." At its heart (my word) is the "conclusion person"--I establish his/her key identity with assertions taking the form of pfacts for name, birth, death and marriage(s), and I define his/her relationship to other family members and associated persons (other "conclusion persons"). I record other information about these conclusion persons--and I create reference notes and separate bibliographic records.

Evidence Personas ... I assume we all know our sources (and record groups) are often incomplete or flawed, at least for our purpose. As well, our interpretation of a source may be flawed as to some of the details we seek to record. All told, I assume we know that in a body of work, various sources will contain information that conflicts with information, as we interpret same, from some other source--which may or may not have been discovered in a timely or convenient manner.

Combining Evidence Personas ... As I understand the description of your process, Tom's "conclusion person" is formed by collecting documents that refer to generally same- or similarly-named persons and grouping the detailed data therein with the logic, "I think this is the same person because ..." Even if all records could be readily/simplistically codified (a separate issue) and if no genealogist made transcription, translation or typographical errors (we just know they don't), the process of grouping various records under one person with that "same person" logic only accomplishes a compilation of record data. Ala, a combination of flawed and conflicting records (whether we realize it at the time or not) forming a larger group of oft' duplicative, flawed and conflicting data--dramatically, a frankenstein.

I see a gap between that "combined" record and the "conclusion person" record in modern software. In the combined record, no doubt even when every record truly is about the same person, very different identities will become conflated and it might include enough parents or children to form a small community.

During the National Genealogical Society 2011 conference, Tom Jones quoted Helen Leary, saying "Conflicting evidence is incompatible with a conclusion." http://bit.ly/qwRFFX

The research process involves the steps users undertake to resolve the frankenstein. As an example, it might take years for a family historian to study laws and court practices in order to conclude what "reached the age of majority" meant for a given locale in 1684, or the full meaning of appearing before a fornication court in 1728 ... the understanding, in context, of an ordinary marriage intention. Even when there is not a substantive conflict, how many genealogists want 20 birth tags to be part of their conclusion person, each with its own series of what I think are being called "note" fields and having its own entry for "I think this is the same person because..."?

It seems to me we have spent a great deal of time talking about evidence personas and the process of utilizing them to form conclusion persons. It's admittedly a bit of a downer if we don't have some consensus about what it takes to create a conclusion person's record.

I'm hoping we leave the issue unresolved, move to support Adrian's work on the research diagram and then move on to other topics. Perhaps as we get a grasp on more pieces of the puzzle we can revisit this topic with more success.

(1) You wrote, "Could you explain what is the sweet spot in genealogical software?"
See http://bettergedcom.wikispaces.com/message/view/home/40495335?o=20#40688891

(2) You wrote ... "(a) Can higher level logic be placed into notes that are attached to normal data records? (b) Or does the BG model need special records for recording and holding higher level logic? (c) What is higher level logic?"

I'm going to side step "normal data records" and "special records for recording..." and even "higher logic." Without more agreement about what it takes to form a conclusion person, this seems not a productive dialog.

(3) I don't see the point about data, information and knowledge. The historical and scientific methods are based on evidence and conclusions. It seems to me that evidence and conclusions are the data, information and knowledge you are talking about. I believe we have covered these concepts from the beginning. If you believe the models do not handle data, information or knowledge, could you explain why and how you would change them so they do?

As above (2)

(4) Maybe if you could describe the sweet spot, we could try to understand how the models fail to support it.
See http://bettergedcom.wikispaces.com/message/view/home/40495335?o=20#40688891

(9) The compilation person hasn't been included in any model so is a new and undefined concept. Are you implying there is something weak about the conclusion person concept as we discuss it? Would it be possible to explain?

See my introductory comments, above.

(10) How is ... "trying to dumb it down" and what is the "it" that ... is dumbing down? I see the ... models as trying to encompass a fuller process than GEDCOM. Everything we've done so far is a "smarting it up" as far as I see it. It would be helpful if you could try to explain what you mean.

See the introductory comments.
AdrianB38 2011-07-07T13:05:53-07:00
I think the issue is that you have a different idea of how stuff is combined to produce a conclusion person than I do.

You say "the process of grouping various records under one person with that 'same person' logic only accomplishes a compilation of record data". No - Geir, I and others spent some time on working out how stuff would be combined - or rather, could be combined because we're doing the data model (i.e. the basics for the file structure) not the application. In the classic conclusion based person, a fair amount of the conclusions are a straight compilation - if I have 1851, 1861, 1871 and 1881 census entries for someone, the census events are just compiled onto the conclusion person, with no further thought. (Let's leave aside those who don't record a census event but create occupation, residence, etc attributes). However, the same 4 census entries can be analysed to come to a single conclusion about their birth details. (Again, let's skip over those who would record 4 birth events). I imagine that you, like me, would do that analysis and come to a conclusion about the birth event - that possibly alters with the next piece of evidence.

Using the evidence/conclusion person model, we would do exactly the same thing.

Suppose for simplicity that I have 4 personas, 1 for each census, because I've never tried to analyse and combine anything yet. If the 4 birthplaces are (respectively) Davenham, Wharton, Wharton, Winsford, then my analysis would conclude that the probable birthplace is Wharton (trust me - that's not because it's the most common, it's because each is contained within the other). In my current, conclusion only person, I would conclude the birthplace is Wharton with appropriate "citations" and explanations. In my theoretical evidence/conclusion person model software, I would conclude that the birthplace is Wharton, enter a new event at the level of the (combined) conclusion person containing "PLACE=Wharton, Cheshire, England" with some justification in my "citation", AND the software would link this to the birth events on the lower level personas and suppress their values.

Thus, the evidence/conclusion person model software also has a single, analysed conclusion with the (apparent) conflicts explained away. So we absolutely do NOT have a thoughtless group of data creating a Frankenstein.
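The Wharton example above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the class names `Persona` and `ConclusionPerson`, the `conclude` method and the field names are hypothetical and do not come from any BetterGEDCOM or DeadEnds draft. It shows an analysed value at the conclusion level suppressing the persona-level values in what is displayed, while leaving those values in place.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Persona:
    """One evidence-level person taken from a single source (hypothetical shape)."""
    source: str
    birthplace: Optional[str] = None


@dataclass
class ConclusionPerson:
    """Groups personas and holds analysed overrides; nothing is deleted."""
    personas: List[Persona] = field(default_factory=list)
    overrides: Dict[str, str] = field(default_factory=dict)
    justifications: Dict[str, str] = field(default_factory=dict)

    def conclude(self, tag: str, value: str, justification: str) -> None:
        # Record the researcher's analysed value for a tag, with the reasoning.
        self.overrides[tag] = value
        self.justifications[tag] = justification

    def displayed_birthplaces(self) -> List[str]:
        # An override suppresses the persona-level values in displays.
        if "birthplace" in self.overrides:
            return [self.overrides["birthplace"]]
        return [p.birthplace for p in self.personas if p.birthplace]

    def evidence_birthplaces(self) -> List[str]:
        # The persona-level values stay available for later review.
        return [p.birthplace for p in self.personas if p.birthplace]


censuses = [
    Persona("1851 census", "Davenham"),
    Persona("1861 census", "Wharton"),
    Persona("1871 census", "Wharton"),
    Persona("1881 census", "Winsford"),
]
person = ConclusionPerson(personas=censuses)
before = person.displayed_birthplaces()
person.conclude("birthplace", "Wharton, Cheshire, England",
                "Researcher's analysis of the four census birthplaces")
after = person.displayed_birthplaces()
```

The analysis itself (why Wharton is the probable birthplace) stays with the researcher; the sketch only stores its result and the justification text.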

I must stress, again, that the idea of automatic combination (which would lead to the Frankenstein) is not an integral part of the evidence/conclusion person model. No-one has ever suggested that automatic combination is all that's necessary.

The benefit of retaining those personas is that when we sit down and wonder - did I get that conclusion right? - we can go back and examine the previous stage. (If the previous stage is just a persona then we've not gained much, because the persona is just a reflection of the source, so we could equally look at the sources.) No, it's where we combine the John Doe from Davenham and the John Doe from Northwich and then wonder - where did _this_ bit come from? That's when the ability to look back at the evidence people that went into the argument is important.

So because the analysis IS done and the conflicting evidence suppressed - but not destroyed - then I do NOT see a gap between the "combined" record and the "conclusion person" record.

You ask "Even when there is not a substantive conflict, how many genealogists want 20 birth tags to be part of their conclusion person". They won't be. If the 20 tags are identical (fat chance!), then the software will suppress the multiple versions and show just one in the reports and on the screen - though with 20 "citations". If the 20 are all different, then, if the users are doing their job of analysis, they will have created over-riding birth tags at an appropriate point so only 1 is seen.
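The "identical tags" case could be handled along these lines. It is a minimal sketch, assuming each birth tag is just a (value, source) pair; the function name `collapse_identical` and the sample date are made up for illustration. The case where the 20 values differ needs the researcher's over-riding tag and is not handled here.

```python
from typing import Dict, List, Tuple


def collapse_identical(tags: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each distinct value to the list of sources that support it."""
    collapsed: Dict[str, List[str]] = {}
    for value, source in tags:
        collapsed.setdefault(value, []).append(source)
    return collapsed


# Twenty identical birth tags (made-up date), each from a different source.
birth_tags = [("12 Mar 1840", f"source {n}") for n in range(1, 21)]
shown = collapse_identical(birth_tags)
```

A report would then print one birth line with 20 "citations" rather than 20 birth lines.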

There's a mass of undecided stuff here that is appropriate to the application - what stuff is supposed to be combined? (e.g. birth tags), what added (e.g. occupations at different dates), what merged (e.g. occupations at different dates - oh, interesting...) But none of that alters the data model. We can reach all sorts of "consensus about what it takes to create a conclusion person's record" but the application software guys will only listen to what we say about the data model.

From your explanation of the "sweet spot" - the evidence/conclusion person model will _allow_ EXACTLY your sweet spot. I cannot guarantee any application software _will_ give you that sweet spot any more than the LDS creators of GEDCOM could guarantee it.
GeneJ 2011-07-07T14:02:57-07:00
@Adrian ..
You wrote, "Geir, I and others spent some time on working out how stuff would be combined..."

Yes!! ...And I thought we were headed to consensus.

But then we had discussions here

And here

My comments above are more about Tom's comment, "The E&C process has been defined, with no gaps or bridges, since I wrote up my first descriptions of how the DeadEnds model can implement the E&C process. That was November or December."

Hope this clarifies. --GJ
AdrianB38 2011-07-08T05:24:37-07:00
OK - the 2nd of those 2 links doesn't, in my view, impinge at all on the evidence and conclusion model. It simply points out that there is a whole new ball game out there that we haven't touched, namely - how can we concoct reasonably readable text from a string of GEDCOM facts without cheating by writing everything in a Biography tag?

Not looked at the other link yet...
GeneJ 2011-07-08T13:33:49-07:00

The second link is an outgrowth of the first. If it isn't, then I should have added the link to a 'tweener posting.

You wrote, "... whole new ball game out there that we haven't touched, namely - how can we concoct reasonably readable text from a string of GEDCOM facts without cheating by writing everything in a Biography tag?"

Humm ...

There have been discussions about Tom's posting in the last two Developers Meetings. See also my posting here:


I need a little help here. I read several times the 2 June 2011 posting (to which the above link responds).

The "conclusion person" in modern software IS the basis (already) of meaningful genealogical output. From that conclusion person record, most modern desktop software produces family group sheets, charts, outline descendant lists and narratives.

Heck, upload a GEDCOM to WorldConnect, and it will generate its form of individual, descendancy, "Register" (biography), pedigree or Ahnentafel output for you.

WorldConnect's form of Register biography

Why would BetterGEDCOM work to concoct "reasonably readable text from a string of GEDCOM facts"?

BetterGEDCOM sets out standard tags and roles, and provides definitions about same to vendors. See Data-Event03, Data-Char03.

See also http://bettergedcom.wikispaces.com/message/view/Individual+Data+Elements+Discussions/30295247#30306211

I know you and I discussed narratives from the often canned electronics in software, so I'm wondering why you, too, might think we should look at "how can we concoct reasonably readable text ..."

Separately, citation requirements, as we have been discussing them, don't arise because of a biography. (Why would there be a different requirement for a conclusion person entry in a database vs a family group sheet vs a biography?)

Citation requirements are far more related to taming the Frankenstein/beast and the nature of privately held or difficult to interpret sources. For example ... take poor little Hannah. If there had been no Vermont Families publication reversing her identity with that of her aunt, then we'd have had only her NHVRs for birth and death to report about her, right? --GJ
AdrianB38 2011-07-09T10:21:58-07:00
Gene asks: "I'm wondering why you, too, might think we should look at "how can we concoct reasonably readable text ..." "

I'm not saying we _should_ look at this ball-game. I'm simply reflecting that there is a ball-game out there that we might want to deal with. Personally, I think the creation of what I might call a desk-top-publishing element in BG is a step too far. Apart from anything I have no real idea how such software might work and therefore no real idea about how the data might be structured.

Yes, the "conclusion person" in modern software can be the basis (already) of meaningful genealogical output - if somewhat robotic in its sentence structure.

"citation requirements, as we have been discussing them, don't arise because of a biography" - agreed. The only reason I think of them in terms of the generating reasonable text bit, comes with the multi-source reference note, where several "citations" (sorry) justify a single event (say). My software would produce several separate reference notes in the reports, each referring to one source, whereas arguably a single reference note referring to several sources is the better practice and seems to be one that you adopt in some fashion that I don't understand. In my software I'd somehow need to tell the system how to merge those reference notes. That's the only reason I'd think of extending into the DTP type way of putting extra definitions into BG.
ttwetmore 2011-07-11T07:03:49-07:00
I have written a lot about the process of using the concept of a persona record to represent codified evidence, and how to link those personas into conclusion person records. The fact is that this is an exact analogue of what genealogists do in their heads or on paper; it is an exact analogue of what researchers and historians do as they search for and evaluate evidence, in whatever form it may exist in the real world, and then use that evidence to make their conclusions, in whatever form they may wish to express them. Using persona and person records simply models this process with easy to derive and easy to understand computer records, customized for the genealogical process, making it possible for computer programs to help support the genealogical research process. My ideas are not new and do not break any ground. Anyone who has done any work at all studying the problems of doing large-scale genealogical research based on historical records knows there is an entire science and research area behind these ideas. I have tried to point out this corpus of work as we've gone along. At this point I've said more than enough. It is good that most active members of BG understand the ideas involved here, and can give more lucid explanations in their own words.
theKiwi 2011-07-11T08:20:35-07:00
Tom wrote:

>I have written a lot about the process of using the concept of a persona record to represent codified evidence, and how to link those personas into conclusion person records. The fact is that this is an exact analogue of what genealogists do in their heads or on paper,

Yes I guess this sums up how I work. My work and my processes have always been constrained by what I can achieve with Reunion for Macintosh.


Where it is "obvious" from the start that the 2 people represented in 2 different records are in fact the same person, I enter them in Reunion as a single person, with, if necessary, multiple events to cover these differences - eg 2 birth events, each with a different source.

Where it is not obvious that they're the same person, I create 2 individual records to hold the relevant information from each source. Sometimes I'll add a note in each referencing the other as maybe being the same person - eg http://roger.lisaandroger.com/getperson.php?personID=I16256&tree=Roger on my TNG site gives an example of how I've done this. (The Reunion note contains the HTML code needed to make the link appear so isn't exactly "print friendly" straight out of Reunion).

If I can later prove that they're the same person, then I merge those individuals into a single record keeping all the separate details in separate fields.

So here, apart from the notes, and presence maybe of multiple event records for someone, there is no difference between my conclusion person and the "not yet concluded" personas - they both have a unique RIN/ID number in my file, they both show in the index, etc.

While I see the theory and benefits of the persona records and conclusion person records, I'm quite unsure of whether the various genealogy vendors might choose to somehow formally implement this rather than just leave users doing as I've noted I do above, and if they do implement this, how they would go about it - most likely all using their own ideas.

AdrianB38 2011-07-11T09:07:25-07:00
"I'm quite unsure of whether the various genealogy vendors might choose to somehow formally implement this"

Well, so far as I can see from the outside, FamilySearch have personas and link them - non-destructively - to form a conclusion person, using the evidence from the personas. But then, having done it at one level, that's it, you're back into the destructive merge from then on, which means you can't say "If Richard at Davenham is not the same as Richard at Northwich, what did Richard at Davenham look like before I merged him with Richard at Northwich? Oops - just like this persona here that I can't quite fit into the merged Richard."

But you're right - creating the data model for this is quite easy - the practicalities of the logic and making the user interface show the current combined state as if it were one person when it's necessary and splitting them apart when it's necessary - tends to blow my mind because it's not software I've dealt with.
ttwetmore 2011-07-11T14:39:25-07:00

I use LifeLines on my Mac and have to use the same procedures you do, as there is no support for persona INDI's versus conclusion INDI's. Lately, however, I have gotten into the habit of entering all my evidence personas as separate INDI's. I really don't like the idea of merging these personas by "flattening" them into a single final INDI, as all reversibility is gone. What I'm really doing is waiting for myself to either modify LifeLines to handle trees of INDI's as I've proposed in the DeadEnds model, or to implement DeadEnds itself. I need to do something relatively soon as my LifeLines database is filling up with personas and I have no good way to handle them.

I daresay lots of people use their systems just the way we do. They have to because it's all the facility they have. New Family Search has a nice implementation of personas and persons, and if some other commercial vendor would add these to their offerings then you and I and many others could work in a much more natural way.

Note that there is never any merging with personas. Personas are simply grouped together and become a higher level person record. This higher level person record can be used to show conclusions, disambiguate conflicting data, resolve missing information, record your research notes, and so on. Though this higher level person has recently been called a Frankenstein, it is clearly the best way to implement genealogical E&C, and I hope BG will embrace it.
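The difference between grouping and a flattening merge might be sketched as follows. This is an illustration only: `PersonNode`, `group` and `ungroup` are hypothetical names, not part of the DeadEnds model or of LifeLines, and the note text is a placeholder. The point it tries to show is that grouping adds a parent record and leaves the personas untouched, so the step can be reversed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PersonNode:
    """A persona (leaf) or a higher-level person grouping other nodes."""
    label: str
    children: List["PersonNode"] = field(default_factory=list)
    note: str = ""

    def leaves(self) -> List["PersonNode"]:
        # Collect the original personas underneath this node.
        if not self.children:
            return [self]
        found: List["PersonNode"] = []
        for child in self.children:
            found.extend(child.leaves())
        return found


def group(label: str, members: List[PersonNode], note: str = "") -> PersonNode:
    # Grouping only adds a parent; members are not copied or altered.
    return PersonNode(label, list(members), note)


def ungroup(node: PersonNode) -> List[PersonNode]:
    # Reversing a grouping hands back the members exactly as they were.
    return list(node.children)


davenham = PersonNode("Richard at Davenham")
northwich = PersonNode("Richard at Northwich")
richard = group("Richard (combined)", [davenham, northwich],
                note="Placeholder for the 'same person' reasoning")
restored = ungroup(richard)
```

Because a group can itself be a member of another group, the same structure gives the "trees of INDI's" idea without any extra record type.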
ttwetmore 2011-07-11T14:42:16-07:00

The user interface for personas grouped into persons is not that hard to imagine. In some contexts you want to see just the top level person, and in other contexts you want to see what's inside. There are lots of ways of doing that. I've been experimenting with showing them using what's called an outline view on Mac OS X (specifically the NSOutlineView class). This is a user interface that lets you drill down as far as you like into tree structures or to pop back out. It works quite nicely. There are other metaphors that work well too.
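The drill-down idea can be mimicked in plain text. The sketch below is not the Cocoa outline control Tom mentions; it is a hypothetical `outline` function over simple (label, children) tuples, showing the same person either collapsed to the top level or expanded one level to the personas inside.

```python
from typing import List, Tuple

Node = Tuple[str, list]  # (label, list of child nodes)


def outline(node: Node, max_depth: int, depth: int = 0) -> List[str]:
    """Return indented lines, expanding children down to max_depth."""
    label, children = node
    lines = ["  " * depth + label]
    if depth < max_depth:
        for child in children:
            lines.extend(outline(child, max_depth, depth + 1))
    return lines


person = ("Richard (combined)", [
    ("Richard at Davenham", []),
    ("Richard at Northwich", []),
])
collapsed = outline(person, max_depth=0)  # top level person only
expanded = outline(person, max_depth=1)   # person plus the personas inside
```

Printing `"\n".join(expanded)` gives the combined person followed by its two personas, each indented by two spaces.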