BetterGEDCOM support for Evidence Explained and the Genealogical Proof Standard

How do we transfer the source structure, citations and proof?

Excel spreadsheet: Zotero field names (elements), Zotero Fields_alpha_97-04v.xls

An architecture for sources, reference notes and bibliographies.


We want to round the wagons so those who use //Evidence Explained// and are guided by the Genealogical Proof Standard (GPS) have confidence that BetterGEDCOM has worked to identify, understand and incorporate related data requirements. We focus first on Sources and Reference Notes--the record of our Evidence. Some work related to the GPS and the //Evidence Analysis Research Process Map// [link to Best Practices] has been done, and more will be incorporated.

Whether you are a tech from start to finish or couldn't write a line of code if your life depended on it, there is a place for you in this particular BetterGEDCOM discussion. Please join in the dialog as we work to develop information, assess needs and develop specifics for inclusion in the BetterGEDCOM Requirements Catalog.

Sources and Reference Notes

What is the problem?

For many of us, Citations represent the record of Evidence for our conclusions; the Genealogical Proof Standard deals with evidence and conclusions and has an element, "Complete and accurate citation of sources." GEDCOM does not transfer Citation data well, so it does not transfer the record of our Evidence well. Post transfer, producing a simple Family Group Sheet can be a major undertaking.

Separately, as our Application Overview indicates, trends favor enabling interpretations of Modern Style Guides, especially those unique to genealogy. Technology exists or is emerging that allows users of other styles to capture essential citation information online. BetterGEDCOM seeks to extend the functionality of its standard in support of these existing and emerging technologies.

Why don't citations transfer using GEDCOM?

The GEDCOM source system is based on a small group of Citation Element-like Data Fields (AUTH, TITL, PUBL, etc.). Many Applications have enabled substantially larger groups of Citation Elements that further relate to Citation Templates. [See Application Source Systems] In order to export comparatively larger and/or more complex data to GEDCOM, Applications group Citation Elements together and associate those groups with GEDCOM's Data Fields or even custom tags. [See Application GEDCOM Export] A receiving program has no assured way to interpret the details of the data on import; users must manually reconstruct parts or all of the citations.
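The grouping step described above can be pictured with a minimal Python sketch. Only the tag names AUTH, TITL and PUBL come from GEDCOM; the element names, the mapping table and the function `export_to_gedcom` are hypothetical and do not describe how any particular Application exports its data.

```python
# Hypothetical mapping from an Application's Citation Elements to GEDCOM
# Data Fields. Only the tag names (AUTH, TITL, PUBL) come from GEDCOM.
ELEMENT_TO_TAG = {
    "author": "AUTH",
    "compiler": "AUTH",
    "title": "TITL",
    "series_title": "TITL",
    "publish_place": "PUBL",
    "publisher": "PUBL",
    "publish_year": "PUBL",
}


def export_to_gedcom(elements):
    """Group Citation Elements under the GEDCOM tag each one maps to."""
    grouped = {}
    for name, value in elements.items():
        tag = ELEMENT_TO_TAG[name]
        grouped.setdefault(tag, []).append(value)
    # Values that share a tag are joined into one string. The element
    # names are dropped, so a receiving program cannot tell them apart.
    return {tag: "; ".join(values) for tag, values in grouped.items()}


# Publication details taken from a citation quoted in the discussion below.
source = {
    "title": "History of Hocking Valley Ohio",
    "publish_place": "Chicago",
    "publisher": "Interstate Publishing Co.",
    "publish_year": "1883",
}
exported = export_to_gedcom(source)
print(exported["PUBL"])
```

Three named elements leave the exporting Application as one PUBL string. Nothing in that string says which part is the place, the publisher or the year, which is why the importing side can only guess or fall back to free-form text.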

Related Wiki Pages: Sources and Citations in GEDCOM, Application Overview, Application Data, Application Pages
Additional References: //ThinkGenealogy//, "Better Online Citations .... (GEDCOM)"; Terry's TMG Tips, "Considerations for Exporting Sources to GEDCOM 5.5 Files"; Genea-Musings, "Do Genealogy Template Sources Survive ... GEDCOM"; Russ Worthington's BetterGEDCOM "Data Tests"

How do we fix the transfer problem?

When fully implemented, BetterGEDCOM Applications will export all the data necessary for another BetterGEDCOM Application to reconstruct citations from both standard and non-standard Citation Elements and default, standard and non-standard Citation Templates. We have preliminarily identified six (6) requirements for this purpose. [Link to page - Fix the Transfer Problem]

1. Establish an extended group of standard BetterGEDCOM Citation Elements, and
2. Determine the method by which an Application will define and export non-standard Citation Elements, and
3. Discover one or more standardized BetterGEDCOM Citation Templates, where at least one "default" template is based on GEDCOM's Data Fields, and
4. Determine the method by which Applications will export references to that/those standardized BetterGEDCOM Citation Templates, and
5. Discover ways by which Applications will define and export (or export references to) non-standard Citation Templates, and
6. Determine methods by which this comparatively larger export will be consistently interpreted by those Applications that are GEDCOM compatible but have not yet adopted BetterGEDCOM's model.

Somewhat in the way GENTECH envisioned active management of its data model, [1] BetterGEDCOM envisions active management of standard Citation Elements and Citation Templates. We have preliminarily identified one (1) requirement for this purpose:

7. Provide means for active management of BetterGEDCOM standard Citation Elements, Citation Templates and the mechanisms by which references are transmitted.

[1] GENTECH Genealogical Data Model (2000), p. 3, "the model will be extended."
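To make requirements 1 through 6 more concrete, here is a rough Python sketch of what an export might carry. It is an illustration under stated assumptions, not a proposed format: the contents of `STANDARD_ELEMENTS`, the template identifier, the dictionary layout, the use of NOTE as a catch-all and both function names are hypothetical.

```python
# Hypothetical set of standard BetterGEDCOM Citation Elements (requirement 1).
STANDARD_ELEMENTS = {"author", "title", "publisher", "url", "access_date"}

# Hypothetical fallback from elements to GEDCOM Data Fields, standing in
# for the "default" template of requirement 3.
DEFAULT_TAGS = {"author": "AUTH", "title": "TITL", "publisher": "PUBL"}


def build_export(elements, template_id):
    """Split elements into standard and non-standard, and name the template."""
    standard = {k: v for k, v in elements.items() if k in STANDARD_ELEMENTS}
    custom = {k: v for k, v in elements.items() if k not in STANDARD_ELEMENTS}
    return {
        "template": template_id,        # requirements 4 and 5
        "standard_elements": standard,  # requirement 1
        "custom_elements": custom,      # requirement 2
    }


def legacy_view(export):
    """What a GEDCOM-only reader could still use (requirement 6)."""
    fields = {}
    leftovers = []
    for name, value in export["standard_elements"].items():
        if name in DEFAULT_TAGS:
            fields[DEFAULT_TAGS[name]] = value
        else:
            leftovers.append(f"{name}: {value}")
    for name, value in export["custom_elements"].items():
        leftovers.append(f"{name}: {value}")
    if leftovers:
        # Elements without a Data Field keep their labels in the note text.
        fields["NOTE"] = "; ".join(leftovers)
    return fields


# Values taken from a census citation quoted in the discussion below.
elements = {
    "title": "1880 U.S. census, Athens County, Ohio, population schedule",
    "url": "http://www.ancestry.com",
    "access_date": "22 February 2007",
    "source_of_source": "National Archives microfilm publication T9, roll 992",
}
export = build_export(elements, "example-template-1")
legacy = legacy_view(export)
print(legacy["NOTE"])
```

In this sketch a BetterGEDCOM Application would read the template reference and the named elements directly, while an older program would see only TITL and a labeled NOTE. Whether a labeled note is an acceptable fallback is exactly the kind of question requirement 6 leaves open.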

Related Wiki Pages: Fix the Transfer Problem
Additional References: Rootstech (Wikispaces), "Sources"

Online Citations & More!

Various technologies exist or are emerging that support the capture of essential online citation information. BetterGEDCOM intends to use its standard Citation Element and Citation Template schemes and other technologies to advance the capture of online citations. Perhaps one day, visitors to genealogy-centric sites will transfer essential citation information to a user's application via a BetterGEDCOM UUID?

Please join in the BetterGEDCOM discussions about Online Citations & More! for ideas to extend the functionality. Help us determine the related requirements.

Related Wiki Pages: Zotero (open source) [wip], Blog, A few Zotero screen shots, former "thing" discussion
Additional References: Wikipedia, "Zotero," Zotero.org, Zotero-Evidence Explained thread, Endnote, ThinkGenealogy, Better Online Citations (series)

Summary of Citation Style Language 1.0 (pdf) specification used by Zotero and others.

From repository meta data to BetterGEDCOM and reports.pdf - A future scenario

Sources and Reference Notes - Terminology

Many definitions are available on the wiki page, "Supplemental Glossary from _Evidence Explained_, 2007"
See also Glossary of Terms and Definitions under Development.

The terms below are "Definitions under Development."

Citation Element - An identifiable field in a citation that is associated with particular data. [Link to Discussion]
Citation Template - specifies the Citation Elements, arrangement of same and punctuation necessary to form a citation. [Link to Discussion]
Customized Citation - Changes made to a citation other than to edit data in a Citation Element or Citation Template. [Link to Discussion]
Default Citation Template - A BetterGEDCOM Citation Template based on GEDCOM Data Fields.
Master Source - In application software, sometimes "source"; a group of Citation Elements equivalent to GEDCOM's SOURCE_RECORD. Has a one-to-many relationship with citation specific elements.
Citation specific elements (aka Assertion-level elements) - In application software, those elements added to a Master Source to form a reference note.
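The relationships among these terms can be sketched in Python. This is only one possible reading of the definitions under development: the class names, the `str.format`-style pattern and the template name `example-book` are assumptions. The book details come from a citation quoted in the discussion below; the second page number is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CitationTemplate:
    """Names the Citation Elements and the punctuation that arranges them."""
    name: str
    pattern: str  # str.format-style pattern; placeholders are element names


@dataclass
class MasterSource:
    """Citation Elements shared by every citation of the same source."""
    elements: dict
    template: CitationTemplate


@dataclass
class ReferenceNote:
    """A Master Source plus citation specific elements, such as a page."""
    source: MasterSource
    specific: dict = field(default_factory=dict)

    def render(self):
        # Citation specific elements are layered over the Master Source.
        values = {**self.source.elements, **self.specific}
        return self.source.template.pattern.format(**values)


book = CitationTemplate(
    name="example-book",
    pattern="{author}, {title} ({place}: {publisher}, {year}), {page}.",
)
master = MasterSource(
    elements={
        "author": "Gary B. Mills",
        "title": "The Forgotten People: Cane River's Creoles of Color",
        "place": "Baton Rouge",
        "publisher": "Louisiana State University Press",
        "year": "1976",
    },
    template=book,
)
# One Master Source, many reference notes (the one-to-many relationship).
note_a = ReferenceNote(master, {"page": "34-35"})
note_b = ReferenceNote(master, {"page": "50"})  # hypothetical page number
print(note_a.render())
```

The sketch leaves out formatting such as italics for the title. A fuller template would need some way to say which elements are published titles, a point that comes up in the discussion below.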


GeneJ 2011-03-16T10:15:19-07:00
Terry's TMG Tips, "Considerations for Exporting Sources to GEDCOM 5.5 Files."

From the article, "The export is complicated, however, by the fact that TMG has far more Source Elements, or even Source Element Groups, than there are different tag types for sources in the GEDCOM specification. TMG deals with this by doubling up the information from several source elements into a single GEDCOM tag. But since there are no accepted standards for this "doubling up" process, various programs importing a GEDCOM file containing such combined tags will meet with varying degrees of success in satisfactorily placing the data in their own systems of recording source data. It is unlikely that the resulting source notes will read much at all like the source notes produced by TMG, even if the importing program successfully retains all the data included in the GEDCOM file."
AdrianB38 2011-03-16T14:42:27-07:00
Now _that's_ useful - it indicates the sort of data that TMG wants to capture - and shows the shortfalls of GEDCOM.
GeneJ 2011-03-17T01:27:57-07:00
While Legacy, RootsMagic, Family Tree Maker and others approach the development of the templates a little differently than TMG, the same concept of "doubling up" seems to be used to transfer information to GEDCOM.

As Terry points out, at the citation level (reference note), there are essentially two elements recognized by GEDCOM--PAGE and NOTE. The balance of the reference note is transferred as the source definition, as below.

At the source level, Terry reports about five more:


When a repository is reported, there are a couple more elements.

Adrian wrote, "it indicates the sort of data that TMG wants to capture..."

Actually, TMG comes pre-packaged with even more citation/source data fields* and users can also add more. In order to fit the information into GEDCOM, however, each data field in TMG must be assigned to one of the Source Element Groups Terry has listed in his article.

*For example, TMG doesn't have a Source Element Group for the WWW components of an online source; you also don't see a Source Element Group for "Source of the Source."
GeneJ 2011-03-16T11:30:17-07:00
Genea-Musings, "Do Genealogy Template Sources Survive When Passed Through GEDCOM?"
As only Randy Seaver can say it and show it:


"The conclusion I've found - at least with these three programs [FTM2011, RootsMagic, Legacy Family Tree 7] - is that ... Using a GEDCOM file to export data from one program and import it into another program will result in the loss of all Source Template information. The resulting source templates are "Free-form" after the program interprets the GEDCOM data that is provided. In the process, some programs do not put information in the correct Free-form source template field."

He has posted many, many more valuable tests and comments on his blog.
AdrianB38 2011-03-16T14:34:32-07:00
Did he at any point ever show what was in the GEDCOM files?

If he did, it might help to identify what that particular template could put into a "real" GEDCOM tag (if that's the right way of phrasing it), and what it couldn't and had to put ... somewhere.
GeneJ 2011-03-16T19:24:28-07:00

See Randy's more comprehensive article, here:

GeneJ 2011-03-16T21:01:30-07:00
To Randy's post (http://www.geneamusings.com/2011/02/software-programs-gedcom-files-and.html ), Louis writes, "3. Italics is another example of formatting. It is not data. This is the same as number 2. The receiving program should do your formatting."

Louis, how is the receiving program to know if the data in TITL is (a) the published title of a work, (b) the quoted title of an unpublished work or (c) a generic title you have chosen for the item?

Louis also writes, "1. Putting all info into one field is wrong. Reading programs will not be able to interpret what is there. The various parts of the source citation must be identified, and placing them in the various fields: Author, Title, Publication, etc. is the way that reading programs can understand what is there."

I agree, but where are the fields in GEDCOM? For example, where is my field for website owner, website page name, URL, access date and keyword? Where are my title fields for the entry to cite a distinctly authored and titled table in a particularly titled and edited volume of a specifically titled series that was accessed at a particularly titled Internet page?

Where should I enter my source of the source, which may or may not include the title of a published work?
AdrianB38 2011-03-18T14:05:46-07:00
Gene - when you ask, "where are the fields in GEDCOM?" - they aren't there. As I'm sure you realise. When Louis says "Putting all info into one field is wrong", I agree with him but what I mean (and I imagine Louis means also) is that, if we could start from fresh, it's wrong. As it is, with GEDCOM, it's a case of choosing the least worst option.

What we need to do with BetterGEDCOM is identify the elements to accommodate all those things that you mention. What's interesting is that we may find that these elements are as volatile as the list of events and just like we need to be able to react quickly to add extra events, so maybe we need to react quickly to add extra "citation / bibliography" elements as new types of document appear????

I don't know on this volatility - just a personal thought.

Incidentally - re your question about italics. This is really something that gets me going. It's plain and simple a fault in using Chicago style for non-experts. It's absurd to expect the general population to understand that italics means X (and doesn't the page numbering change from comma to semi-colon or was that just a nightmare I had?) To me, the title of a published or unpublished work is a title and whether or not it's published should be indicated by the presence of publishing data. I note you have a 3rd option though, which is interesting - not sure how I'd like to code that yet.

And your source of the source question is a good one - I'm not sure I've seen a full EE example of that sort of thing... Thank goodness ESM says citing is an art!
GeneJ 2011-03-18T22:03:50-07:00
Hi Adrian:

I thought Louis sees existing GEDCOM sourcing as sufficient. I may be mistaken.


Your post of 13 December 2010 (http://bettergedcom.wikispaces.com/message/view/Shortcomings+of+GEDCOM/31633551 ) is on point, and I don't doubt you might want to refashion the elements. Separately though, unless I somehow tell you, I haven't figured out how you would know the template I presumed in developing the elements you receive from me.

Re, "elements to accommodate..." I hope when I finish the page contents and those WIP "element" spreadsheets it will be easier to describe. I'll save some aspirin for those element worksheets. --GJ

P.S. You write, "... about italics ... a fault in using Chicago ..."
I send you aspirin and hope we agree--it's not the job of BetterGEDCOM to be the citation police. :)
GeneJ 2011-03-18T22:36:16-07:00
Source of the source examples from the Board for Certification of Genealogists website, "Work Sample Products."

The two examples below from Carmen Finley, Ph.D., CG, "Who Was Aunt Mary? A Brief Case Study in Identification and Kinship 'Correction.'" (Noted as previously unpublished.)

1880 U.S. census, Athens County, Ohio, population schedule, village of Athens, subdistrict 4, p. 26 (stamped folio 68-B), dwelling 2, family 271; digital image, _Ancestry.com_ (http://www.ancestry.com : accessed 22 February 2007); citing National Archives microfilm publication T9, roll 992.

Jane Shute, contributor, “Athens County, Ohio - Daniel Bertine Stewart, bios, 1883,” _USGenWeb Archives_ (ftp.rootsweb.com/pub/usgenweb/oh/athens/bios/stewart.txt : accessed 22 February 2007), citing _History of Hocking Valley Ohio_ ([Chicago: Interstate Publishing Co.], 1883), 1389–91.

One or two examples from Elizabeth Shown Mills, CG, CGL, FASG, “Which Marie Louise is ‘Mariotte’? Sorting Slaves of Common Name,” _National Genealogical Society Quarterly_ 94 (September 2006): 183-204.

The 1974 dissertation was published unchanged as Gary B. Mills, _The Forgotten People: Cane River’s Creoles of Color_ (Baton Rouge: Louisiana State University Press, 1976), 34–35, citing “Pierre Dolet to Marie Thérèze Coincoin (MS in Cammie G. Henry Collection ... Old Natchitoches Data, II, 289).”
GeneJ 2011-03-16T11:43:10-07:00
GeneaBloggers, "Interview – Elizabeth Shown Mills"

Here's Elizabeth's response to the question, "What is important to remember about citing online sources vs. more traditional ones such as documents, newspapers, books, etc.?"

Mills replies, "The most important point to remember is the sameness, not the difference. Whether we are citing a source from a physical place or a virtual place, we need to record two types of data:

* Details that enable us and others to understand the nature of the source and gauge its reliability.

* Details that enable us and others to find the source again.

Beyond that, the fact that our source is online, rather than on paper or other material, does not change its basic nature. A deed book is a deed book. A tombstone is a tombstone. An article is an article. A published abstract is a published abstract. Therefore, the basic citation remains the same."

"The main difference created by digital publication of a source is that we may have more “layers” to cite. ..."

Psst. Thomas also asked for advice for those who didn't record sources when they first jumped into the sandbox, and now have "regrets." He asks, "How does one start the process of adding source citations to years’ worth of genealogy data?"

Mills responds, "Ah, Dear Thomas! It’s not just many of us who have this problem, it’s virtually everybody! Think of all the poor souls who started genealogy 25 years ago, using software that had no citation capability at all. ..."

Perhaps most importantly, to the question of technology and sources, Mills comments, "Five years, Thomas, is a blip. Let’s look at thirty.

Technology came into genealogy as a tool, but it quickly became a teacher. The dominant teacher. Folks of the 80s and early 90s, who set out on a family-tree-climbing adventure, bought software and followed its instructions. If Program XYZ carried no data-entry screens for use in identifying sources, then users ‘learned’ that sources did not matter. ... The 90s saw great advances on The Evidence Front. By the end of that decade, virtually every software program ‘taught’ its users that genealogy was not just about looking up names and dates ... The 2000s then brought us The Data Explosion. We now have more online providers of historical materials than any of us will ever live long enough to explore. .... It is wondrous to see the robustness of the technology that makes this possible. But it’s frustrating to see the extent to which technology’s teaching role has regressed in this new forum ..."

It's a great read.
GeneJ 2011-03-20T13:46:04-07:00
Wiki PAGE feedback and suggestions
Hoping the page takes a project approach to developing BetterGEDCOM source and reference note input to the Requirements catalog.

Your feedback, comments and suggestions about the content and outline are appreciated. --GJ