
High level considerations about how to create a solution for citations in BG

Geir's high level considerations on citations in BetterGEDCOM.pdf for discussion.

No more scrambled citations!

Citations represent the record of Evidence for our conclusions; the Genealogical Proof Standard deals with evidence and conclusions and has an element, "Complete and accurate citation of sources."

BetterGEDCOM will provide for the "complete" and "faithful" transfer of citations.

When fully implemented, BetterGEDCOM Applications will export all the data necessary for another BetterGEDCOM Application to reconstruct citations from both standard and non-standard Citation Elements and default, standard and non-standard Citation Templates. We have preliminarily identified six (6) requirements for this purpose.

1. Establish an extended group of standard BetterGEDCOM Citation Elements, and
2. Determine the method by which an Application will define and export non-standard Citation Elements, and
3. Discover one or more standardized BetterGEDCOM Citation Templates, where at least one "default" template is based on GEDCOM's Data Fields, and
4. Determine the method by which Applications will export references to that/those standardized BetterGEDCOM Citation Templates, and
5. Discover ways by which Applications will define and export (or export references to) non-standard Citation Templates, and
6. Determine methods by which this comparatively larger export will be consistently interpreted by those Applications that are GEDCOM compatible but have not yet adopted BetterGEDCOM's model. [Link to discussion]

Somewhat in the way GENTECH envisioned active management of its data model, [*] BetterGEDCOM envisions active management of standard Citation Elements and Citation Templates. We have preliminarily identified one (1) requirement for this purpose:

7. Provide means for active management of BetterGEDCOM standard Citation Elements, Citation Templates and the mechanisms by which references are transmitted.

*GENTECH Genealogical Data Model (2000), p. 3, "the model will be extended."


GeneJ 2011-04-08T20:45:43-07:00
#6 - Determine methods by which this comparatively larger export will be consistently interpreted by those Applications that are GEDCOM compatible but have not yet adopted BetterGEDCOM's model.
How about some ideas?

One thought was to have a requirement for BetterGEDCOM to have a "default" template based on GEDCOM's current Data Fields. (Not all currently do.)

That would at least provide an internal "map to" mechanism in each BetterGEDCOM compliant application.

Needs much work, but perhaps this is a start.
AdrianB38 2011-04-09T12:38:39-07:00
Trying to respond to your request for ideas.... I'm a bit uncertain exactly what we're talking about. I presume that the "larger export" refers to the new elements to construct citation / footnotes / bibliography / whatever - where by "new" I mean, things that aren't in GEDCOM. Is it that or not?

Then the applications in scope seem to be clear - they are GEDCOM "compatible" apps but not BG "compatible" apps.

So.... oh - I just lost it - how are these GEDCOM "compatible" apps going to be loading these new elements? On what sort of a file?

I think I must have misinterpreted somewhere...
GeneJ 2011-04-09T13:37:39-07:00
Hi Adrian --

Thank you.

You wrote, "how are these GEDCOM "compatible" apps going to be loading these new elements?"


#6 recognizes that some, but not all, Applications have source systems that are comparatively larger than GEDCOM's, say, 10 data fields.

When two of those larger systems try to talk today, GEDCOM is sort of like a bottleneck.

We can remove that bottleneck, but when we do, there will still be Applications that have source systems based on GEDCOM's 10 fields.

Enter the #6 magician!!
GeneJ 2011-04-09T14:01:04-07:00
P.S. Today, those with larger systems use a variety of different techniques to ship that comparatively larger data to GEDCOM--so there is a surprise behind every door.

A system that has GEDCOM's, say, 10 fields today might have only those same fields in the post BetterGEDCOM world.

If all programs had a default template set built from GEDCOM's, say, 10 Citation Elements, then we would have ONE highly structured component set in an otherwise very flexible environment.

The thought was, that single highly structured component might play a role in discovering details for a #6 requirement.
AdrianB38 2011-04-10T05:10:57-07:00
Are you suggesting that a system that
(a) has GEDCOM's 10 fields (let's presume it's 10 for argument's sake) today,
(b) hasn't been modified for BG (yet);

would be modified in some fashion? (Just trying to get my head round this!)

Now, I thought the purpose of the template was to form the bibliography, footnote, 2nd footnote etc. So I'm still a bit lost on how this template using the 10 items helps with the _transfer_.

Are we suggesting that the umpteen dozen items from the big list (150 or whatever...) could be mapped onto the 10 in some fashion (which is CONSISTENT across all apps because the BG standard defines the mapping) and thus the umpteen dozen do get into an old style program? Albeit concatenated / compressed etc

If that could be done, the template is probably not that important - IF one had confidence that all 10 were printed.

Am I getting close to your ideas?
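
A minimal sketch of the mapping idea raised above, in Python. It assumes a single fixed table, shared by all applications, that folds many fine-grained elements into a handful of GEDCOM fields as "label: value" pairs. The element names and the table itself are hypothetical; only the GEDCOM tags (AUTH, TITL, PUBL, PAGE, NOTE) come from GEDCOM, and nothing here is a defined BetterGEDCOM mapping.

```python
# Hypothetical table: fine-grained element name -> GEDCOM tag.
# In the idea discussed above, a table like this would be fixed by the
# standard so that every exporter compresses elements the same way.
ELEMENT_TO_GEDCOM = {
    "creator": "AUTH",
    "title": "TITL",
    "publisher": "PUBL",
    "publication_place": "PUBL",
    "film": "PAGE",
    "frame": "PAGE",
}
FALLBACK_TAG = "NOTE"  # assumption: unmapped elements are kept, not dropped


def compress_to_gedcom(elements):
    """Fold many elements into a few GEDCOM fields as 'label: value' pairs."""
    grouped = {}
    for name, value in elements.items():
        tag = ELEMENT_TO_GEDCOM.get(name, FALLBACK_TAG)
        grouped.setdefault(tag, []).append(f"{name}: {value}")
    # Elements that share a tag are concatenated, comma separated.
    return {tag: ", ".join(parts) for tag, parts in grouped.items()}


fields = compress_to_gedcom({
    "title": "Death Records",
    "film": "1234567",
    "frame": "344",
    "access_date": "2011-04-10",  # no mapping, so it lands in NOTE
})
for tag, text in fields.items():
    print(tag, text)
```

Because every exporter would use the same table, an old-style program would at least receive the extra elements in a predictable place, albeit concatenated.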
GeneJ 2011-04-10T08:33:03-07:00
Hi Adrian,

You write, "the big list (150 or whatever...) ... CONSISTENT across all apps because the BG standard defines the mapping"

Short answer--yes. Assume Family Historian is the receiving program example. Right now, if a RootsMagic GEDCOM is imported to Family Historian, the RootsMagic data will come to you organized differently than if you were importing a GEDCOM from TMG. (And we could go on and on and on.)

Requirement #6 says that once we've figured out how those larger systems talk to each other, we then have to make that same information import to the GEDCOM 5.5 based system--that can only read 10 fields.

Longer answer--How do CUSTOM elements import to Family Historian?

You wrote, "the template is probably not that important"

Two reasons:

1) Suppose the BetterGEDCOM system is based on transferring some reference to a template. Family Historian, as above, doesn't have a way of exporting a reference to a template--so how will a BetterGEDCOM based application understand what to do with those elements? Voila--enter that "default" template.

2) A little more complex. All Applications producing Citations in the ordinary course of business use template assumptions for that purpose--although a UI might keep the template under the hood. If we only "concatenated / compressed," there is a good chance Family Historian's import will be consistently unintelligent in its template. BUT .. there is probably a way to make that intelligent.

I think ... if we presume a GEDCOM "default" template as we develop the gazillions of BetterGEDCOM templates and elements, the concept of higher or critical source elements will emerge. (Maybe like GENTECH in the post 2010 world.) If that is the case, then a Family Historian user will not ONLY get all the BG elements, but that GEDCOM identified subset will produce an intelligent citation, albeit limited compared to BetterGEDCOM compliant Applications.
GeneJ 2011-04-10T08:33:50-07:00
I'm reading the posting just now by Geir .. it touches on some of the same concepts.
GeneJ 2011-04-10T10:01:39-07:00
Geir's High Level Considerations on Citations in BetterGEDCOM
Thank you.


Citation Elements and "A medium sized set of elements":
You wrote, "Yates ... 600"
He only worked with the QuickCheck models, and came up with about 550 elements. (That number is from my Excel work with his data.) I found it too time consuming to work in the spreadsheet format. I propose it's easier to start with a reasonable "generalized" list (I even have a list for that purpose), and work directly on Mills QuickCheck models. We could add to the list as necessary. The list will be on a "citation elements" wiki page if I can ever get that page to actually save.

You wrote, "EE is that it has not attempted to generalize the element types." Once you get beyond the QuickCheck models, there's a much bigger message there--regardless of the style you use, citations should be "crafted around ... three essentials: evaluation, identification, and description." [EE (2007), p 38]

You wrote, " very little (or none) of the classification info appear" -- could you clarify what you mean by that?

Zotero -- (I lost that page due to log out too, by the way.) There are more reasons to go in a Zotero like direction than not. Translations, the ability to easily change between different style guides, etc.
From both online and telephonic discussions, the folks at Zotero want styles that have been "generalized," to use your words. For each "citation element" not now part of Zotero's historical sources, they will want a justification. That's not difficult to do, if you have their list to begin with. Zotero has a new community outreach staff member. (For whom I have left a message.) Perhaps we can save this part of the discussion until I re-write the Zotero-like page and get it posted.
AdrianB38 2011-04-10T14:03:33-07:00
There is certainly a degree of duplication in there.

For instance,
National Government Records / Original Materials (U.S.) / Maps refers to "[RECORD GROUP TITLE]" while
National Government Records / Original Materials (U.S.) / Railroad Retirement Board refers to "[NARA RECORD GROUP TITLE]"

Unless I'm seriously misunderstanding, this is the same RECORD GROUP TITLE in both cases - it's just named inconsistently.

Similarly, [CREATOR], [CREATOR OF DATABASE], [CREATOR OF BLOG] are all the same item I imagine - it just happens to be that the thing is varying. And I guess Author and Creator may be synonymous.

Even so, we're not going to get a radical change in numbers.
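
A minimal sketch of the de-duplication described above, in Python. It assumes a hand-maintained alias table that maps inconsistently named labels onto one general element name. The alias entries are taken from the examples in this post; the function names are hypothetical, and Author/Creator are left separate because their equivalence is only a guess.

```python
import re

# Alias table built from the examples above: specific label -> general name.
ALIASES = {
    "NARA RECORD GROUP TITLE": "RECORD GROUP TITLE",
    "CREATOR OF DATABASE": "CREATOR",
    "CREATOR OF BLOG": "CREATOR",
}


def generalize(label):
    """Strip the square brackets, normalize spacing and case, apply aliases."""
    key = label.strip().strip("[]").upper()
    key = re.sub(r"\s+", " ", key)
    return ALIASES.get(key, key)


def distinct_elements(labels):
    """Return the sorted set of general element names behind the labels."""
    return sorted({generalize(label) for label in labels})


labels = [
    "[RECORD GROUP TITLE]",
    "[NARA RECORD GROUP TITLE]",
    "[CREATOR]",
    "[CREATOR OF DATABASE]",
    "[CREATOR OF BLOG]",
]
print(distinct_elements(labels))  # five labels collapse to two names
```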
GeneJ 2011-04-10T16:25:20-07:00

The problem with working from Yates worksheet is that sometimes his "Date" is really a date, while other times it's an "Access Date." The same goes for Year.

I have a separate spreadsheet with Yates data from the Full Reference note only. I wasn't trying to generalize, just eliminate duplicates.

I posted a separate spreadsheet to Google docs. It's Yates worksheet at the Full Reference Note level only. I wasn't working to "normalize the" labels--I called it distill. I was just trying to eliminate duplicates of the kind you, Adrian, point out.

The spreadsheet is at https://spreadsheets.google.com/ccc?key=0AhGBiJ9HyACHdDhTaC15Y3hKWHdqSG44dmRyNnlPVFE&hl=en&pli=1#gid=0
gthorud 2011-04-11T14:50:17-07:00

My database counted about 580 different elements in Yates, but that is not important. The point is that the number is large.

You wrote: "I propose it's easier to start with a reasonable "generalized" list (even have a list for that purpose), and work directly on Mills QuickCheck models."

I don't know how easy it is, but one point in my document is - should we look into the general methodology of creating the "generalized list" - which rules will guide that process?

"citations should be "crafted around ... three essentials: evaluation, identification, and description." [EE (2007), p 38]"

If you mean that the purpose is not only to identify the text, you are right - I have not looked into whether she has taken the easy way out wrt evaluation and description.

"very little (or none) of the classification info appear".

I may be wrong, but can you tell for every printed citation (eg. by looking at footnote only), and every element in it, what the input prompt/label for that element was? - without checking EE.

If you can create a general element type that will end up in the same place in the citation for every source type where the element is used, that will be one test (of possibly several) of whether that general type works as well as the specific ones.

Related to this is

- Is the "specificity" needed for other things than input? If not, the importing app would be able to create the footnote without knowing the specific label/prompt; it will only know the more harmonized element type - but can create the same footnote as if it knew the specific label/prompt that was used when the info was entered. (Am I making sense - this is not very well formulated)

- If you are using general elements, you could define specific labels/prompts for input of a source type, not using the name of the general field but the specific one - the specific labels being used only in the exporting program.

- Or, even when using general elements in the BG file, each element could carry with it both the general type of the field, AND the specific one (possibly the label/prompt). The importing app may use it, if there is a need. (The latter is just an idea, I am not sure if it is useful. One possibility is to show EE labels to both users, but using general elements that will "interwork" with all sorts of databases and programs. Sort of building EE on top of a general set of elements.)
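
A minimal sketch of the last idea, in Python: each element carries a general type plus an optional specific label. The class, field names and sample values are hypothetical. The point is only that a footnote built from the general types comes out the same whether or not the importing app knows the specific label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CitationElement:
    general_type: str                     # harmonized type any importer knows
    value: str
    specific_label: Optional[str] = None  # exporter's input prompt, optional

    def display_label(self):
        """Label to show the user: the specific one if it was sent."""
        return self.specific_label or self.general_type


def footnote(elements, order):
    """Build a footnote from general types only; labels play no part."""
    by_type = {e.general_type: e.value for e in elements}
    return ", ".join(by_type[t] for t in order if t in by_type) + "."


# The same data, exported with and without the specific label.
with_label = [
    CitationElement("creator", "Jane Doe", "Creator of Blog"),
    CitationElement("title", "An Example Blog"),
]
without_label = [
    CitationElement("creator", "Jane Doe"),
    CitationElement("title", "An Example Blog"),
]
order = ["creator", "title"]
print(footnote(with_label, order) == footnote(without_label, order))
print(with_label[0].display_label(), "/", without_label[0].display_label())
```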

Some of the above may not be clearly expressed, so ask!!!
GeneJ 2011-04-11T21:34:48-07:00

You wrote, ""generalized" list ... can you tell ... every element in it ... what the input prompt/label for that element was?"
Thanks again for forwarding the Zotero links. Over the weekend, I'm going to try to lay out a spreadsheet of Zotero item types and labels, as a sort of "generalized" baseline.

P.S. From the work I did in early January

You wrote, "it will only know the more harmonized element type - but can create the same footnote as if it knew the specific label/prompt that were used when the info was entered."
If a program can read a "template" it should be able to place the label.

You wrote, "If you are using general elements, you could define specific labels/prompts...specific labels/prompts for input of a source type"

Yes--to all the concepts of labels vs general elements. See also that link above.

You wrote, "if she has taken the easy way out wrt evaluation and description." Mills QuickCheck Models are comprehensive as a whole, but they were never intended to show every circumstance for every situation. She has an element called "evaluation"--in the 2007 book, see page 95 for an example of an evaluative comment at that "master source" level. Page 98 has a similar line, "Descriptive Details." By definition, Mills has only a few true examples in the book. (There is no "assertion" that each QuickCheck references.) The BCG's Work Samples, though, are real world. See

On the wiki, I think I've posted examples of "evidence" citation from those Work Samples.

Yeah. I found them.
See GeneJ Mar 12, 2011 11:30 pm
GeneJ 2011-04-18T09:33:03-07:00
I wrote above, "Zotero has a new community outreach staff member. (For whom I have left a message.)"

Rec'd phone call back this date from Zotero's Community Outreach staff member.

When the time is right, they'd like an e-mail from us outlining our Zotero.org specific thoughts. In short, she would be our initial point person. --GJ
GeneJ 2011-04-11T18:35:41-07:00
Do we need Custom Elements?
By definition, every custom element will need a custom template.

Since we are considering permitting text in the template (outside of the element), couldn't we just pass that non-standard element as a text item?
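
A minimal sketch of that question, in Python. It assumes a template is an ordered list of parts, each either literal text or a reference to a named element. All names and sample values are hypothetical. The `inline_custom` helper shows one way an exporter might turn a non-standard element into a plain text part, so the importer needs no custom element definition.

```python
def render(template, elements):
    """Join literal text parts and looked-up element values in order."""
    out = []
    for kind, content in template:
        if kind == "text":
            out.append(content)
        elif kind == "element":
            out.append(elements.get(content, ""))
        else:
            raise ValueError(f"unknown template part: {kind!r}")
    return "".join(out)


def inline_custom(template, elements, standard):
    """Replace references to non-standard elements with their text value."""
    flattened = []
    for kind, content in template:
        if kind == "element" and content not in standard:
            flattened.append(("text", elements.get(content, "")))
        else:
            flattened.append((kind, content))
    return flattened


STANDARD = {"title"}  # assumption: the only element the importer knows
template = [
    ("element", "title"),
    ("text", ", binder "),
    ("element", "binder_colour"),  # hypothetical custom element
    ("text", "."),
]
elements = {"title": "Death Records", "binder_colour": "blue"}

exported = inline_custom(template, elements, STANDARD)
# The importer only has the standard element, yet renders the same text.
print(render(exported, {"title": "Death Records"}))
```

One cost of this approach is that the flattened template now holds data from a single citation, so it can no longer be shared across citations the way a true template can.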
GeneJ 2011-07-22T14:35:54-07:00
BetterGEDCOM_First ...

Someone commented, "NOTE: SC +1 PAGE is used in GEDCOM very flexibly. The data in this field should be in the form of a label and value pair, such as: 'Film: 1234567, Frame: 344, Line:28'"
I'm missing something.

I assume we all agree that the note style citation is narrated. References to films, frames, files or folders become an integrated part of my reference note citations, so I might have "film no. 1234567" or film 1234567, but probably wouldn't want "Film: 1234567." (I have enough punctuation going on without adding a colon dilemma. Am I being silly?)

More and more in my citations, references to films fall in the citation part "source of the source."

What am I missing?
GeneJ 2011-07-22T14:42:20-07:00
P.S. In source of the source, I'm identifying the film, so it might be FHL film no. 1234567. I'm frequently looking up the film titles to see if they match other records already in my database, so sometimes I even add the film title in editorial brackets. For example ..., cites FHL film 167845 ["Jones County (Iowa), Death Records, 1880-1933"].
louiskessler 2011-07-23T06:19:05-07:00


The 'Film: 1234567, Frame: 344, Line:28' format was developed in GEDCOM with a purpose. It allows programs to add their own arbitrary descriptors to the citation in a standard format (via the ":" and "," separators) so that another program can interpret the meaning of the values.

The program need not display it this way, but just read and write it to GEDCOM in this format.
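
A minimal sketch of reading and writing that format, in Python. It assumes the simple case only: pairs separated by "," and a label separated from its value by ":". The function names are hypothetical, and values that themselves contain commas are not handled here.

```python
def parse_page(page):
    """Split a PAGE value into label/value pairs plus any unlabeled text."""
    pairs, unlabeled = {}, []
    for chunk in page.split(","):
        label, sep, value = chunk.partition(":")
        if sep:
            pairs[label.strip()] = value.strip()
        elif chunk.strip():
            # No colon: keep the text rather than silently dropping it.
            unlabeled.append(chunk.strip())
    return pairs, unlabeled


def format_page(pairs):
    """Write pairs back out in the 'Label: value, Label: value' form."""
    return ", ".join(f"{label}: {value}" for label, value in pairs.items())


pairs, unlabeled = parse_page("Film: 1234567, Frame: 344, Line:28")
print(pairs)
print(format_page(pairs))

# A narrated reference has no descriptors a program can pick out.
print(parse_page("film no. 1234567"))
```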

I'd say using inconsistent references such as "film no." in one instance and just "film" in another is not good. There is no way a program would be able to scan through the infinite set of possible representations to accurately break down parts of your citation.

I actually think GEDCOM's pairing of descriptors and values is a great idea (just like I think their commas as levels in place names is). Instead of developing your own formats for your citations with the brackets, isn't it simple enough to say:

Film: 167845, Title: "Jones County (Iowa), Death Records, 1880-1933"

And why do you think that looks so bad?

GEDCOM is for transfer, not for display. It would be quite easy for a program to display that GEDCOM by removing the colon for the Film descriptor, and removing the word "Title" and adding the brackets for the title descriptor if it had the capability. But if the underlying data it was using was not structured, that would then be near impossible.
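
A minimal sketch of that transfer-versus-display split, in Python, using the film number and title from GeneJ's example above. It assumes that a value containing commas is wrapped in double quotes; that quoting rule, the regular expression and the function names are assumptions for illustration, not anything GEDCOM specifies.

```python
import re

# One 'Label: value' pair; a value may be double-quoted to protect commas.
PAIR = re.compile(r'\s*([^:,"]+?)\s*:\s*("[^"]*"|[^,]*)\s*(?:,|$)')


def parse_pairs(text):
    """Parse 'Label: value' pairs, honouring double-quoted values."""
    pairs = {}
    for match in PAIR.finditer(text):
        value = match.group(2).strip()
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]  # drop the protective quotes
        pairs[match.group(1).strip()] = value
    return pairs


def render_for_display(pairs):
    """Drop the colon after Film; show Title in editorial brackets."""
    parts = []
    if "Film" in pairs:
        parts.append(f"film {pairs['Film']}")
    if "Title" in pairs:
        title = pairs["Title"]
        parts.append(f'["{title}"]')
    return " ".join(parts)


stored = 'Film: 167845, Title: "Jones County (Iowa), Death Records, 1880-1933"'
print(render_for_display(parse_pairs(stored)))
```

The stored string stays in the structured form; only the display function decides on wording, brackets and punctuation.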
GeneJ 2011-07-23T09:21:19-07:00
Thank you.

I've been working on some graphic examples. I'll try to get those uploaded to a page and hope the visuals help.