Sources and Citations
Introduction
As documented throughout the wiki and in numerous blog articles and other postings, it is not possible to transfer the source and citation data recorded in several major genealogy programs to a different program using GEDCOM. Data become lost or distorted during the transfer process. The reason is that these programs have extended their source and citation data with fields that are more specific than those in GEDCOM, or use standard GEDCOM with custom formatting conventions, and/or export data that do not conform to GEDCOM. Below we attempt to catalog a host of BetterGEDCOM wiki pages and discussions about sources and citations. The pages discuss alternatives for extensions of GEDCOM that can solve this problem, additional functionality related to sources and citations, the implications of various citation practices and also solutions for automatic download of data about sources (metadata) from internet services for use in citations.
The data content in the sources is outside the scope of this page.
Please note that a single page can host many discussions; only a few discussions are listed separately below.
Please help us complete this page: send page links with summaries to ghtorud or GeneJ.
Subject categories
- Most recent work
- Previous major discussions and working documents
- User Requirements
- Current programs, GEDCOM and problems
- Citation styles and guides
- Citation methods, practices and examples
- Other solutions for Biographic metadata and Citations (not genealogy specific)
- Terminology
- Other, not categorized
The same wiki page or discussion thread may appear in several subject categories.
1. Most recent work
A Data Model for Sources and Citations.
2. Previous major discussions and working documents
3. User Requirements
4. Current programs, GEDCOM and problems
- Sources and Citations in GEDCOM
- GEDCOM Source/Reference Components BetterGEDCOM GoogleDoc spreadsheet.
- Application Overview.
- Software Citations. Research for BetterGEDCOM comparing the user interfaces, programmed citations and GEDCOM export of an 1880 US Census entry made to both FTM-Mac and RootsMagic v4.
- Citation Mechanics. A series of screen shots providing an overview of the TMG (The Master Genealogist) v7 user interface for sources and citations, including how source types are created and defined, how citation elements are selected and citation templates are written. The last three graphics use screen shots to compare "lumper" and "splitter" approaches to develop the programmed citation for an 1880 US census.
- Master Source. July 2010 attempt to develop screen shots of the master-source equivalent lists and templates/forms in different software packages. Highlights terminology differences.
- Citation Specific Fields. July 2010 attempt to catalog what might be termed the “assertion level” fields in different software packages.
- Repositories-repositories. July 2010 attempt to develop screen shots and/or catalog the Repository fields/options in different software.
- What's wrong with sources. An early (Dec 2010-Mar 2011) discussion of the problems with sources in GEDCOM.
- What role does a Source Citation play ... Nov 2010 discussion initiated by Russ W about Sources and Citations.
- Data Tests - BetterGEDCOM blog page referencing various tests, mostly by Russ W and some by Randy Seaver; many of these tests cover Sources and Citations.
5. Citation styles and guides
- Modern Style Guides. An early effort to give an overview of different styles and to reference available style guides. Only some of the styles referenced are specific to genealogy, but most genealogy styles are derived from another, typically classic, citation style. For example, both Mills' Evidence style and the Register's citation style are "rooted" in Chicago (see Chicago Manual of Style or "CMOS").
- Mills, The Evidence Series.
6. Citation methods, practices and examples
- About Citations. This work catalogs examples of programmed and/or published citations specific to indirect evidence, negative evidence, information snippets (extracts and abstracts) and multiple sources in a single citation. Many of the examples came from "Work Samples" published on the Board for Certification of Genealogists (BCG) website.
- Citation Graphics. Images of more than 20 mostly US centric sources and examples of different forms of citation (source list entry, full reference note, source label) for each. (Although included, in-line citations are also found in genealogy/family history materials.)
7. Other solutions for Biographic metadata and Citations (not genealogy specific)
- Dublin Core Metadata Initiative A brief introduction to DCMI and some links.
- Zotero is one of several existing solutions for collection of metadata from various computer sources, storage of documents and generation of citations. The "An architecture for sources ..." document in category 2 above contains more references.
- BetterGEDCOM GoogleDoc spreadsheet, Zotero Fields_alpha_97-04v.xls. This file reports on the July 2010 Zotero "item types" (=source types) and the "fields" (=citation elements) associated with each. (Zotero development is an ongoing process.)
- A few Zotero screen shots. 5 Jan 2011 BetterGEDCOM blog article.
- The page "Citation Graphics" includes a few more Zotero screen shots. Click here and scroll down to the line that begins, "From WorldCat."
- Citation Style Language is used by Zotero and others to specify the characteristics of citation styles. Here is an informal summary of the CSL 1.0 specification. The "An architecture for sources ..." document in category 2 above contains more references.
8. Terminology
Unfortunately the terminology used on the various pages is not consistent. Some terms are defined in the Definitions page (link below and on the BetterGEDCOM wiki left side navigation bar); some of the documents, pages or discussions define their own terminology. More work is needed to come up with a consistent terminology.
9. Other, not categorized
a. GENTECH Genealogical Data Model (2000). Gentech, among other things, has a multilevel source model.
b. Robert Raymond, "Interoperable Citation Exchange 2009-03-11.pdf" (2009). Presentation sometimes referred to as "I.C.E." BetterGEDCOM discussions below.
c. RootsTech 2011 Wikipage, "Open Interactive Sources."
d. References to John H. Yates's work to develop a standardized implementation of Evidence Style.
e. GeneJ's personal blog for articles:
f. Mark Tucker, ThinkGenealogy, "Better Online Citations (series)."
g. Randy Seaver's blog, Geneamusings, articles about his attempts to create and sync sources and citations. [will need to catalog and post these... ]
"it is not possible to transfer the source and citation data recorded in many genealogy programs to a different program using GEDCOM. Data become lost or distorted during the transfer process."
That is both incorrect and misleading. You make it appear that GEDCOM itself is the problem. GEDCOM is not perfect, but it is adequate to allow the transfer of almost all of the data correctly.
The problem is with the program developers who do not export their data correctly to GEDCOM, who do not import their data correctly from GEDCOM, and who do not use the same data structures internally to represent their source and citation data.
Developers want to export their data their way. They do not want to be forced to fit their data into some standard. So they squeeze their data into GEDCOM in whatever manner they see works for them that is most compatible with their data structure.
Doing so, the data does not become lost or distorted. Example: RootsMagic exports their source and citation data to an (almost legal) GEDCOM file with user-defined tags used for their template definitions. They can then read that information in again PERFECTLY!
When I look at a RootsMagic4/5 GEDCOM with Behold, I can see all the source and citation data. It is all there. I have to do work if I want to interpret it, though, since they are not true to GEDCOM but are refitting their own data structures into a GEDCOM template.
We can't tell developers how to design their data structures. Each will do it their own way. So they will always be different.
The only way a BetterGEDCOM will work to improve this transfer process is if it is designed so well that all developers decide to do the work to transfer their own data structures properly and accurately to the BetterGEDCOM format. And then they must read in that format properly and accurately and convert it to their own internal structure.
Convincing developers to do this is the Herculean task you are addressing. If they don't, it doesn't matter whether you use GEDCOM, BetterGEDCOM, or PerfectGEDCOM. None will transfer the data if the developers don't follow it.
Louis
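Louis's point about user-defined tags can be illustrated with a small Python sketch. It relies on two conventions from the GEDCOM 5.5 family: a line has the form "level [xref] tag [value]", and user-defined tags begin with an underscore. The helper names and the sample lines are hypothetical; the fragment is made up for illustration and is not copied from any program's export (only the tag name _TMPLT appears elsewhere on this page).

```python
def parse_line(line):
    """Split one GEDCOM-style line into (level, xref, tag, value)."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if len(parts) > 1 and parts[1].startswith("@"):
        # Record line: level, cross-reference id, then tag and optional value.
        xref = parts[1]
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        tag = rest[0]
        value = rest[1] if len(rest) > 1 else ""
    else:
        xref = None
        tag = parts[1] if len(parts) > 1 else ""
        value = parts[2] if len(parts) > 2 else ""
    return level, xref, tag, value


def user_defined_tags(lines):
    """Return the distinct underscore-prefixed tags, in order of first use."""
    found = []
    for line in lines:
        if not line.strip():
            continue
        _, _, tag, _ = parse_line(line)
        if tag.startswith("_") and tag not in found:
            found.append(tag)
    return found


# Hypothetical fragment: every value and the tag _EXAMPLEFIELD are invented.
sample = [
    "0 @S1@ SOUR",
    "1 ABBR Example source name",
    "1 TITL Example full footnote text",
    "1 _TMPLT",
    "2 _EXAMPLEFIELD Author: Jane Example",
]

print(user_defined_tags(sample))
```

A reader that only understands standard tags can use a list like this to report which parts of a file it had to skip or pass through untouched.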
I agree with you. The reason I’m asking is because I’m wondering whether BG would need a special notation for assertions. I don’t believe this is necessary.
I believe that an evidence record (e.g., a persona) is an overall assertion about the existence of a person, and that the source record that it refers to provides the justification or “proof” of the assertion. Say the evidence provides the person’s name and birth date, so the persona record might be:
Ignoring the source reference I would say this persona is an assertion, and the source reference provides its “proof.” If you want to say that the persona is a complex assertion, made up of a name assertion and a birth date assertion, that makes sense too. The source record the reference points to would be something like:
The source record is also an assertion, about the existence of a book in this case, but I don’t feel any compunction to provide a proof of the book’s existence, so the buck stops here.
Do we need a more complex mechanism for assertions, or is this enough? I mean, is it sufficient just to be able to look at the data in a BG file and say, “this is an assertion,” or “that isn’t an assertion” and that’s all?
I don't care whether you are entering a "persona", a "conclusion" (or the pfact or a turkey-baster for that matter). --GJ
P.S. The Evidence Explained definition for assertion is "assertion: a claim or statement of 'fact.'"
I believe Tom opted to use a different definition in the entry to the Definitions, but the meaning of the word "assertion," was previously discussed. http://bettergedcom.wikispaces.com/message/view/Pending+Definitions/35150918
Neither do I. "Assertion" is synonymous with PFACT, relationship, etc., so needs no special notation beyond what already exists. It's just another name as far as I'm concerned, albeit one that sweeps up PFACT, entity-existence, entity-relationship, etc., so has attractions as a _term_ to me.
Gene's post above seems to indicate she means the same as me by assertion, so again assertion-level simply refers to the stuff used to justify a PFACT etc.
"It is not possible to transfer the source and citation data recorded in several genealogy programs to other programs. Data become lost or changed during the transfer process. The reason is that these programs have either extended their source and citation data beyond standard GEDCOM, or use standard GEDCOM with custom formatting conventions, or export data that do not conform to GEDCOM."
If you remove the extra words in Geir's statement, he originally said:
"It is not possible to transfer ... data ... using GEDCOM."
By taking off the "using GEDCOM", the blame comes off GEDCOM (which it should) and is placed on the programs that have extended GEDCOM their own way, which they shouldn't have.
Louis
"Assertion" is synonymous with PFACT, relationship, etc., so needs no special notation beyond what already exists. It's just another name as far as I'm concerned, albeit one that sweeps up PFACT, entity-existence, entity-relationship, etc., so has attractions as a _term_ to me.
Thanks. I believe the same. I was confused by the term "assertion level". I think I now understand it as the citation elements that locate evidence within sources, rather than the citation elements that describe sources. I make the same distinction in the DeadEnds model, where the citation elements that describe the sources are in the source records, and the citation elements that describe where the evidence comes from in the sources are in the source reference.
I don't think this is exactly what GeneJ means, because she puts non-source information into her source records (e.g., reference notes, conclusions, bits of evidence), and she may mean this information as the assertion level.
Louis suggests having a list of Citation Elements ready before Rootstech. I don't think that is realistic; such a job will take many months and may even take years if you want to have an 80% COMPLETE international solution.
Tom (or should I write Tmo, he clearly makes a point out of spelling a foreign name wrong) writes that the only job of the source reference – I assume he means reference note – is to identify where information was found. I do not agree with this limitation; current practice allows for inclusion of summaries, extracts and reasoning in a reference note – and there is no reason to prevent that. What I see Tom doing is to tailor user requirements to his technical E&C solution – it should be the other way around.
It is interesting to note that Tom accuses those who do not agree with him of wanting to "trivialize BG", whatever he means by that. I am not able to take such arguments seriously.
I will write about splitting and lumping elsewhere.
I didn't mean reference note when I said source reference, though they are similar.
Misspelling your name was not intentional; I apologize.
I did not tailor those user requirements to fit my model. I designed my model to fit the user requirements as I have determined them to be during twenty years of using genealogical software and imagining how I would want more advanced genealogical software to work. Possibly my imagination is too limited, however, to see the full set of user requirements.
Summaries of information from sources, if that information has not been directly extracted and placed in evidence records, belong in notes. My source reference structure allows these notes, as I'm sure does the reference note.
Extracts belong in evidence records that partition the information into units that describe persons and events. In models without evidence records, extracts, if they are to be put into the database, would have to be put in the source records or the source references. This could be done as notes, but then the information is not structured well enough for the software to deal with it. Louis has suggested that the extracts could be put into person or event based structures that are kept within the source records. The only real difference between my recommendation for evidence records and Louis's recommendation for evidence structures in source records is that in my case the information is in separate records, and in Louis's case the information, basically identical in content, is found in the source records. I would say that this one difference is the only major disagreement between Louis and me. It boils down to the fact that Louis doesn't see any purpose for those independent records, whereas I see the whole evidence and conclusion research process as needing them.
Reasoning, if it has to do with figuring out what specific evidence means, belongs as notes in the source references, which I guess, would be how it is done in reference notes.
Reasoning, if it has to do with concluding which sets of evidence records refer to the same person, belongs in the conclusion references in the conclusion records.
The real problem with reasoning is worrying about whether it can appear in a structured form that software can recognize and work with. I am stumped by this problem. Currently I can only imagine that reasoning in a conclusion reference would be just text that describes why the researcher has made their decision about which evidence applies to which persons, how they have resolved discrepancies and gaps in the combined evidence, and how well they believe they have answered their research questions.
Geir said:
"Louis suggests having a list of Citation Elements ready before Rootstech. I don't think that is realistic; such a job will take many months and may even take years if you want to have an 80% COMPLETE international solution"
Geir. We've been hacking around for over a year. The discussion is wonderful, but we can discuss until the cows come home.
We've got to make the attempt to formalize something. It need only be a 0.1 draft version, but it has to be something. It must be simple and understandable and shouldn't be expected to include everything.
Once we've got something, it can be changed or expanded. But it will give everyone a focus and a basis for what the potential must be.
And believe me, we'll all feel a sense of accomplishment when BetterGEDCOM can announce that it has done something.
Otherwise, we'll just be like the GEDCOM mailing list that has discussed GEDCOM ad-infinitum for 20 years and produced nothing. That's not what I want to spend my time doing.
My proposal for getting started is simple. The initial template is GeneJ's Zotero spreadsheet. For my recommended plan of attack, see: http://bettergedcom.wikispaces.com/message/view/Sources+and+Citations/48362656
Louis
This is just a repetition of a discussion we had in the spring.
There must be a more than one century old practice to refer to sources, with extracts, summaries and arguments in reference notes. I do not see any reason to prevent that, and no one has presented any objective argument against it, other than "Because I think ….".
Summaries – just have a look in e.g. Evidence Explained for use in reference notes, and there are plenty of examples in the genealogy literature.
In your paragraph about Extracts you write "This could be done as notes, but then the information is not structured well enough for the software to deal with it." What do you mean by that? An extract that you would present in a reference note is text, why can it not be handled (this goes without me saying they should go in notes). You already have TEXT_FROM_SOURCE at the source_record and source_citation level in Gedcom – what is your problem?
When a solution for evidence (information representing the content in sources) has been developed, I would like to see a way to import parts of, e.g., a transcribed source as an extract into a reference note, but it has to be done in a way that does not require a program to implement the evidence solution internally (and this does not mean that I do not want to see a solution for evidence in transcripts, image, tabular or codified form) unless you are saying that BetterGEDCOM cannot have a solution for Sources and Citations before it has a solution for evidence.
Where have you found all these rules that you state about whether the various things should be put in inline text (or wherever your notes end up) or in reference notes?
Reasoning in structured form that software can recognize… well, I leave that problem to you.
Say you have an item of evidence that says “Hannah Trask was born on 18 October 1789.” You create a source record or a reference note where you describe the source. How are you going to extract that fact about Hannah Trask? Are you going to add that sentence to a reference note? If you do that software cannot give you any support for finding that fact later. It’s just text in a record, and the best you can do is some kind of text search.
But if you were to create a persona record from that sentence, and have that persona record refer to its source, then you have an object in your database that your software can use. You can ask your software to please find all the evidence recorded about persons named Hannah Trask, and this Hannah Trask persona will show up in the list of all Hannah Trask personas. Your software can give you a table of all the Hannah Trasks with all the key info you have found about them. You can see cleanly in front of you in that table, all the Hannah Trasks that seem to be the same person, and those that don’t seem to be. You can immediately see patterns that build into conclusions, and you can immediately form hypotheses about the different Hannah Trasks, and your software can give you complete support in grouping together different Hannah Trask personas into conclusion persons. How are you going to do that if all the evidence information you have extracted about Hannah Trasks is stuck in notes in reference notes? How can your software help you? It’s no better than shuffling through a deck of real 3x5 cards by hand.
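A minimal Python sketch of the point above. The Persona and Source classes, their field names, the source titles and the second Hannah Trask record are all hypothetical; they are not taken from DeadEnds or from any BetterGEDCOM proposal. The sketch only shows that once evidence is held as a record that refers to its source, software can list every persona with a given name instead of searching free text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Source:
    title: str


@dataclass(frozen=True)
class Persona:
    """One person as described by one item of evidence (hypothetical layout)."""
    name: str
    birth_date: Optional[str]
    source: Source
    where_in_source: str


def find_personas(personas, name):
    """Return every persona whose name matches, ignoring letter case."""
    wanted = name.casefold()
    return [p for p in personas if p.name.casefold() == wanted]


# Invented sources and locations, for illustration only.
vital = Source("Town vital records (made-up title)")
census = Source("Census schedule (made-up title)")

personas = [
    Persona("Hannah Trask", "18 October 1789", vital, "p. 12"),
    Persona("Hannah Trask", None, census, "line 4"),
    Persona("John Doe", None, census, "line 5"),
]

# The "table of all the Hannah Trasks", one row per persona.
for p in find_personas(personas, "Hannah Trask"):
    print(p.name, "|", p.birth_date or "unknown", "|", p.source.title, "|", p.where_in_source)
```

Deciding which of the listed personas are the same person remains the researcher's job; the structure only makes the candidates easy to retrieve and compare.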
An extract that you would present in a reference note is text, why can it not be handled (this goes without me saying they should go in notes). You already have TEXT_FROM_SOURCE at the source_record and source_citation level in Gedcom – what is your problem?
I just explained that.
When a solution for evidence (information representing the content in sources) has been developed, I would like to see a way to import parts of, e.g., a transcribed source as an extract into a reference note, but it has to be done in a way that does not require a program to implement the evidence solution internally (and this does not mean that I do not want to see a solution for evidence in transcripts, image, tabular or codified form) unless you are saying that BetterGEDCOM cannot have a solution for Sources and Citations before it has a solution for evidence.
I have been arguing from the start of BG that the use of persona records and evidence event records is the best possible solution for handling evidence and records-based genealogy. The solution exists. I am not waiting more of your years when an obvious and well-understood solution is staring us in the face. Do you remember all the references I have made to the nominal record linking work that has been done for the past 40 years? There is a long and well-established scientific tradition of using persona records for family reconstructions and many other tasks that require linking together records from different types of sources (which is what genealogical research is). There is a trail of academic papers in existence about this. I mentioned web sites where classical papers of the field can be found and read. Papers going back at least to the 70’s. The one common thread of every one of these papers, of every one of these efforts to find ways of linking persons, is to extract the evidence into persona form so that the data can be processed. I am talking about “records-based” processes here, processes that have been around a long time. The fact that vendors of today’s genealogical systems only seem to understand the conclusion nature of genealogy is no excuse for BG to ignore the entire body of work that has been done applying software to the records-based area.
... and I'm sorry Tom. Every time you make a point that the Persona is the best way to do it, I have to point out that I prefer including the names and event data in the Source Detail (without interpretation).
These can be searched the same way as your Persona can.
As we've discussed earlier ( http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138?o=60#32728224 ), our methods are very nearly the same, except I prefer not to use the Persona.
Louis
Yes, I know, and I continue to be amazed that you don't see the value of the Persona concept. I wonder whether you have ever considered all the successful work done in nominal record linking. I keep hoping you will think about the problems that must be addressed by records-based genealogy, and come around to the right side!! I continue to maintain that the Persona is the one key concept needed to elevate genealogical software into the research-quality domain. I suppose there are worse things to worry about, however!
I read that. It didn't help. In fact it stimulated my comment. You don't there define "assertion level." You state that something that I don't understand is an example of one.
We had a thread back in August that discussed the term "Assertion Level." See the thread below and the graphics on the attached page.
http://bettergedcom.wikispaces.com/message/view/Software+Citations/41147303
If you feel you need to throw your hands up in the air, feel free. --GJ
AdrianB's definition:
"assertion level data (i.e. that which does NOT come from the source record)"
Well from my way of thinking, everything in the Source and in the PAGE Where-within-source and for that matter the entire Source Reference ONLY should contain information from the source record. There should be no interpretation. It should only be the raw data.
Any information NOT in the source record has to be notes or conclusions that are placed with the conclusion data. None of my posts about Keys/Values Source Types and Citations had anything to do with any of that.
Which is why I too am confused by this term.
Louis
... The one exception to this in GEDCOM 5.5.1 is the QUAY, which is an assessment of the Quality of the source record. I feel that should not be there, but moved out to be with the conclusions.
But this is a completely different matter that will get us way, way, way off topic if anyone chooses to continue in this thread. :-(
Louis
I agree with you 100%. There seem to be some misunderstandings about where conclusions belong.
I know exactly what's going on. In the software systems of today, there is no good place to put actual evidence data, and there is no good place to put conclusions. Therefore, some people have figured out how to use source records in some programs to do quadruple duty, holding source information, reference notes, evidence and conclusions. Those who do it this way do not understand that by adding evidence records and a proper approach to handling conclusions, we can finally let sources be sources, and let the other three types of information go to where they belong. They are legitimately concerned that by changing what goes in source records they might lose some of the advantages they have gained by essentially redefining the purpose of the source record to fit their needs. And what is very unfortunate in my opinion is that some cherish this approach so much, and are so sure that it is the perfect solution, that they can never agree to a change. This is too bad, and I believe it can only lead to the trivialization of BG. And I am tired of people criticizing the ideas without presenting any solid alternatives. All I get are references to old discussions that barely apply to the subject at hand. You and I have given full-bodied examples of how this approach to sources and citations works. There have been no alternative approaches presented, and certainly no other complete examples of how to hold source and citation data in an archive file. Anyone who would take the time to read the DeadEnds model, carefully enough that they reach an understanding of it, will see the proper places for evidence, for source info, for notes, and for conclusions.
Re (1) "two level source .... the accident report you are interested in would be a source and it would have a source reference to the annual report, and that source reference would hold the page"
That's absolutely fine. In fact, after I'd originally posted, I worked that very idea out and it was going to be my counter proposal if you still didn't want Page in a Source's details. Clearly I'd misinterpreted you - though my excuse is that in GEDCOM the Page would be ultimately subsidiary to the level 0 Source even in your clarified view.
Re (2) "I think it is important to realize that this approach, with sources and source references, is wholly predicated on the idea that we will have evidence records in the database." OK - I wasn't making that assumption.
My understanding of the term is that it refers to data that is an attribute of, or a relationship to / from, an assertion, where assertion means any property, fact, attribute, characteristic, trait, of an entity, relationship to / from an entity or even existence of an entity.
Thus, if I've got this right, the source reference for a date of baptism tells us (say) which source is used for the date, where within that source, etc. That's an assertion level source reference in MY understanding.
If we have a 2 level source, e.g. a series of baptismal entries, each with their own source record, beneath a parish register with its own source record, then the source records for the baptismal entries each have a source reference that says which parish register the entry is in, where in it, etc. That's I guess a source-level source reference. In MY book.
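The two kinds of source reference described above might be sketched as follows in Python. This is only one possible reading of the idea: the class names, the "within" field, the titles and the location string are assumptions made for illustration, and none of them come from GEDCOM or from a BetterGEDCOM draft.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SourceReference:
    """Points at a source record and says where within that source to look."""
    source: "SourceRecord"
    where: str = ""


@dataclass
class SourceRecord:
    title: str
    # Source-level reference: the larger source that contains this one, if any.
    within: Optional[SourceReference] = None


def citation_chain(ref: SourceReference) -> List[Tuple[str, str]]:
    """Follow a reference up through its containing sources."""
    steps = []
    current: Optional[SourceReference] = ref
    while current is not None:
        steps.append((current.source.title, current.where))
        current = current.source.within
    return steps


# Invented example data.
register = SourceRecord("Parish register (made-up title)")
entry = SourceRecord(
    "Baptismal entry (made-up title)",
    within=SourceReference(register, "p. 14, entry 3"),
)

# Assertion-level reference: justifies a date of baptism for a person.
baptism_date_ref = SourceReference(entry)

print(citation_chain(baptism_date_ref))
```

In this reading the page or entry number lives on the source-level reference from the baptismal entry to the register, while the assertion-level reference only has to name the entry.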
One issue is that we still don't have a name for the concept of any property, fact, attribute, characteristic, trait, of an entity, relationship to / from an entity or even existence of an entity. PFACT only covers part of it (not relations and not existence of entities) and besides, it'll never fly as a term to be used by normal people. Much as I look with concern at such a GENTECH-like term, assertion seems closest. When / if we need such a term.
Thanks. Is the title of a book an assertion? I thought we had settled on the terms citation element or metadata (though I objected to that) for properties of sources. I prefer the term attribute which can be applied to all properties of all records, but I gave up on that when citation element seemed to be the consensus. Based on yours and GeneJ’s comments I assume that assertion is another synonym for citation element in the source context.
If I put my mathematician's hat on, then any statement that I deem to be true is an assertion but that's probably not helpful.
I was interpreting (and I emphasise this is _me_) "assertion" as just relating to properties / relationships / existence of individuals, families, etc - the external, real-world stuff, not that within the study of genealogy (i.e. not source records).
Thus, the existence of someone who is married to Mary Roe is an assertion, that their name is John Doe is another assertion, that they are married to one another is an assertion. I wasn't envisaging the existence of a baptism certificate to be an assertion, nor that the title of a book is XYZ to be an assertion.
If it helps, an assertion is something that I assert to be true and that I therefore need to demonstrate (a.k.a. "prove") is true. The act of asserting something seems key to the existence of an assertion. We don't normally go to the trouble of proving that the title of a book is XYZ - we just take it as read that it is and say so. So I wouldn't take the title of a book as an assertion. (Of course, anyone sufficiently anally retentive to want _everything_ proving might, but I'm not going there.)
I guess also that any assertion needs to be either:
1. common knowledge
or
2. justified by a source reference (or citation if you're writing it directly) with any extra proof-statement as required to interpret the source reference.
That's what I have understood assertion to be - other people's mileage may vary. And that means that in my view, assertion is not a synonym for a citation element. Rather the collection of citation elements in the source context JUSTIFY an assertion.
This may or may not help but I think we got into the arena of assertion level by discussing properties of sources and whether some properties (citation elements) could appear both at the source record level and in the context of a source reference justifying an assertion about a person or family or place etc.
Geir suggested,
"As documented throughout the wiki and in numerous blog articles and other postings, it is not possible to transfer the source and citation data recorded in several major genealogy programs to a different program using GEDCOM. Data become lost or distorted during the transfer process. The reason is that these programs have extended their source and citation data beyond the limited capabilities of GEDCOM, and/or export data that do not conform to GEDCOM."
Louis responded, "Wrong … GEDCOM has one really nice construct in their sources that unfortunately few programs have decided to use. Do they not bother to read the GEDCOM specs? It is full extendibility in the source information."
Louis referred to his analysis of a RootsMagic export to GEDCOM (http://www.beholdgenealogy.com/blog/?p=874) and commented that "RootsMagic decided NOT to include a Title field. I guess they figured it was better for them to not allow the person to make up their own title. So what they do is generate the title by using the "Collection", "Repository", "Repository Location", "Format" and "URL" fields together, and separating any non-blank fields by semicolons."
http://bettergedcom.wikispaces.com/message/view/Sources+and+Citations/48033200#48078834
I disagree with Louis' assessment.
In about March, we described the RootsMagic export on the wiki this way, "RM exports some Citation Elements in standard GEDCOM tags, not in a very standard way. The source name goes in ABBR, and a few other elements are placed in standard tags. The text of a full footnote reference goes into the TITL. A short footnote and a bibliographical note are placed in proprietary fields. Two proprietary _TMPLT structures, one at the source and one at the citation level (where in source), contain source-value pairs for each element (field) for the source type, with the full name of the element as the type. An ID for the source type is stored in the _TMPLT structure in the SOUR record."
http://bettergedcom.wikispaces.com/Application+Data#RootsMagic
Choose your terminology, but RootsMagic's user interface recognizes an extended group of source types and fields, all of which are tied to a programmed citation (whether RootsMagic default or custom). On export to GEDCOM, the field values from the master source are written in a sort of freeform style to "title" and the values from RootsMagic's "Source Detail" fields (assertion level input) are exported to GEDCOM's field "PAGE."
--Create a master source in RootsMagic using one of the published source types (book, journal, etc.--a source type that includes fields "Author" and "Title"), cite the source (so that you can access the assertion level fields), and export same to GEDCOM (excluding the Extra details that are RM specific). You find the values for the master source fields ("Author," "Title," etc.) all export to the single GEDCOM field "TITLE."
0 @I2@ INDI
1 NAME Nicholas /Firestone/
2 GIVN Nicholas
2 SURN Firestone
1 SEX M
1 _UID 69C542210EFF4CC0AB21DFC994833A9100FD
1 CHAN
2 DATE 1 DEC 2011
1 BIRT
2 DATE 25 MAR 1712
2 PLAC Berg, Alsace
2 SOUR @S5@
3 PAGE 241
0 @S5@ SOUR
1 ABBR PER* QTR Russell 1964
1 TITL George Ely Russell, "Founders of the American Firestone Family," <i>The N
2 CONC ational Genealogical Society Quarterly</i>, 52 (December 1964): .
--Create a master source in RootsMagic for a census and again cite that source (so that you can access and complete the assertion level fields); export same to GEDCOM (excluding the Extra details that are RM specific). You'll find the values for the master source tags exported to GEDCOM's TITLE, and the values for assertion level tags are exported to GEDCOM's PAGE.
0 @I1@ INDI
1 NAME Asa Ruggles /Thomas/
2 GIVN Asa Ruggles
2 SURN Thomas
1 SEX M
1 _UID A42147C706A04C7EA850C1871BB1CC274291
1 CHAN
2 DATE 17 NOV 2011
1 BIRT
2 DATE CA 1799
2 PLAC Maine
2 SOUR @S2@
3 PAGE Madison Township||Madison twp.; 334; p. 410A (stamped); dwelling 121, family 121; Asa Thomas household; 27 December 2006
0 @S2@ SOUR
1 ABBR CEN IA Jones 1880 T9, roll 348
1 TITL 1880 U.S. census, Jones County, Iowa, population schedule, , ; digital i
2 CONC mages, <i>Ancestry.com</i> (http://www.ancestry.com : accessed ); citin
2 CONC g NARA microfilm publication T9, roll 348.
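For anyone reading such an export programmatically, here is a minimal Python sketch of how a receiving program might reassemble a split TITL value. It assumes GEDCOM 5.5 behaviour, where CONC text is appended with no intervening space; the function name `read_values` is made up for this illustration, and the parser ignores CONT lines, character sets and every other complication.

```python
def read_values(lines):
    """Return (level, tag, value) tuples with CONC continuations merged.

    Assumes each line looks like 'LEVEL [@XREF@] TAG [VALUE]' and that CONC
    text is appended to the preceding value without adding a space.
    """
    records = []
    for raw in lines:
        parts = raw.rstrip("\n").split(" ", 2)
        level = int(parts[0])
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if tag.startswith("@"):
            # Record header such as '0 @S5@ SOUR': the tag follows the xref id.
            tag, value = value, tag
        if tag == "CONC" and records:
            prev_level, prev_tag, prev_value = records[-1]
            records[-1] = (prev_level, prev_tag, prev_value + value)
        else:
            records.append((level, tag, value))
    return records


# The book/journal source record from the first export above.
export = [
    '0 @S5@ SOUR',
    '1 ABBR PER* QTR Russell 1964',
    '1 TITL George Ely Russell, "Founders of the American Firestone Family," <i>The N',
    '2 CONC ational Genealogical Society Quarterly</i>, 52 (December 1964): .',
]
merged = read_values(export)
title = {tag: value for _, tag, value in merged}["TITL"]
print(title)
```

Even after the pieces are joined, the receiving program holds one formatted string; the author, article title, journal and volume are no longer separate fields.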
Our blog, Randy Seaver's blog and the various BetterGEDCOM wiki pages (see the attached page) document earlier research about how the various vendors with expanded systems struggle to pass information via GEDCOM. Terry's TMG Tips does a nice job describing the TMG export (http://tmg.reigelridge.com/Sources-exporting.htm). I also wrote a post about programmed citations, see "Programmed Citations, a general overview."
http://bettergedcom.wikispaces.com/message/view/A+Data+Model+for+Sources+and+Citations/48011898
If Geir's description of the problem is "incorrect and misleading," then how do others feel the problem should be described? --GJ
It is not possible to transfer the source and citation data recorded in several genealogy programs to other programs. Data become lost or changed during the transfer process. The reason is that these programs have either extended their source and citation data beyond standard GEDCOM, or use standard GEDCOM with custom formatting conventions, or export data that do not conform to GEDCOM.
GeneJ:
I was under the impression that you think Zotero is a wonderful system for citations. Do you use it in practise? Does it meet your needs? Or do you feel it lacking, because you keep talking about complications in structuring sources that Zotero doesn't handle?
If Zotero is an excellent citer, then why isn't its model and level of detail and simplicity of structure good enough for BetterGEDCOM?
The problem I have with making anything too flexible and/or too complicated is that it leaves both the program and the user doing what they want. That in itself prevents the one thing we all want - the data to transfer over precisely and unambiguously - and that is the current problem with GEDCOM.
I am trying to jump start this effort again.
Why can't we just start with a simple set of Source Types versus Keys and create Citation templates using them? It could be kept within manageable terms by using your wonderful Zotero spreadsheet as the initial structure.
Once we have that, then we will have our first tangible result. Then BetterGEDCOM will have produced its Version 0.1 and the organization will finally become significant.
Doing this one task will prove if this general idea is possible and how many of the basic citation templates can be built.
We can only expand and correct a starting model if we have a starting model.
But if you want to keep discussing all the exceptions and cases why any model cannot handle everything, then we'll keep on talking for another year without getting any further.
Louis
... and re ambiguities, e.g. one genealogist thinks their high level item is a source and the other wants a low level.
I say forget that right now. Just make a decision and define which Keys are most often Source and which are most often for defining the location in the source and go for it.
I don't want to describe how many times I've had to write and rewrite and rewrite an algorithm and the code for it in Behold. You can't get everything you want the first time. You have to start with something simple and then you can see how it works and correct it and build on it.
Louis
Gulp. Maybe I've missed something.
I'm saying that it's an unnecessary complication at this stage to force some set of fields to the master source and other fields to the assertion level. --GJ
... and G-d I hate not being able to edit my typos in my posts after entering it too quickly.
GeneJ:
Please explain again what "assertion level" means. Because, as Tom said and I agree, the Source Reference should not be considered evidence and should not have anything to do with asserting conclusions. It is simply a declaration that there is some material in existence that may be of use to someone.
The Source is the Item. The PAGE tag is the location in the item. This is simply done so that you can conveniently cite a source, and IBID it again with only the changed info.
e.g. in footnotes:
1. Book, Title, Author, Publisher, Page 14
2. IBID, Page 18
3. IBID, Page 42
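The split between the item and the location in the item can be sketched in a few lines of Python. This is only an illustration: the function `footnotes` and its tuple layout are invented here, and it uses IBID whenever the immediately preceding note cites the same source.

```python
def footnotes(citations):
    """Number citations, replacing a repeated source with IBID.

    `citations` is a list of (source_text, page) pairs in citation order.
    """
    notes = []
    previous_source = None
    for number, (source, page) in enumerate(citations, start=1):
        lead = "IBID" if source == previous_source else source
        notes.append(f"{number}. {lead}, Page {page}")
        previous_source = source
    return notes


book = "Book, Title, Author, Publisher"
notes = footnotes([(book, 14), (book, 18), (book, 42)])
for line in notes:
    print(line)
```

Only the page changes between notes, which is why it is kept apart from the description of the source itself.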
The evidence/conclusion process is a completely different matter. We should only be discussing Sources and Citations here.
Louis
1. The Annual Report for Railway Accidents in the UK. I have an interest in one accident report from the 1903 edition. I'd create a source record for the single report with a title formatted something like that used for an article in a journal - but it would be nice (if not essential) to add the page number as its own item on the source record.
If the annual report is a collection of accident reports, then you can treat it as a two level source, which means the accident report you are interested in would be a source and it would have a source reference to the annual report, and that source reference would hold the page. At least that’s how I would do it.
2. An English parish register contains lots of baptisms, marriages and burials. I'd create a source record for each baptism, marriage or burial (splitting in action here....) in order to associate a note containing the identification logic with the specific data, rather than have a hundred identification statements in the source record for a whole register. Again, as part of identifying where in the register the baptism, marriage or burial is, I'd like to associate a page number with that source record. Again, currently it can be done without it - I usually put the page number as part of the transcribed text, but aren't we supposed to be codifying stuff properly?
I believe you aren’t seeing the value of the source reference idea. You can go either way on whether the register should be the bottom of the source tree or whether the individual items in the register are the bottom of a two-level tree. However you make that decision, the page number goes in the source reference to the source at the bottom level of the tree. Even if you had just the register level for a source, you still wouldn’t have 100 identification statements in the source; that information would be in the 100 different source references that point to the source. And those 100 source references would be located in the 100 different evidence records that you extracted from the source.
I think it is important to realize that this approach, with sources and source references, is wholly predicated on the idea that we will have evidence records in the database. If BG decides that it will just clean up the conclusion only model of GEDCOM into something a little more complete, and eschew the evidence level of data, then everything I’ve been promoting about these source and source reference ideas is out the window. I think the biggest problem that people are having understanding the ideas is that they are having difficulty understanding the paradigm shift from systems that only hold conclusion data to systems that hold both evidence and conclusion data. With no evidence records, the source reference doesn’t make any sense, so if you want to keep any information at all from the source or about the source you have to stick it in the source record, so you are forced to create a separate source record for each item of evidence. Some of us seem so comfortable with this idea, because it is forced upon us, that we can’t see the problems that it creates, nor can we see how well we can clean up the problem with evidence records and source references.
... and the Evidence/Conclusion process is where Tom and I disagree.
I state that the Source Reference can include the events as stated in the source - not as interpreted. Whereas Tom's concept is to create multiple levels of Personas to contain those events.
That has been discussed in detail in other threads dealing with Evidence and Conclusions.
Hi Louis,
You wrote: Please explain again what "assertion level" means.
For this purpose and in GEDCOM terms, there are three fields at the assertion level: Page, Text and Media, but we're really just focused on Page and Text. In practice, I believe there really isn't much consistency to how those remaining two fields are used. We might as well call them mystery field a and mystery field b, since programs have renamed them and effectively split each into a varying number of sub-fields. It's easier for me to just refer to them as fields at the assertion level.
And then you wrote, "Because, as Tom said and I agree, the Source Reference should not be considered evidence and should not have anything to do with asserting conclusions. It is simply a declaration that there is some material in existence that may be of use to someone."
I'm thinking we are all getting a little tired ... because of course *every single field* in the master source/source/source_record and at the assertion level (how ever you want to number and name them) has something to do with asserting conclusions.
I've already said that the differences being discussed are unlikely to have anything to do with what you or Tom consider "evidence."
I see the _ibid._ reference, but I don't follow how that relates to defining rules for "fields" that must be applied at one level or another, beyond the one to many relationship. In Adrian's example, all the information he is going to refer to comes from one page, so there is not one "source" to many "pages" relationship.
So, help me understand why you care where I commit a field? If a "source type" is a "birth certificate" then there will be a certain number of fields associated with it. (Ala, the Zotero spreadsheet.)
I’m guessing that to “commit a field” means to place a citation element in a record somewhere. So it sounds like you’re suggesting that different users should be able to put their citation elements in different places based on their preferences as to how sources should be structured. I believe such freedom would throw the whole notion of source templates for constructing citation strings out the door. I continue to strongly suggest that BG do my tasks 3 and 4, which is to determine a set of source types that we officially support, and a set of citation elements we officially support, with specifications as to which citation elements are recommended in which source types.
Let's pick a nice controversial field that is not on the Zotero spreadsheet--"Household ID" (US Census), which has a value "Asa Thomas household." Say I include that field in my "master source" and you record the field in "Source Details."
By “source details” I am guessing you mean a source reference.
If we both recognize the field, please help me understand why you care that I have entered that field/data in the master source, but you have entered the same field/data at the assertion level.
If you want to think of the record of the Asa Thomas household as a source unto itself, fine, then “Asa Thomas household” is probably the title citation element for that source. If you want to think of the census as the source, then “Asa Thomas household” belongs in the source reference from the evidence to the source.
If you don’t care about software support in generating your citation strings, then it makes very little difference how a user chooses to break up their “source tree,” and software could simply allow users to build their source records with any source type they enter in with any citation element types and values they enter in, with the source records structured to any number of levels they desire. But it will make a difference if BG embraces a fixed set of source types in order to solve the problem of generating citations with templates. I frankly don’t care how you would like to structure the source and citation information, but I hope you realize that by allowing the flexibility you seem to be recommending, the whole citation generation scheme is put in jeopardy. Which leads me to wonder how important you believe the citation generation feature is.
I also don’t understand what “assertion level” means. By context my guess is that citation elements are the things that are existing at the “assertion level”. If this is true then we have FOUR terms we are now using for the exact same concept (the concept being "information about a source"); those four being citation element, metadata, assertion, and attribute. I am throwing up my hands in despair.
Re https://bettergedcom.wikispaces.com/message/view/Sources+and+Citations/48033200?o=20#48222570
See
http://bettergedcom.wikispaces.com/message/view/Sources+and+Citations/48033200?o=20#48221768
"...defining a set of source keys, source reference keys."
In the abstract, is there a reason to assume any particular elements would be defined for the "source" and then some different group for the "source reference"?
Or, does that mean that those most common/standardized templates (the 80-20 group) would have elements defined at your "source" and "source reference?"
While there is clearly some logic to the kind of information that most frequently appears at one level or the other, for circumstances beyond that 80-20 rule it won't be hard to find some exception to many of the rules.*
--GJ
*Even if we don't consider lumper-splitter preferences.
A source record describes a source of evidence, and that source can contain a little to a lot of evidence.
But when we wish to cite where our evidence comes from we need to include the source, obviously, but we often need to be more specific in specifying where in the source, the particular evidence we are citing came from.
So, a source record, in my opinion, should describe an overall source, and all the citation elements we find in the source record do the job of describing that source. So if the source is a book, the citation elements in the source record for it are the obvious things like, title, author, publisher, publication year, ISBN.
Now, a source reference is something that connects an evidence record (e.g., persona), to a source record, but not only must it connect to the source record, it must also specify where in the source the evidence came from. Specifying where is also done by citation elements, and in this example, the obvious example is the page number.
We don't want the page number in the source record, because we're probably going to want to extract other evidence from the same book and that other evidence is going to come from different pages.
But page is an important citation element. It doesn't belong in the source record itself. Ergo, it belongs somewhere else, and the obvious place to put it is either in the evidence record itself or in the connection that the evidence record has to the source record. See this example (and since the last examples I did were using XML, this time I'll use GEDCOM):
In this example the INDI record is a persona that includes only the information about a single person that was extracted from a single item of evidence in a book (which is what the definition of a persona is).
And the SOUR record is the source record for a book. So we have here two records, an evidence record and a source record. We need to connect the evidence record to its source. So in the INDI record there is a 1 SOUR line with a PAGE sub tag. This is the source reference I keep mentioning. It both points to the SOUR record by including its ID, but it also contains the citation element needed to locate the evidence in the source.
This is just one way to represent this connecting concept in a model, but this is a useful and easy to understand way to do it. Or the source reference could be modeled as a separate record type, and in a relational database it would have to be represented as a separate table (with person id, source id, and other columns for page and other citation elements). I much prefer simply treating the source reference as an "attributed pointer" inside one record that points to another. But as I said this concept can be represented in different ways.
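The relational variant can be sketched with Python's built-in sqlite3 module. The table and column names here are assumptions chosen for illustration, not part of any BetterGEDCOM proposal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER PRIMARY KEY, title TEXT, author TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    -- One row per source reference: the link plus the locating element.
    CREATE TABLE source_reference (
        person_id INTEGER REFERENCES person(id),
        source_id INTEGER REFERENCES source(id),
        page TEXT
    );
""")
conn.execute("INSERT INTO source VALUES (1, 'A Book', 'An Author')")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "First Persona"), (2, "Second Persona")])
conn.executemany("INSERT INTO source_reference VALUES (?, ?, ?)",
                 [(1, 1, "14"), (2, 1, "42")])

# Each persona reaches the same source row, but with its own page.
rows = conn.execute("""
    SELECT person.name, source.title, source_reference.page
    FROM source_reference
    JOIN person ON person.id = source_reference.person_id
    JOIN source ON source.id = source_reference.source_id
    ORDER BY person.id
""").fetchall()
print(rows)
```

The source row is stored once; the page lives only in the linking table, which plays the role of the "attributed pointer."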
Remember, the whole job of the source reference is to connect an item of evidence with the source it came from, while also giving the details of where that specific item of evidence came from in the source.
So, in this example, we have five citation elements in the SOUR record (TITL, AUTH, PUBL, DATE, ISBN), and we have one citation element in the source reference (PAGE). They are all citation elements, but the first five apply to a source as a whole, while the last refers to a specific location in the source. I think all citation elements can be broken into these two main varieties, and the source and source reference breakdown is the best way to separate them.
Note especially, as I mentioned above, by keeping the page number in the source reference, the SOUR record for the book can be used by all the personas we extract from the book, not just one.
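The arrangement described in this post (five citation elements in the source record, one in the source reference) can be sketched in Python. Everything named here is an assumption for illustration: the class names, the xref ids and the placeholder values are invented, and placing DATE and ISBN directly under the SOUR record follows the description above rather than the GEDCOM 5.5 specification.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRecord:
    xref: str
    elements: dict  # citation elements that describe the source as a whole


@dataclass
class SourceReference:
    source: SourceRecord
    elements: dict  # citation elements that locate the evidence, e.g. PAGE


@dataclass
class Persona:
    xref: str
    name: str
    references: list = field(default_factory=list)


def source_lines(source):
    """Render a source record; its elements become level-1 lines."""
    return [f"0 {source.xref} SOUR"] + [
        f"1 {tag} {value}" for tag, value in source.elements.items()
    ]


def persona_lines(persona):
    """Render a persona; each source reference is a 1 SOUR pointer with sub-tags."""
    lines = [f"0 {persona.xref} INDI", f"1 NAME {persona.name}"]
    for ref in persona.references:
        lines.append(f"1 SOUR {ref.source.xref}")
        lines += [f"2 {tag} {value}" for tag, value in ref.elements.items()]
    return lines


# Placeholder values only; none of these describe a real book.
book = SourceRecord("@S1@", {
    "TITL": "tttt", "AUTH": "aaaa", "PUBL": "pppp", "DATE": "yyyy", "ISBN": "nnnn",
})
first = Persona("@I1@", "First /Persona/", [SourceReference(book, {"PAGE": "14"})])
second = Persona("@I2@", "Second /Persona/", [SourceReference(book, {"PAGE": "42"})])

for line in persona_lines(first) + persona_lines(second) + source_lines(book):
    print(line)
```

Both personas point at the same `@S1@` record, and only the PAGE under each pointer differs.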
GeneJ:
Tom explained that very well. (Except I still shudder at his calling the Individual a "persona" - but then I also hate the word "repository" which makes me think of a garbage can.)
Let me describe in more detail what I think would work well specifically with regards to key/value pairs that I talked about earlier.
Looking first at the source, e.g.: the particular census, the book, the letter, the family bible, the personal interview.
One way I might do the keys for above examples (subject to you experts negotiating me into a better set of keys) are:
Source Type: Census; Year: yyyy; Place: pppp
Source Type: Book; Title: tttt; Author: aaaa; Publisher: pppp; Place: pppp; Year: yyyy;
Source Type: Letter; Date: dddd; Sender: ssss; Receiver: rrrr;
Source Type: Family bible; Title: tttt; Owner: oooo; (help me here...)
Source Type: Personal interview; Date: dddd; Place: pppp; Interviewee: iiii; Interviewer: iiii;
So here I've got a number of different keys:
- Source Type
- Title
- Author
- Publisher
- Place
- Year
- Date
- Sender
- Receiver
- Owner
- Interviewee
- Interviewer
These together with hopefully not too many more, should be able to completely define any source.
The main key is the Source Type which is always defined for every source. All the other keys will depend on what the Source Type is and are specific to a Source Type.
What GEDCOM tried to do was stuff these into three GEDCOM tags: AUTH, TITL and PUBL, with Author under AUTH; Publisher, Place and Year under PUBL; and almost everything else, including the Source Type, under the TITL tag. That's why after all I didn't think RootsMagic was that wrong in stuffing their Source keys under the TITL tag.
But I don't think 3 keys are all that's needed. In my example above, for the 5 sources, I've used 12 different keys already without even trying. I think we need to be careful and define the minimum number of keys here that will define every source accurately. Keys meaning the same thing can be used in different source types. We would hopefully end at no more than 50.
Now let's look at the PAGE tag, which I'll prefer to call the Where-within-source. Here we need to make up a new set of keys just for the Where-within-source. The particular keys are again defined by Source Type. Here we can again try to reuse keys when possible if they have the same meaning in different source types. But hopefully these keys for the Where-within-source are different from the keys in the Source (above).
Source Type: Census; Enumerating District: eeee; Page number: pppp; Line number: llll; Dwelling number: dddd; Family number: ffff; (maybe only some are applicable)
Source Type: Book; Page number: pppp;
Source Type: Letter; Page number: pppp;
Source Type: Family bible; Page number: pppp;
Source Type: Personal interview; Time from start: tttt;
So here, I just have these keys:
- Enumerating district
- Page number
- Line number
- Dwelling number
- Family number
- Time from start
but maybe there's 50 here as well.
You can think of the Where-within-source keys as the stuff you still have to specify when you IBID something. (Hope that helps!)
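As a rough sketch of the two key sets, here they are as Python lookup tables with a small check. The names `SOURCE_KEYS`, `WHERE_KEYS` and `misplaced` are invented for this illustration, the keys are copied from the lists above, and Family bible is left out because its key set was still open.

```python
# Keys that describe the source as a whole, per Source Type.
SOURCE_KEYS = {
    "Census": {"Year", "Place"},
    "Book": {"Title", "Author", "Publisher", "Place", "Year"},
    "Letter": {"Date", "Sender", "Receiver"},
    "Personal interview": {"Date", "Place", "Interviewee", "Interviewer"},
}

# Keys that say where within the source, per Source Type.
WHERE_KEYS = {
    "Census": {"Enumerating district", "Page number", "Line number",
               "Dwelling number", "Family number"},
    "Book": {"Page number"},
    "Letter": {"Page number"},
    "Personal interview": {"Time from start"},
}


def misplaced(source_type, source_values, where_values):
    """Return the keys that are not defined at their level for this type."""
    bad_source = set(source_values) - SOURCE_KEYS[source_type]
    bad_where = set(where_values) - WHERE_KEYS[source_type]
    return sorted(bad_source | bad_where)


# A page number entered on the source instead of the where-within-source.
print(misplaced("Book", {"Title": "tttt", "Page number": "14"}, {}))
# The same data entered at the intended levels.
print(misplaced("Book", {"Title": "tttt"}, {"Page number": "14"}))
```

With tables like these a program can at least report when a key has been entered at the other level, whatever the group finally decides the keys should be.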
Now this is how I think this can work with your citations (and I might get some of it wrong because I'm not an expert at citations):
You have multiple citation templates for each Source Type. You have different ones for the various formats, e.g. primary citation, subsequent citation, endnote, footnote, bibliographic entry, etc. (you know what they are, I don't).
A template for one might look like:
Book (Primary Citation): $Author, <i>$Title</i> ($Place: $Publisher, $Year), $Page
If we define these templates in terms of the keys in the Sources and the Where-Within-Sources in a simple programming-like definition such as the above (including variables beginning in $ for the keys and HTML-like markup for style), then EVERY PROGRAMMER will know perfectly and unambiguously how to program this exactly the same way!!!!!!!! (as many exclamation marks as you want here)
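As a minimal sketch of how little code such a template needs, Python's standard `string.Template` already understands the `$Key` notation. The helper `cite` and the sample values are made up for the example; the template text is the one given above.

```python
from string import Template

BOOK_PRIMARY = Template("$Author, <i>$Title</i> ($Place: $Publisher, $Year), $Page")


def cite(template, source_keys, where_keys):
    """Fill a citation template from Source keys plus Where-within-source keys."""
    return template.substitute({**source_keys, **where_keys})


source = {"Author": "aaaa", "Title": "tttt", "Place": "pppp",
          "Publisher": "bbbb", "Year": "yyyy"}
citation = cite(BOOK_PRIMARY, source, {"Page": "14"})
print(citation)
```

`substitute` raises `KeyError` when a key named in the template is missing, so an incomplete source is caught rather than silently producing a citation with gaps like the ones in the exports shown earlier.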
Louis
And GeneJ,
As far as I'm concerned, your Zotero spreadsheet right now is just about perfect. It's at: http://bettergedcom.wikispaces.com/file/view/Zotero+Fields_alpha_97-04v.xls/243233631/Zotero%20Fields_alpha_97-04v.xls
In that spreadsheet, you've got the Source Types on the top row, and you've got the Keys on the left.
All I think that needs to be done is to separate the Source Types into those that describe the Source, and those that describe the Where-within-source.
And we may be close to done ... other than the inevitable arguing on every single entry as to whether it should or should not be there.
Louis
I'm sorry. I meant to say:
"All I think that needs to be done is to separate the KEYS into those that describe the Source, and those that describe the Where-within-source."
You know, this will both meet the goals of defining sources for BetterGEDCOM, and (if the templates can be refined and developed) will also satisfy the needs of BetterGEDCOM's commitment to SourceTemplates.org.
It sure would be nice if we could commit to developing these, maybe to have a draft ready prior to RootsTech.
Louis
I'm thinking that all of this will only seem simple when we are looking back on it in the rear view mirror. Can't tell you how much I appreciate that you're hanging with me.
Wanting to standardize the elements in the source/master source/source_record is a valiant effort. In theory, you can write such a standard but it likely won't work across the standard 80-20 source types and it definitely won't hold in practice.
Tom somewhat described the problem during the last Developers Meeting when he described the thought process to answer the question, "what is my source."
See the Zotero item type (source type/master source type) "blogPost." (The item type name probably gives this one away.) Zotero captures the minimum bibliographic identity (the blogTitle and related data), but it drills down to the level of the article "title" (blogPost/blog article).
Zotero's "blogPost" item type captures information at the level of the article, but folks who are going to "cite" a whole series of articles from the same "blog" will prefer one bibliographic entry for the blog. Some of those folks will set the master source/source/source_record at that high level, and then enter specific blogArticles at the assertion level.
Other folks have no interest in a series of articles from the same blog--they will likely consider the blogPost (article) to be their "master source" and it identifies one specific article.
It's not hard to find examples of a source that could be even more specific than that "blogArticle." Say someone posts the digital image of a letter in a blog article. That level of detail is beyond the scope of Zotero (and would be beyond the scope of our 80-20), but there will be folks who declare that letter/digital image to be their "Source."
So, same blog, same "information," different users = different requirements/fields applied to the "master source/source/source_record" which means that different information is recorded at the "assertion level."
Blogs are considered a form of publication by default, albeit a rather modern form. What's interesting is that for traditional published materials--where the so-to-speak bibliographic metadata has long been standardized--these same differences exist in the approach to information at the "master source." (Even in software that implemented the identical published source example from Mills' Evidence Explained.)
The implications of these "what is the source" mechanics* are more unruly when the focus shifts to archival materials. I have more examples that I think will help describe the differences. Will continue to post those examples with more detail as time permits.
In haste here ... as I wrote above, "you can write such a standard but it likely won't work across the standard 80-20 source types and it definitely won't hold in practice."
(a) When I say it won't work across the standard 80-20, that is because the named fields (that communicate well understood information) might appear in the master source for one item, but at the assertion level in another. Example, a "photograph" vs a "photographic album" vs a "collection" (that includes a photographic album that includes photographs).
(b) As for not working in practice: it probably wouldn't matter what the group defines as the 80-20 master source fields for a US census. I am a splitter and I'll find some way, by golly, to create a master source for each household in the census. Likewise, there are folks who have a master source "Census" or "1850 Census." Well, those folks too will find some way to force their high level information fields down to the assertion level.
Sorry to have rushed this. --GJ
*I used the term "mechanics" to describe assigning information (fields/keys/elements) to the "master source" (source, source_record) and then determining what other information (or fields) is declared in the "assertion" level fields (now PAGE <WHERE_IN_SOURCE> and TEXT).
I take the underlying message of your latest to mean that people will want to do things in very different ways in deciding on what their sources are. I'm not sure, but I get the impression you believe that BG should be very accommodating to these desires.
I have two responses.
First, the hierarchical nature of sources that I have introduced in DeadEnds (it's not original to me, I'm claiming no credit), gives users quite a bit of flexibility. I could imagine (though would never do it myself) making each household from a census a separate source record, but it would then have a source reference to the census it came from, so we get those two preferences you mention taken care of by simply structuring sources. My solution to this census issue is to create a separate evidence record for the household (it is what I have been calling an evidence event record), and then to have that evidence record have a source reference point to the census. That source reference could have the house number, the family number, the enumeration district, or whatever.
Second, I'm harder hearted than you. I don't believe in going to extremes on flexibility. If we believe that there should be a serious attempt to use templates as a way to support citation generation, we must take a stand on the issue of how to define sources. We cannot allow willy-nilly flexibility on the part of the users, allowing them to decide exactly what they want to call a source and what not to. We define a list, we give a mechanism for extension in the rare cases where a source doesn't fall in our list, and support no other flexibility. There has to be a reasonable compromise between structure and flexibility.
Possibly some of the differences in these views may stem from the fact that I am a strong proponent of evidence records, that is, persona and evidence-event records in which information extracted directly from evidence is placed. With evidence records there is no need for low level source records, because the info you otherwise need to place in source records is more appropriately placed in the evidence records. With evidence based records the need for these low-level source records is eliminated.
The point is this. Evidence should not be placed in source records. In current, conclusion only systems, there is no handy place to put evidence, however, so careful genealogists like yourself need to find a reasonable place to put it. So you add it as reference notes to source records, and this leads you to need very low level sources since each of your sources ends up holding specific information about specific evidence.
But when we have a system that allows the evidence to exist as records in their own right, then we have the right mechanism available, and we do what we should, store the evidence in non-source, evidence records. And this removes any need for low level sources.
In the Blog-Post-Image case, a "lumper" might have one source only, with the type of "Image" with a title like "Will of Z, on the Blog X, Article Y, as retrieved on Date d" and his where-within-source might be "bottom right".
A "splitter" might have a source of type "Image" and title of "Will of Z" which refers to a higher source of type "Article" called "Article Y" which refers to a source of type "Webpage" called "Blog X". The where-within-source could still be "bottom right". The Date of Access should probably go with the where-within, too.
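The lumper and splitter arrangements above can be sketched in a few lines of Python. This is only an illustration: the `Source` class, its field names and the `chain` helper are hypothetical and are not taken from DeadEnds, GEDCOM or any existing program.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    # Hypothetical record shape; the field names are illustrative only.
    type: str
    title: str
    parent: Optional["Source"] = None  # the higher-level source, if any


def chain(source: Source) -> list[str]:
    """Return the titles from the lowest-level source up to the top-level one."""
    titles = []
    current: Optional[Source] = source
    while current is not None:
        titles.append(current.title)
        current = current.parent
    return titles


# Lumper: one flat source record carries everything in its title.
lumped = Source("Image", "Will of Z, on the Blog X, Article Y, as retrieved on Date d")

# Splitter: three records, each referring to the next higher source.
blog = Source("Webpage", "Blog X")
article = Source("Article", "Article Y", parent=blog)
image = Source("Image", "Will of Z", parent=article)

# The where-within-source value is the same in both arrangements.
where_within_source = "bottom right"
```

Walking `chain(image)` gives the three titles in order, while `chain(lumped)` gives a single title, so a program could render either arrangement from the same kind of record.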
I'm quite certain differences really don't have anything to do with what you or Louis would consider the details that would be entered into an "evidence" record .. but I'll not even try to convince you otherwise, okay?
I've said before that I could see a benefit to a "lower level source" (I call it the third level), but hierarchical structures would require more time to learn and manage; I believe they would be even harder to standardize. It would be nice to hear Bob Velke's take on hierarchical structures compared to the TMG structure for sources and citations. The good folks at GENBOX implemented the "lower level source" concept.
Tom wrote, "There has to be a reasonable compromise between structure and flexibility." Indeed, so let me turn this around ...
So, help me understand why you care where I commit a field. If a "source type" is a "birth certificate" then there will be a certain number of fields associated with it. (À la the Zotero spreadsheet.)
Let's pick a nice controversial field that is not on the Zotero spreadsheet--"Household ID" (US Census), which has a value "Asa Thomas household." Say I include that field in my "master source" and you record the field in "Source Details."
If we both recognize the field, please help me understand why you care that I have entered that field/data in the master source, but you have entered the same field/data at the assertion level. --GJ
Unfortunately, I can say that I would rather like to put "page(s)" in the source record. Two examples:
1. The Annual Report for Railway Accidents in the UK. I have an interest in one accident report from the 1903 edition. I'd create a source record for the single report, with a title formatted something like that of an article in a journal, but it would be nice (if not essential) to add the page number as its own item on the source record.
2. An English parish register contains lots of baptisms, marriages and burials. I'd create a source record for each baptism, marriage or burial (splitting in action here....) in order to associate a note containing the identification logic with the specific data, rather than have a hundred identification statements in the source record for a whole register. Again, as part of identifying where in the register the baptism, marriage or burial is, I'd like to associate a page number with that source record. Again, currently it can be done without it - I usually put the page number as part of the transcribed text, but aren't we supposed to be codifying stuff properly?
I agree with you. It would be straightforward to add to the current GEDCOM 5.5 standard a number of new tags for citation elements, and we would have a near perfect solution. We would also need to allow the 1 SOUR tag to have level 2 tags for citation elements for things like pages. And we would have to allow 0 SOUR records to contain 1 SOUR source references to handle highly structured sources (e.g., my standard example of a journal article). It's a simple, solid solution that would be as good as any other we could come up with.
I've tried to make the point that GEDCOM syntax, XML syntax, JSON syntax, Google protocol buffers, etc., are all isomorphic to one another, and this point you are making is a consequence.
But you will have to admit that GEDCOM, as it is defined today, and as it is implemented today, does not allow full sharing of data between any pair of programs.
As you point out none of this works without developers implementing good solutions. If our standard specifies how to represent all sources and how to represent all citation elements, we have done our bit of the puzzle; it is then up to the developers.
In brief, this is because Reunion allows me to assign a GEDCOM tag to every different field, so as far as I can tell it ALL gets exported.
TNG ( http://tngsitebuilding.com/ ) is quite forgiving in its import and lets me decide what to do with each field type on import. So this page on my TNG site
http://roger.lisaandroger.com/getperson.php?personID=I16&tree=Roger
is almost entirely derived from a Reunion GEDCOM file - the exception is the mapping section which currently is done within TNG since Reunion doesn't support mapping or exporting the LAT/LONG to GEDCOM file.
The sources listed at the bottom of the page are of course not "EE Perfect" but I think that they contain enough information in general to allow someone else to find the same information.
Now of course trying to move this same GEDCOM file to almost any other software is nowhere near as successful!!
I do not read this as saying that GEDCOM is the problem, and I am not saying anything about who should be blamed. I think you have to look closer at RootsMagic's GEDCOM export; it is far from GEDCOM compliant. A reference note is not the same as the title of the document.
But the text can be improved. What about:
"As documented throughout the wiki and in numerous blog articles and other postings, it is not possible to transfer the source and citation data recorded in several major genealogy programs to a different program using GEDCOM. Data become lost or distorted during the transfer process. The reason is that these programs have extended their source and citation data beyond the limited capabilities of GEDCOM, and/or export data that do not conform to GEDCOM."
Roger,
Maybe you could review Reunion, and make an entry on this page?
http://bettergedcom.wikispaces.com/Application+Data
Geir:
You said: "these programs have extended their source and citation data beyond the limited capabilities of GEDCOM".
Wrong! Wrong! Wrong! Wrong! Wrong!
GEDCOM has one really nice construct in its sources that unfortunately few programs have decided to use. Do they not bother to read the GEDCOM specs? It offers full extensibility in the source information.
It is the: +1 PAGE <WHERE_WITHIN_SOURCE> construct, and is defined as:
WHERE_WITHIN_SOURCE:= {Size=1:248}
Specific location with in the information referenced. For a published work, this could include the volume of a multi-volume work and the page number(s). For a periodical, it could include volume, issue, and page numbers. For a newspaper, it could include a column number and page number. For an unpublished source or microfilmed works, this could be a film or sheet number, page number, frame number, etc. A census record might have an enumerating district, page number, line number, dwelling number, and family number. The data in this field should be in the form of a label and value pair, such as Label1: value, Label2: value, with each pair being separated by a comma. For example, Film: 1234567, Frame: 344, Line: 28.
By using the Label/Value pairs, the reading program will understand what each of the values means and the whole source CAN be interpreted. Each program would still need to "understand" the meaning of each of the labels if they wanted to be smart and translate things to their own convoluted format, but there's really no need to. If the labels define the interpretation of the values, then a reading program need only display the labels beside the values and the full meaning is understood by the user. This can also easily be searched by label, e.g. "IF Film = 1234567 and Frame = 344"
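As a rough illustration of how little a reading program has to do, here is a minimal Python sketch that splits a PAGE value of the form given in the spec text into label/value pairs and then runs the search by label. The function name is hypothetical, and the sketch assumes that neither labels nor values contain a comma, which the GEDCOM text does not guarantee.

```python
def parse_where_within_source(value: str) -> dict[str, str]:
    """Split 'Label1: value, Label2: value' into a dict of label -> value."""
    pairs = {}
    for part in value.split(","):
        label, separator, val = part.partition(":")
        if not separator:
            continue  # skip a fragment that has no label
        pairs[label.strip()] = val.strip()
    return pairs


page = parse_where_within_source("Film: 1234567, Frame: 344, Line: 28")

# Search by label, as in "IF Film = 1234567 and Frame = 344".
is_match = page.get("Film") == "1234567" and page.get("Frame") == "344"
```

A program that does not recognize a label can still show it next to its value, which is all the approach requires.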
GEDCOM does NOT have limited capabilities on this front. It has very extendable capabilities. Maybe we at BetterGEDCOM could perform one great feat and inform genealogy developers of this PAGE tag in GEDCOM, promote its use, and come up with a standard set of labels to be used.
Louis
Below is a simple journal entry. By simple, I mean that it has only one author (no editors or other contributors), I'm not reporting any credentials for the author, and the article was not serialized (it appears in just one issue). If we were to develop a benchmark case, it would not be so simple.
Here is the bibliographic entry:
Russell, George Ely, “Founders of the American Firestone Family.” _National Genealogical Society Quarterly_ 52 (December 1964): 241-44.
There are additional requirements at the full reference note level, but all of the information in that simple journal bibliographic entry needs to be supported/exported to the Source_Record and none of it should be entered at the assertion level.
--GJ
With regard to RootsMagic's compliance with GEDCOM: I don't think they've done anything really wrong.
Take a look at the example I posted at http://www.beholdgenealogy.com/blog/?p=874
RootsMagic decided NOT to include a Title field. I guess they figured it was better for them to not allow the person to make up their own title. So what they do is generate the title by using the "Collection", "Repository", "Repository Location", "Format" and "URL" fields together, and separating any non-blank fields by semicolons.
If they decide to generate the source title that way, there's nothing wrong with it. You can argue you don't like how RootsMagic does it, but as far as they are concerned that is the title and they export it that way. That doesn't make them non-compliant.
It gives them the advantage of being able to parse it when it comes back in, so they can fill their fields up again.
It's not so bad for other programs, since at least all the data is there. And it displays well in Behold, and I don't have to touch it (other than remove that extra colon and semicolon they add at the front). It is the most important data and will get loaded into other programs' title fields.
This is how RootsMagic set up its internal data structure. They don't have a Title field, but have those 5 fields instead, which to them represent the title. I totally disagree with them implementing the Repository and Repository Location as fields on the source, since they should be in the Repository information that is available from the Repository button on the screen.
Also RootsMagic has scores of templates. That example in my post was only for one particular template. Each template has different fields. But those "Master Source" templates all get exported to GEDCOM into the TITL tag as semicolon-separated values.
Now there is no reason, if they were doing this already, to not have just made them label/value pairs. Then they wouldn't even have to worry about which template they were using, because they'll be able to figure it out again by the unique combination of pairs read in.
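The difference between the two export styles can be sketched as follows. This is not RootsMagic's code or its exact output format; the function names are hypothetical, the field labels are the five named above, and the field values are placeholders invented for illustration.

```python
def join_values(fields: dict[str, str]) -> str:
    """Semicolon-join the non-blank values; the labels are dropped."""
    return "; ".join(v for v in fields.values() if v.strip())


def join_pairs(fields: dict[str, str]) -> str:
    """Keep each non-blank value together with its label."""
    return ", ".join(f"{k}: {v}" for k, v in fields.items() if v.strip())


def split_pairs(text: str) -> dict[str, str]:
    """Recover the fields from join_pairs output.

    Assumes no label or value contains ', ' or ': '.
    """
    if not text:
        return {}
    return dict(part.split(": ", 1) for part in text.split(", "))


# Placeholder values, invented for illustration only.
fields = {
    "Collection": "Example Collection",
    "Repository": "Example Archive",
    "Repository Location": "",
    "Format": "digital images",
    "URL": "",
}

as_values = join_values(fields)   # labels lost, blanks silently skipped
as_pairs = join_pairs(fields)     # each value still carries its label
recovered = split_pairs(as_pairs)
```

With bare values, a reader has to know the template and which fields were blank to tell what each value is; with pairs, the labels travel with the data.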
What I'm saying is that maybe GEDCOM needs an "official" tag that can accept label/value pairs in the SOUR record, just like the PAGE <WHERE_WITHIN_SOURCE> construct that I mentioned in my last post.
RootsMagic was on the right track. They just didn't take it far enough.
Get them to hire me. I can tell them what to do to fix it all. :-)
GeneJ:
I am not a citation expert, so I leave the proper development of those up to you.
But for your example, using the RootsMagic method, and label/value pairs, this is what I would do:
0 @S1@ SOUR
1 TITL Author; Russell, George Ely; Title; “Founders of the American Firestone Family.”; Journal; _National Genealogical Society Quarterly_; Volume; 52; Date; (December 1964)
and in the Source Reference, I would have:
1 SOUR @S1@
2 PAGE Page; 241-44
Now, there are some GEDCOM source tags that could be used instead (such as AUTH, PUBL, etc.), but I wanted to illustrate this label/value pair methodology to the extreme here.
Louis
One more comment.
I believe GEDCOM did NOT include a label/value pair ability for the source record, because they thought they had all the fields they needed to define the source record in their:
+1 AUTH <SOURCE_ORIGINATOR>
+1 TITL <SOURCE_DESCRIPTIVE_TITLE>
+1 PUBL <SOURCE_PUBLICATION_FACTS>
and they were trying to get everyone to standardize on using those.
If there are a few others that should be added, then let the BetterGEDCOM team determine that.
Or if we need a label/value pair because there are too many, then let the BetterGEDCOM team decide on that.
Take a look at all your source templates and extract what's needed to define just the source reference, and see how many of them will fit into the structure of the above three tags, and what other tags might be needed.
Louis
Here are the definitions:
SOURCE_ORIGINATOR:= {Size=1:248}
The person, agency, or entity who created the record. For a published work, this could be the author, compiler, transcriber, abstractor, or editor. For an unpublished source, this may be an individual, a government agency, church organization, or private organization, etc.
SOURCE_DESCRIPTIVE_TITLE:= {Size=1:248}
The title of the work, record, or item and, when appropriate, the title of the larger work or series of which it is a part. For a published work, a book for example, might have a title plus the title of the series of which the book is a part. A magazine article would have a title plus the title of the magazine that published the article.
For an unpublished work, such as:
- A letter might include the date, the sender, and the receiver.
- A transaction between a buyer and seller might have their names and the transaction date.
- A family Bible containing genealogical information might have past and present owners and a physical description of the book.
- A personal interview would cite the informant and interviewer.
SOURCE_PUBLICATION_FACTS:= {Size=1:248}
When and where the record was created. For published works, this includes information such as the city of publication, name of the publisher, and year of publication. For an unpublished work, it includes the date the record was created and the place where it was created. For example, the county and state of residence of a person making a declaration for a pension or the city and state of residence of the writer of a letter.
But I hope you would concede that the necessity to use a sequence of key/value pairs as the values of GEDCOM lines arises because GEDCOM does not provide the necessary citation element tags. And I assume you would also concede that if every vendor chooses to support a different set of keys, sharing would be almost impossible.
Why not simply decide on a standard set of keys and then create a new GEDCOM tag for each one, and write those keys into the new standard? This is what I am certainly advocating as the proper course for BG, and I think others agree.
I agree with you, that this extension could all be done within the context of good ole GEDCOM simply by creating some additional citation element tags to add to the three you have described. I am not against that idea, as I think it would be wonderful to be able to express BG data in GEDCOM format. But you and I are dinosaurs on this issue. XML is the writing on the wall.
Can you take some consolation in the fact that a simple plug-in would allow you to convert BG data to GEDCOM? I'm happy with that. My DeadEnds software, though using a much fuller model than that of standard GEDCOM 5.5 (that is, the DeadEnds model), can export in GEDCOM with the flick of a switch.
Tom:
Yes, I thought that's what I was trying to say. I'd be very happy if BetterGEDCOM could come up with the "standard" set of keys for both the source and the source reference, which then could be used as BetterGEDCOM tags and/or XML and/or whatever. The format is irrelevant. The definitions of the keys are what's important.
Then, those two sets of keys can be used as the variables in various (or even a multitude of) citation templates (Shown Mills and whoever else) that will standardize these as well. The problem is that these are not defined in a precise way, and each programmer interprets them differently.
Important Point:
By defining a set of source keys, source reference keys, and citation templates using the keys as variables - this work by BetterGEDCOM would be do-able.
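To make the idea concrete, here is a minimal Python sketch of a citation template that uses source keys and source reference keys as its variables. The key names and the template are hypothetical, not a proposed standard; the split of keys between source and source reference follows the journal example earlier in this thread, and the template reproduces the bibliographic entry given there using the standard library's `string.Template`.

```python
from string import Template

# Hypothetical template for a journal article bibliography entry.
# ${journal} needs braces so the trailing underscore is not read as part of the name.
JOURNAL_BIBLIOGRAPHY = Template(
    "$author, “$title.” _${journal}_ $volume ($date): $pages."
)

# Hypothetical source keys (held on the source record).
source_keys = {
    "author": "Russell, George Ely",
    "title": "Founders of the American Firestone Family",
    "journal": "National Genealogical Society Quarterly",
    "volume": "52",
    "date": "December 1964",
}

# Hypothetical source reference keys (held on the citation).
reference_keys = {"pages": "241-44"}

# substitute() raises KeyError if a key the template needs is missing.
entry = JOURNAL_BIBLIOGRAPHY.substitute(**source_keys, **reference_keys)
```

Because the template only names keys, the same two sets of keys could feed any number of templates in different citation styles.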
I don't like plugins or APIs. In my experience, they are slow and limiting for large GEDCOMs. Writing the code to do what they do directly is 10 to 100 times faster, and can be customized and can do error detection, etc.
My plan is once BetterGEDCOM is defined, I'll write into Behold an input from BetterGEDCOM and an output to BetterGEDCOM. Since Behold inputs and soon will output legal GEDCOM 5.5.1, Behold will be its own converter.
Louis