1. Standardized Metadata and CSL
Creating reference notes and bibliographies in genealogical software usually begins with the user adding details about the sources to their database. When users fill in fields of data about a source (identifying each author, title, date, etc.), they are recording source metadata. Other techniques then manipulate this metadata electronically to create reference notes and bibliographies.
Professionals from different libraries and archives also create metadata, often about the same sources with which genealogists work, and there are formal “metadata standards.” Some familiar metadata standards are MARC21, MODS and Dublin Core. For example, if you review a title in the Library of Congress Online Catalog (US), the associated “MARC TAGS” about the work are displayed in a tab.
The three noted standards, MARC21, MODS and Dublin Core (there are others), are each different, especially in their targeted level of detail, and thus in descriptors and definitions. In particular, MODS does not reach to the level of detail of MARC21, and Dublin Core is greatly simplified from MODS.
Taken collectively, the standardized metadata is complex; however, third parties have developed ways of extracting typically high-level bibliographic data from a range of the standardized metadata. “Reference management software” products, like Thomson’s EndNote, let users create libraries of information about sources (including bits of extracted standardized metadata), from which bibliographies and reference notes can be created in a variety of recognized citation styles.
Many paper-based genealogists use products like EndNote in their research and writing. Some of these products allow users to “drag and drop” citations from the library into their word processing program.
Zotero is an open-source reference management software product that works similarly to EndNote. The technology behind Zotero is called “Citation Style Language” or CSL, which is also open source. Like EndNote, Zotero uses CSL protocols to interface with sets of standardized metadata. Said another way, CSL/Zotero define fields (like our citation elements) that map to the more complex standardized metadata. Using CSL technology, Zotero supports different languages and generates bibliographic entries and high-level reference notes in more than 1500 different citation styles.
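To make the idea of fields mapping to standardized metadata concrete, here is a minimal Python sketch. It is not Zotero's code. The Dublin Core element names used (title, creator, date, publisher) are real, and the output keys follow CSL-style naming as I understand it (title, author, issued, publisher), but the mapping table, the function name dc_to_csl and the sample record are all hypothetical and exist only for illustration.

```python
# Hypothetical mapping from a few Dublin Core elements to CSL-style fields.
# The table is an illustration, not taken from Zotero or the CSL specification.
DC_TO_CSL = {
    "title": "title",
    "creator": "author",
    "date": "issued",
    "publisher": "publisher",
}


def dc_to_csl(dc_record, item_type="book"):
    """Convert a flat Dublin Core record into a CSL-style item (sketch only)."""
    item = {"type": item_type}
    for dc_name, value in dc_record.items():
        csl_name = DC_TO_CSL.get(dc_name)
        if csl_name is None:
            continue  # elements without a mapping are dropped in this sketch
        if csl_name == "author":
            # Assumes the name is written "Family, Given".
            family, _, given = value.partition(",")
            item[csl_name] = [{"family": family.strip(), "given": given.strip()}]
        elif csl_name == "issued":
            # Assumes the date value starts with a four-digit year.
            item[csl_name] = {"date-parts": [[int(value[:4])]]}
        else:
            item[csl_name] = value
    return item


# Invented sample record; "format" has no mapping and is ignored.
record = {
    "title": "A Sample Family History",
    "creator": "Doe, Jane",
    "date": "1999",
    "publisher": "Example Press",
    "format": "print",
}
item = dc_to_csl(record)
print(item)
```

The point of the sketch is only that a small, stable set of fields can sit between the complex catalog metadata and the citation styles that consume it.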
The main point here is that at a more scholarly and even institutional level, approaches to standardization exist and their development continues. Third-party efforts, including some that are open source, have also found ways to work with this data.
2. Reference Management Software, Stylistic Matters: Substance and Form
Chicago Manual of Style Online, at 14.1, "The purpose of Source Citations," explains, "Ethics, copyright laws, and courtesy to readers require authors to identify the sources of direct quotations or paraphrases and of any facts or opinions not generally known or easily checked." The source continues, "Conventions for documentation vary according to scholarly discipline, the preferences of publishers and authors, and the needs of a particular work. Regardless of the convention being followed, the primary criterion of any source citation is sufficient information either to lead readers directly to the sources consulted or, for materials that may not be readily available, to positively identify the sources used, whether these are published or unpublished, in printed or electronic form."
From Mills, 2007, 42, "Evidence Explained is rooted in [Chicago Manual's Humanities] style. However, most Evidence models treat original or electronic sources not covered by [the CMOS, Bluebook, MLA, Turabian manuals], as well as some modifications that better meet the analytical needs of history researchers."
See introductory paragraphs on this wiki page and the opening comments to the wiki page, "Modern Style Guides." See also, Tamura Jones, "Genealogy Citation Standard," Modern Software Experience, 27 June 2011; James Tanner, "Looking towards a rational philosophy of citations," Genealogy's Star, 17 July 2011.
Where angels fear to tread.
Standardization involves issues of substance (information about the source) and form (given a particular style, how should information be formed into a citation). Matters of form include whether and how to present the various elements, including their punctuation, which tends to vary by country and often by institution or publisher. All the form in the world won't make up for a lack of substance.
Some issues of substance cross discipline lines, but some are unique to particular disciplines. In the simplest sense, I see substance as fields or citation elements that provide information about the source. Some elemental needs (pun intended) are more common in genealogy than in other disciplines. For example, genealogists deal with many documents that don't carry any title (say, a letter), and we work with many documents that carry the same title ("Certificate of Death").
Just as BetterGEDCOM wants to standardize these matters of substance so that genealogists can exchange information, the broader class of reference management software is working toward the same objective on a worldwide basis and across disciplines. As far as I have been able to learn, reference management software approaches substance in the same way Geir's document suggests we would proceed: it first sets out "source types" (Zotero calls them item types) and then, for each source type, defines fields (like our citation elements), including references that are similar to Geir's modules. As a result of these approaches, there are relatively few source types in reference management software, and there will be relatively few source types in BetterGEDCOM.
That genealogists need more source types or more fields, including assertion-level fields, is not the point; the point is that reference management software has item types and fields we could build upon. Because these same third-party efforts (some open source) develop item types and fields that support access to standardized metadata, aligning our work with the larger reference management software movement might bring BetterGEDCOM's effort closer to online citations.
In reference management software, given the coordinated list of item types and fields, style libraries can be created and managed by a separate effort, based on something akin to Geir's concept of "style rule sets." Once core styles have been developed, additional styles are added based on how each new style compares to a core style or to some other style.
3. Master Sources and Assertions
The source system in genealogical software, which BetterGEDCOM aimed to support, functions somewhat like reference management software: libraries of source data are developed to support the creation of bibliographic and reference note citations. Genealogical software source systems, however, need to store full reference notes using integrated citation mechanics that involve elements at both “master source” and “assertion” level. These mechanics are frequently manipulated by users taking different approaches to managing the master source list. (Conversely, the library in reference management software stores high-level elements, frequently at a level higher than a master source developed by genealogists.) Said another way, to function, genealogical software needs to efficiently store and manage more elements/fields, and likely at more or different levels, than reference management software.
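As a rough illustration of why both levels matter, the Python sketch below merges master-source elements with assertion-level elements to get the full set needed for a reference note. The element names (collection, person, certificate) and the function name are my own assumptions, not taken from any genealogy program; the sample values come from the New Hampshire example later on this page. It shows two users dividing the same information differently between the two levels and ending up with the same citation data.

```python
def full_citation_elements(master, assertion):
    """Merge master-source and assertion-level elements into one set (sketch)."""
    merged = dict(master)
    merged.update(assertion)  # assertion-level values win if a name repeats
    return merged


# One user keeps a broad master source and records detail at the assertion.
broad_master = {"collection": "New Hampshire Deaths"}
broad_assertion = {"person": "Sammy Sue", "certificate": "20632"}

# Another user keeps a narrow master source, leaving nothing for the assertion.
narrow_master = {
    "collection": "New Hampshire Deaths",
    "person": "Sammy Sue",
    "certificate": "20632",
}
narrow_assertion = {}

same = full_citation_elements(broad_master, broad_assertion) == full_citation_elements(
    narrow_master, narrow_assertion
)
print(same)
```

Whatever exchange format is used would need to carry elements at either level without assuming where a given user chose to put them.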
4. Genealogical Software and Source Types
Currently, across genealogy software programs, source types tend to be directly associated with particular citation templates, which may or may not be proprietary. The template determines which citation elements are available for that given source type. Reference management software works quite differently. In CSL/Zotero, the item type (source type) determines the available information fields, which ideally represent a universe of the fields various styles or catalogs require. The styles are built separately, populated by the item type’s information fields.
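The separation described here can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the two item types, their field lists and both "styles" are invented for illustration and are far simpler than anything in CSL/Zotero. What it shows is the shape of the design: the item type fixes the available fields, and styles are built separately on top of those fields.

```python
# Hypothetical item types and their fields; real CSL/Zotero lists are longer.
ITEM_TYPE_FIELDS = {
    "book": ["author", "title", "place", "publisher", "year"],
    "letter": ["author", "recipient", "date", "repository"],
}


def make_item(item_type, **values):
    """Build an item, rejecting fields the item type does not define."""
    allowed = ITEM_TYPE_FIELDS[item_type]
    unknown = [name for name in values if name not in allowed]
    if unknown:
        raise ValueError(f"{item_type} has no field(s): {unknown}")
    return {"type": item_type, **values}


# Two invented styles. Each is just a function over the same item fields,
# so adding a style never requires adding a source type.
def style_note(item):
    return (
        f"{item['author']}, {item['title']} "
        f"({item['place']}: {item['publisher']}, {item['year']})."
    )


def style_author_date(item):
    return (
        f"{item['author']}. {item['year']}. {item['title']}. "
        f"{item['place']}: {item['publisher']}."
    )


STYLES = {"note": style_note, "author-date": style_author_date}

# Invented sample data.
book = make_item(
    "book",
    author="Jane Doe",
    title="A Sample Family History",
    place="Boston",
    publisher="Example Press",
    year=1999,
)
for name, render in STYLES.items():
    print(name, "->", render(book))
```

In the template-driven approach, by contrast, each of these styles would need its own "book" source type with its own element list.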
Because of this difference, hundreds upon hundreds of source types exist in genealogical software, each unique to a style. These source types are often localized (US- or UK-centric) and use an array of citation elements that are also unique to each software vendor.
In contrast to the hundreds of source types existing in genealogical software that supports few styles, CSL/Zotero recognizes fewer than 50 source types, supports more than 1500 citation styles and serves a worldwide market.
5. Flexible Schema
Much like reference management software, BetterGEDCOM could define reasonably high-level source types in accordance with Geir’s approach (I call his approach a schema).
Universal/Country Specific > Source Type Class > Source Type.
We tentatively identified a group of 23 universal source type classes (books, journals, research reports, web pages, newspaper items, etc.). Census and vital records are among the source type classes that would be country specific.
Where CSL has established item types, BetterGEDCOM could/should adopt those named source types and descriptions. BetterGEDCOM would add source types to the BetterGEDCOM model as necessary.
For each Source Type (and thus Source Type Class), BetterGEDCOM could define a set of available elements (citation elements/data types) to ideally support the production of citations for that source type regardless of nuances in form, language or style. Where CSL has established fields, BetterGEDCOM could/should adopt those fields as citation elements, and add unique citation elements to the BetterGEDCOM model as necessary.
Vendors (and users, if the vendor permits) could/should be able to extend the BetterGEDCOM source types using a system of levels, with the lowest level representing the database assertion. All citation elements would be available at any level,* and lower level sources would inherit the properties of the higher-level source.
US > Vital Records > Assertion
US > Vital Records > Sammy Sue > Assertion
US > Vital Records > New Hampshire Deaths > Assertion
US > Vital Records > Massachusetts Vital Records > Assertion
US > Vital Records > New Hampshire Deaths > Sammy Sue, certificate 20632 > Assertion
US > Vital Records > New Hampshire Deaths > Thomas Jones, died 1698 > Assertion
US > Vital Records > State Certificates > Missouri Death Certificates > Mike Jones (1942)
US > Vital Records > State Certificates > Missouri Death Certificates > Assertion
*This includes page, etc., so that no particular citation element should be limited to the "assertion" level. See the lumper vs splitter graphics on the wiki page "Citation Mechanics."
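The system of levels above, with lower levels inheriting from higher ones, might be modeled along the following lines. This is a sketch under my own assumptions, not a BetterGEDCOM specification: the class name SourceLevel, its methods, the element names and the page value are all hypothetical. The level names come from one of the example paths above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SourceLevel:
    """One level in a source hierarchy (hypothetical model)."""

    name: str
    parent: Optional["SourceLevel"] = None
    elements: Dict[str, str] = field(default_factory=dict)

    def resolved(self) -> Dict[str, str]:
        """Return this level's elements plus everything inherited from above."""
        inherited = self.parent.resolved() if self.parent else {}
        return {**inherited, **self.elements}  # lower levels override higher ones

    def path(self) -> str:
        """Return the chain of level names, highest level first."""
        prefix = self.parent.path() + " > " if self.parent else ""
        return prefix + self.name


# Element names and values below are invented for illustration.
us = SourceLevel("US", elements={"country": "United States"})
vital = SourceLevel("Vital Records", us, {"record_class": "vital record"})
nh = SourceLevel("New Hampshire Deaths", vital, {"jurisdiction": "New Hampshire"})
cert = SourceLevel("Sammy Sue, certificate 20632", nh, {"certificate": "20632"})
assertion = SourceLevel("Assertion", cert, {"page": "1"})

print(assertion.path())
print(assertion.resolved())
```

Because any element can be set at any level, nothing in this model ties "page" to the assertion; a user could just as well record it one level up.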
Among others, “Metadata Object Description Schema [MODS] is an XML-based bibliographic description schema developed by the United States Library of Congress' Network Development and Standards Office. MODS was designed as a compromise between the complexity of the MARC format used by libraries and the extreme simplicity of Dublin Core