BetterGedcom - Better GEDCOM Requirements Catalog


Co-Moderators: gthorud and AdrianB38

RULES

Discussions: All discussions about the content of this page shall appear on the discussion tab for this page. The SUBJECT of the topic shall contain only the ID of the requirement, "space dash space" and the subject of the requirement. There should be only one topic per requirement. Enter the Description and Why (see below) in the first posting in the topic.

New entries: All wiki users can add a new requirement. Please check whether there is an existing entry for the same requirement; if there is a similar one, check the discussion of that requirement or contact the Proposer of the requirement to see if the existing one can be slightly modified to cover your requirement. If you are in doubt about which group the requirement should be placed in, or whether your requirement should be entered at all, contact the moderators (see the top of the page).

Entry Moderator: The person proposing a requirement is responsible for updating the catalog entry following discussion. Updates shall be "announced" in the discussion topic. See rules in the template below.

Requirements Catalog Index

Quick access index:
Administration Characteristics Confidence&Accuracy Conversion Data Date DNA Event Evidence Family Group International Multimedia Person PersonNames Place Ship Source Support Syntax Test Suite Text Handling Timeline

Background

Background to BetterGEDCOM

Background to these pages:
At the Developers' Meeting of 17 January 2011, it was resolved that BetterGEDCOM's existing list of Goals was not appropriate as a list of goals, and that the list should be restructured to extract a simple Goal and reformat the rest as Requirements. This set of pages is being written to carry out that task.

Personal comment by the original author (Adrian Bruce) - the sections of this catalogue were previously used by me as a template for a full-scale IT project. Though trimmed from that, they may still be regarded as over-the-top for a Wiki-based project. Having previously got stuck on the argument whether we had goals or requirements, I would rather take too rigorous a path now. Inspiration comes from the Volere Requirements Specification Template in "Mastering the Requirements Process" by Robertson & Robertson. Note also that I use the term "project" for the BetterGEDCOM work, even though it (so far) satisfies no formal definition of what a project should be.

Goals of BetterGEDCOM

BetterGEDCOM will be a file format for the exchange and long-term storage of genealogical data.
It will be more comprehensive than existing formats and so become the format of choice.
(Note - first sentence is a minor rewording of Goal 1 agreed 3 Jan 2011. Second sentence justifies why BG and not an existing format.)

Clients, Customers, Stakeholders & Users

(This section is here simply to make you think)

Currently we have no identified Client paying for the project.
No-one will buy the BetterGEDCOM product itself, therefore we have no Customers in the proper sense.
Stakeholders potentially affected by BetterGEDCOM are developers of application software that reads or writes genealogical data.
Since data may be transferred from a BetterGEDCOM file into an application, and then to a genealogy service provider via an API in the application, providers of such networked genealogy services could be affected by the structures in BG, and could therefore seek to influence or control BetterGEDCOM.
Potential users of BetterGEDCOM include people or organisations currently holding files of genealogical data and people using application software that exchanges or stores genealogical data.
A very small number of people may manipulate data in a BetterGEDCOM directly - most users will not do so.

Requirements Constraints

(Not all of these may be relevant in practice)

The application software that will potentially read and write BetterGEDCOM data files is developed and maintained by many organisations, all of which are independent from the BetterGEDCOM project. Therefore the BetterGEDCOM project cannot directly control the match between BetterGEDCOM data files and the BetterGEDCOM standard.
An uncounted number of files of genealogical data exist in various forms of the GEDCOM file format. The design of BetterGEDCOM should minimise the effort required by application developers to write software to convert those files to BetterGEDCOM format.
Many of the existing files of genealogical data do not conform to any official version of the GEDCOM standard.
Many of those files have extended the GEDCOM standard with tags whose function is known only to the developers of the application software concerned.
Many of those files have extended the GEDCOM standard with events and attributes defined by the user of the application, and their meaning is known only to those users.
The GEDCOM Standard(s) are under the ownership and copyright of The Church of Jesus Christ of Latter-day Saints.

Naming Conventions & Definitions

See Glossary of Terms

Assumptions

Scope of BetterGEDCOM product

BetterGEDCOM will produce definitions of the file format in:

A report (definitely)
A data model (definitely)
A codified form applicable to the technology chosen for the file format - e.g. XML schema or DTD (possibly)

BetterGEDCOM should provide a test suite of data that will

allow software suppliers to assess compliance of their software
help them to diagnose issues
assist them to resolve issues.

BetterGEDCOM will not have responsibility for testing application software.

BetterGEDCOM will not have responsibility for defining how individual applications should translate genealogical data from their native formats to and from the BetterGEDCOM format, nor from applications' own varieties of GEDCOM to and from the BetterGEDCOM format. (Experienced users may make suggestions, but the responsibility lies with the applications' owners.)

Requirements Introduction

A division between functional and non-functional requirements is traditional in Requirements Catalogues. Functional requirements say what the new system should do (e.g. "Pay staff according to the 1929 Conciliation Staff Agreement") - non-functional requirements say how the system should do it (e.g. "Pay 10,000 staff overnight each Tuesday", or "Run on Windows 2000 Server OS"). As a result, the "techno-speak" requirements are part of the non-functional requirements.

Given that the BetterGEDCOM file format does not do anything itself, it is debatable how relevant the division is, so, after trying to keep to it, I am putting them all together.

The Requirements below use the following template.

Id:	Code to identify the requirement - in bold
Title:	A short description - max 10 words - in bold.
Description:	One or two sentences - use "must" if importance is mandatory; "should" if very desirable; "could" if desirable.
Importance:	One of three values: Mandatory; Very Desirable; Desirable. For the time being, this is the assessment of the proposer.
Why?:
Source:	If from another page or discussion, please note and link it. All previous discussions should go here, but the last/current discussion should be linked to in Discussion.
Way forward?:	Comments on possible ways forward
Dependencies:
Approval status:
Proposer:	The creation date of the requirement, the wiki ID of the proposer and, optionally, the proposer's name.
Changes:	Date changed (month and day, e.g. Feb 21) and user ID, as a comma-separated list. Append the latest change to the end of the line, e.g.: 22 Feb gthorud, 23 Feb userxxx
Discussion:	Link to the current Discussion topic for this requirement. The subject of the topic should be the ID followed by the Title. See top of page.
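The template is in effect a small record structure. As a hypothetical illustration only (the catalogue is maintained by hand on this wiki, not by any tool), the Python sketch below models a subset of the fields and checks one rule stated above: the verb in the Description should match the Importance (must = Mandatory, should = Very Desirable, could = Desirable). The class and method names, and the example entry, are invented for this sketch; note that descriptions written with "shall" would not pass this simple check.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Importance(Enum):
    """The three Importance values, each mapped to the verb the template asks for."""
    MANDATORY = "must"
    VERY_DESIRABLE = "should"
    DESIRABLE = "could"


@dataclass
class RequirementEntry:
    """Hypothetical in-memory form of a catalogue entry (a subset of the template fields)."""
    req_id: str
    title: str
    description: str
    importance: Optional[Importance] = None
    changes: List[str] = field(default_factory=list)

    def verb_matches_importance(self) -> bool:
        """True if the Description uses the verb implied by Importance.

        Entries whose Importance is still blank are not checked.
        """
        if self.importance is None:
            return True
        pattern = rf"\b{self.importance.value}\b"
        return re.search(pattern, self.description, re.IGNORECASE) is not None

    def add_change(self, date: str, user_id: str) -> None:
        """Append the latest change to the end of the list."""
        self.changes.append(f"{date} {user_id}")

    def changes_line(self) -> str:
        """Render Changes as the comma-separated line used on this page."""
        return ", ".join(self.changes)


# Hypothetical example entry, using the change example from the template.
entry = RequirementEntry(
    "Example01",
    "Example title",
    "BetterGEDCOM must allow the recording of approximately known dates.",
    Importance.MANDATORY,
)
entry.add_change("22 Feb", "gthorud")
entry.add_change("23 Feb", "userxxx")
print(entry.verb_matches_importance())  # True
print(entry.changes_line())             # 22 Feb gthorud, 23 Feb userxxx
```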

Copy this Empty template to create a new requirement:

Id:
Title:
Description:
Importance:
Why?:
Source:
Way forward?:
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Again, acknowledgments are made to the Volere Requirements Shell template in "Mastering the Requirements Process" by Robertson & Robertson

Detailed Requirements

Research Administration

Id:	Admin01
Title:	Research Administration Information
Description:	BetterGEDCOM must allow recording of administrative information needed to organise and document the research work.
Importance:
Why?:	--- This is a placeholder at the moment; details and detailed requirements are to be added. ----- See the discussion of this requirement for summaries of the functionality in some genealogy programs.
Source:
Way forward?:	More detailed solution, see Admin02 onwards.
Dependencies:
Approval status:
Proposer:	7 March 2011 gthorud
Changes:
Discussion:	Discussion

Id:	Admin02 (was Task01)
Title:	Research Task
Description:	BetterGEDCOM shall be able to record and track a Task (search or other task) that needs to be done or has been done. Information recorded about the task itself could be a title/short description and a full description (formattable). Research tasks can be organized in simple lists or grouped into Objectives, see below.
Importance:	Very Desirable
Why?:	Supports faithful recording of research status and results, and reduces repetition of labors.
Source:	Gramps, GenTech model
Way forward?:
Dependencies:
Approval status:
Proposer:	BrianJD
Changes:
Discussion:	Discussion at Task01 - Research Task

Id:	Admin03
Title:	Task information
Description:	BetterGEDCOM shall be able to record information about a Task, for example information used for categorisation (keyword, category, type (research/correspondence/other)), progress management (priority, status, dates, comments about dates) and resource use (expenses, number of hours used).
Importance:
Why?:
Source:
Way forward?:
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Id:	Admin04
Title:	Identification of persons, events, places that the task is about
Description:	BetterGEDCOM shall be able to link a task to records representing the person(s), event(s), place(s), source(s) etc. that the task is about and that exist when the task is defined (started). It could also record links to persons, events etc. that are created as a result of the task.
Importance:
Why?:
Source:
Way forward?:
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Id:	Admin05
Title:	What to search
Description:	BetterGEDCOM shall be able to record information about, or link to records representing, WHAT to search, e.g. a source, possibly with a URL pointing to the source.
Importance:
Why?:
Source:
Way forward?:
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Id:	Admin06
Title:	Where to do the task
Description:	BetterGEDCOM shall be able to record information about, or link to records representing, WHERE to do the task – Location name (if not linked to), Repository, Place (eg. cemetery), Address
Importance:
Why?:
Source:
Way forward?:
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Id:	Admin07
Title:	Task results
Description:	BetterGEDCOM shall be able to record information about, or link to records representing, the findings and results produced by the task (an overall description of the results, Excerpts, Multimedia, Citations, Filing Cabinet Reference)
Importance:
Why?:
Source:
Way forward?:	The information recorded for this requirement overlaps with the information in the Evidence and Conclusion model.
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Id:	Admin08
Title:	Objectives - for grouping of tasks
Description:	BetterGEDCOM should be able to group several tasks into Objectives (Targets), each Objective representing a question to be answered or a problem to be solved. An Objective is usually defined before the tasks needed to achieve it. Objectives should have a description and will be the record pointing to persons, events, places etc., rather than each task. Some elements of the information recorded for tasks (see above) can be defined for the Objective rather than for each task.
Importance:
Why?:	Questions and problems are in most cases the reasons that one or more tasks are performed.
Source:
Way forward?:	An objective record may contain elements of the info mentioned in Admin03
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Id:	Admin09
Title:	Projects - for grouping of objectives
Description:	BetterGEDCOM could be able to group several objectives into projects. Projects could be split into sub-projects. Each (sub-)project should have a name, elements of task progress listed above, completion grade (%) and description.
Importance:
Why?:
Source:
Way forward?:	A project record may contain elements of the info mentioned in Admin03
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:
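Admin02, Admin08 and Admin09 together describe a three-level hierarchy: projects (with optional sub-projects) group objectives, and objectives group tasks. A minimal Python sketch of that structure follows, assuming simple string identifiers for the linked persons, events and places and a two-value status ("planned"/"done"); these names and values are hypothetical, not part of any agreed data model. Admin09 records the completion grade as a field; the method shown is just one possible way an application might derive it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """Admin02/Admin03: one research task (only title and status shown)."""
    title: str
    status: str = "planned"  # hypothetical values: "planned" or "done"


@dataclass
class Objective:
    """Admin08: a question or problem; it, not each task, points at the subjects."""
    question: str
    subject_ids: List[str] = field(default_factory=list)  # Admin04: persons, events, places
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Project:
    """Admin09: groups objectives and may be split into sub-projects."""
    name: str
    objectives: List[Objective] = field(default_factory=list)
    subprojects: List["Project"] = field(default_factory=list)

    def all_tasks(self) -> List[Task]:
        """Tasks of this project and, recursively, of its sub-projects."""
        tasks = [t for o in self.objectives for t in o.tasks]
        for sub in self.subprojects:
            tasks.extend(sub.all_tasks())
        return tasks

    def completion_grade(self) -> int:
        """One possible derivation of the completion grade: percentage of tasks done."""
        tasks = self.all_tasks()
        if not tasks:
            return 0
        done = sum(1 for t in tasks if t.status == "done")
        return round(100 * done / len(tasks))


# Hypothetical example data.
smith = Project(
    "Smith family",
    objectives=[
        Objective(
            "Who were Thomas' parents?",
            subject_ids=["I1"],
            tasks=[Task("Search baptism register", "done"), Task("Search 1841 census")],
        )
    ],
    subprojects=[
        Project(
            "Smith emigration",
            objectives=[
                Objective(
                    "When did John emigrate?",
                    tasks=[Task("Search passenger lists"), Task("Search naturalisation records")],
                )
            ],
        )
    ],
)
print(smith.completion_grade())  # 25
```

Keeping the subject links on the Objective, as Admin08 suggests, means tasks stay small and the same question does not have to be re-linked for every search.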

Id:	Admin10
Title:	Correspondence log
Description:	BetterGEDCOM could be able to record information about letters, emails, phone calls or other correspondence related to the research. Items in the log can have a type (call, email etc.), direction (in/out), researcher, correspondent, subject, date, reference to filing system and details about the correspondence. Contact information (address, phone etc.) could also be recorded.
Importance:
Why?:
Source:
Way forward?:
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Id:	Admin11
Title:	Researchers
Description:	BetterGEDCOM could be able to record information about the researchers using the program or other cooperating/corresponding researchers. Researchers can have a name, languages, registration number (?), notes, media (photo) and contact info. A researcher can be linked to a person in the database. The Gentech model also links researchers to assertions, i.e. who made the assertion.
Importance:
Why?:
Source:
Way forward?:
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Id:	Admin12
Title:	Support Privacy Settings
Description:	Genealogy programs must support the user by providing controls/options/settings/reports to assist the user in maintaining the privacy of the people in the database, particularly those living.
Importance:	Mandatory
Why?:	Users will need to differentiate between data that can be shared and data that cannot. Users may also need to differentiate between data that may be shared with different groups of people. For example, only the data related to one branch of a family would be shared with the people in that branch.
Source:	NGS Standards For Sharing Information With Others
Way forward?:
Dependencies:	The following requirements may impact or be impacted by this requirement: Conversion02 - Support for generating web pages; Data06 - Transfer between one user's programs and to other users/services; Data-Ind01 - Data about persons; Data-Ind02 - Biological relations independent of family; Data-Ind04 - Sex-change individuals; Data-Place06 - Location to include address; DNA01 - Results from DNA tests; Multimedia02 - Information about multimedia objects; Multimedia03 - References to Multimedia; TextHandling03 - Footnotes/endnotes in notes. This list may be incomplete as requirements evolve.
Approval status:
Proposer:	12 Jul 2011 Christine_E
Changes:
Discussion:	Discussion at Admin12 - Support Privacy Settings
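One way an application might act on Admin12 when exporting is to filter out people flagged as private or presumed to be living, and optionally to restrict the export to one family branch. The sketch below is hypothetical throughout: the `PersonRecord` fields, the 100-year cut-off and the rule that a person with no dates counts as living are assumptions for illustration, not rules taken from the NGS standards cited above.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class PersonRecord:
    person_id: str
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    private: bool = False  # explicit privacy setting chosen by the user


def presumed_living(p: PersonRecord, current_year: int, max_age: int = 100) -> bool:
    """Hypothetical rule: living unless a death is recorded or the person would be
    older than max_age. With no birth year either, assume living (the cautious choice)."""
    if p.death_year is not None:
        return False
    if p.birth_year is None:
        return True
    return current_year - p.birth_year <= max_age


def shareable(
    people: List[PersonRecord], current_year: int, branch: Optional[Set[str]] = None
) -> List[PersonRecord]:
    """People that may be exported: not private, not presumed living and,
    if a branch is given, members of that branch only."""
    result = []
    for p in people:
        if p.private or presumed_living(p, current_year):
            continue
        if branch is not None and p.person_id not in branch:
            continue
        result.append(p)
    return result


people = [
    PersonRecord("I1", "Mary", 1850, 1920),
    PersonRecord("I2", "John", 1980),                 # presumed living
    PersonRecord("I3", "Ann", 1890, 1960, private=True),
    PersonRecord("I4", "Tom", 1800),                  # no death, but over the cut-off
]
print([p.person_id for p in shareable(people, 2011)])  # ['I1', 'I4']
```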

Confidence and Accuracy

Id:	ConfAcc01 (Confidence and Accuracy) (was Data03)
Title:	Support for approximately known values
Description:	BetterGEDCOM must allow the recording of approximately known values in all appropriate contexts.
Importance:	Mandatory
Why?:	GEDCOM already allows dates to be "about yyyy". Note - this is not the same as assigning a probability to a value - e.g. "Probably 1812" is not the same as "About 1812", and this requirement is not intended to cover concepts like "Probably 1812".
Source:	Tom Wetmore's Goal and Requirements plus various discussion pages.
Way forward?:	See Data-Date01 for this requirement on dates. See Data-Place01 for this requirement on locations. Work on the data model needs to establish whether there are any other values that either need, or would benefit from, the ability to record approximation.
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Note that in this Catalogue, we use the term "Characteristic" to refer to what have been referred to as properties, facts, attributes, characteristics or traits. See PFACT in Glossary. Again, my use of this term does not imply that it should be the term used in the Data Model.

Id:	ConfAcc02 (was Data04)
Title:	Levels of Confidence in Database Conclusions
Description:	BetterGEDCOM should allow the recording of recognized levels of confidence associated with database conclusions
Importance:	Very Desirable
Why?:	Supports faithful recording of research status and results. This uncertainty / level of confidence can apply to various sub-items, including, but not necessarily restricted to, dates ("Probably 1812"), places ("Likely London, England") and relationships ("Possible father is ...").
Source:	_Evidence Explained_, 2007, p. 19, "certainly," "probably," "possibly," "likely," and "apparently," "perhaps"
Way forward?:
Dependencies:
Approval status:
Proposer:	GeneJ
Changes:	2011 Feb 21 - created, 2011 Mar 21 - add examples to "Why".
Discussion:	Discussion at Data04 - Levels of Confidence in Database Conclusions
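ConfAcc01 and ConfAcc02 draw a distinction that a data model has to keep: "About 1812" qualifies the value, while "Probably 1812" qualifies the researcher's conclusion. A minimal sketch, assuming two independent hypothetical fields, shows that the two can be combined without being confused. The confidence words are the ones quoted from _Evidence Explained_ in the Source row; only years are modelled, to keep it short.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Approximation(Enum):
    """ConfAcc01 / Data-Date01: a qualifier on the value itself."""
    EXACT = "exact"
    ABOUT = "about"


class Confidence(Enum):
    """ConfAcc02: a qualifier on the researcher's conclusion."""
    CERTAINLY = "certainly"
    PROBABLY = "probably"
    POSSIBLY = "possibly"
    LIKELY = "likely"
    APPARENTLY = "apparently"
    PERHAPS = "perhaps"


@dataclass(frozen=True)
class YearValue:
    year: int
    approximation: Approximation = Approximation.EXACT
    confidence: Optional[Confidence] = None

    def describe(self) -> str:
        """Render the value as a reader might write it."""
        words = []
        if self.confidence is not None:
            words.append(self.confidence.value)
        if self.approximation is Approximation.ABOUT:
            words.append(self.approximation.value)
        words.append(str(self.year))
        text = " ".join(words)
        return text[0].upper() + text[1:]


about = YearValue(1812, approximation=Approximation.ABOUT)
probably = YearValue(1812, confidence=Confidence.PROBABLY)
both = YearValue(1812, Approximation.ABOUT, Confidence.PROBABLY)
print(about.describe())     # About 1812
print(probably.describe())  # Probably 1812
print(both.describe())      # Probably about 1812
print(about == probably)    # False
```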

Id:	ConfAcc03 (was Data05)
Title:	Universal Qualifier Symbol ("?")
Description:	BetterGEDCOM should incorporate methods allowing users to apply the universal qualifier "?" before dates (or parts of dates), locations, names, etc.
Importance:	Very Desirable
Why?:	Supports faithful recording of research status and results.
Source:	Hoff and Leclerc, _Genealogical Writing in the 21st Century_ (2006), p. 115, "Commonly used Symbols," for "?" as, "uncertain interpretation of original text."
Way forward?:
Dependencies:
Approval status:
Proposer:	GeneJ
Changes:	2011 Feb 21 - created
Discussion:	Data05 - Universal Qualifier Symbol ("?")

Id:	ConfAcc04
Title:	Document Rejected Conclusions
Description:	BetterGEDCOM could allow the recording of rejected conclusions.
Importance:	Desirable
Why?:	If a conclusion is rejected, it can be useful to record the rejected conclusion. This should stop researchers revisiting their own mistakes in future, when they have forgotten previous research. Negative evidence can be useful in itself (e.g. "Thomas' mother was not Mary, so must have been Margaret or Molly"). Erroneous conclusions listed on the Internet are the bane of many genealogists' lives, and it may be useful to have a refutation to hand.
Source:	Extension of ConfAcc02 (was Data04) "Levels of Confidence in Database Conclusions"
Way forward?:
Dependencies:	ConfAcc02 (was Data04) "Levels of Confidence in Database Conclusions"
Approval status:
Proposer:	AdrianB38, 2011 Mar 21
Changes:
Discussion:

Conversion

Id:	Conversion01
Description:	The coverage of the types of genealogical data must allow faithful import of data from all current, common genealogical software with no material manual intervention, subject to the limits of the applications involved.
Importance:	Mandatory
Why?:	If users cannot move their data to BetterGEDCOM formats, they will not use BG
Source:
Way forward?:	The data model for BetterGEDCOM must be rich enough to allow software companies to write routines to copy data from their internal file formats and / or their versions of GEDCOM to the BG format. Therefore, the BetterGEDCOM data model must include everything in the current GEDCOM data model - but not necessarily in the same format - e.g. in-line sources could be converted to source records.
Dependencies:	We are dependent on the software companies writing that conversion code.
Approval status:
Proposer:
Changes:
Discussion:
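The Way forward for Conversion01 mentions converting in-line sources to source records. A minimal Python sketch of that step follows, assuming each citation arrives either as a GEDCOM-style pointer such as `@S1@` or as in-line source text, and that identical in-line texts should share one new record. The function name and the `S<n>` identifier scheme are hypothetical.

```python
from typing import Dict, List, Tuple


def convert_inline_sources(
    citations: List[str], existing: Dict[str, str]
) -> Tuple[List[str], Dict[str, str]]:
    """Turn in-line source texts into source records referenced by pointer.

    citations: each item is a pointer such as "@S1@" or in-line source text.
    existing:  source record id -> title, for records already in the file.
    Returns the citations rewritten as pointers and the extended record table.
    Identical in-line texts share one new record; existing records are never merged.
    """
    records = dict(existing)  # do not modify the caller's dictionary
    new_ids_by_text: Dict[str, str] = {}
    converted: List[str] = []
    counter = len(records)
    for citation in citations:
        if citation.startswith("@") and citation.endswith("@"):
            converted.append(citation)  # already a pointer: keep it
            continue
        if citation not in new_ids_by_text:
            counter += 1
            record_id = f"S{counter}"
            while record_id in records:  # avoid clashing with existing ids
                counter += 1
                record_id = f"S{counter}"
            records[record_id] = citation
            new_ids_by_text[citation] = record_id
        converted.append(f"@{new_ids_by_text[citation]}@")
    return converted, records
```

Deliberately, an in-line text that happens to match an existing record's title is not merged with it; deciding that two sources are the same is left to the user or the application.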

Conversion02 - removed by the submitter. See discussion at Conversion02 - Support for generating web pages

Data

Note - The prefix "Data" is used for generic requirements that do not appear to be obviously applicable to only one group.

Id:	Data01
Title:	Compatibility with GEDCOM
Description:	The data model that underlies BetterGEDCOM must be a superset of the models used by existing major genealogical applications, to the fullest extent deemed possible during design.
Importance:	Mandatory
Why?:	BG compatible software must be able to import data from existing applications and must be at least as good as existing applications in relation to its model.
Source:	Tom Wetmore's Goal and Requirements
Way forward?:	Produce a data model to do this.
Dependencies:
Approval status:
Proposer:
Changes:	2011 Nov 15; Adrian B; rename from "Backwards Compatability" to "Compatibility with GEDCOM" to avoid use of "Backwards" - I thought that was the correct direction but others don't, so I avoid the use of the term - and correct the spelling as well.
Discussion:

Id:	Data02
Title:	Support for all conventional genealogical processes
Description:	The data model that underlies BetterGEDCOM must provide a set of data entities that will allow genealogical applications to support all conventional genealogical processes.
Importance:	Mandatory
Why?:	BG compatible software must be able to carry out normal processes
Source:	Tom Wetmore's Goal and Requirements
Way forward?:	Produce a data model to do this.
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Data03, 04 and 05 have been moved to Confidence and Accuracy.

Id:	Data06 (was Usage01)
Title:	Transfer between one user's programs and to other users/services
Description:	BetterGEDCOM should support data that needs to be exchanged between 1) one user’s applications, possibly from different vendors, or 2) several users’/service providers’ applications.
Importance:
Why?:	The requirements in these two cases may differ, but BetterGEDCOM must support both. For example, a program may support management/classification/grouping of collections of media, e.g. photos. The grouping may not be of interest to other users, but should be transferred when the user transfers media between her/his own programs. Another example is genealogy project management information, e.g. planned lookups in a source, that may not be of interest to other users but should be possible to transfer to the user’s other programs. Thus, all information stored by a program is a candidate for exchange. Management data intended to be transferred between one user’s applications is not likely to be transferred to network services, and is thus not restricted by specifications of what can be transferred to such services.
Source:
Way forward?:	Create this in the Data Model.
Dependencies:
Approval status:
Proposer:	gthorud
Changes:	22 Feb gthorud
Discussion:

Id:	Data07
Title:	Independent record collections
Description:	BetterGEDCOM shall be able to record a file containing, e.g., only records about places, without any person records or other types of records.
Importance:
Why?:	Independent record collections allow exchange of collections of source meta information, source data, place information, media metadata, media, timelines etc. This could facilitate projects where users collaborate to create such collections, without having to rely on network services or other parties to provide the necessary facilities.
Source:	(Originally proposed by Tom in some discussion I can’t find)
Way forward?:
Dependencies:
Approval status:
Proposer:	22 Feb gthorud
Changes:
Discussion:	Data07- Independent record collections

Data08 - unassigned, see the discussion at Data08 - Importing Data (Proposal)

Id:	Data09
Title:	Collections of source data
Description:	BetterGEDCOM could allow recording of data from sources as a collection of records where none or only some are linked to persons or other records in the BG-file. Examples are transcriptions of a complete source or a section in the source, e.g. births in a church book, images of same or an index to the source.
Importance:	To be determined
Why?:	Such collections are often published in databases on the internet, but there could be many reasons why that is not practical, e.g. there might not be a database suited to the type of data, or there could be copyright issues. It should be possible to search for data in a collection. It could be possible to link records in a collection to persons etc., including source metadata, in the BG file. It would also allow the user to see which records in the collection are not linked to a person, and thus also to see that a candidate record in a collection is already assigned, avoiding, e.g., assigning the same birth record to two different persons.
Source:
Way forward?:	The solution must be general so that it can handle many types of sources. For structured data, some general data elements that are common to many sources could be defined, facilitating searches across collections, e.g. given names, surnames, date of birth, "place of residence", place of birth (or place of event). Data could also be non-structured text or images. An alternative could be to encode such collections in terms of persons, places and events, in separate sets of data (some current programs can convert tabular transcriptions into GEDCOM format), or to keep the data in table structures with user-assigned column headers imported from e.g. spreadsheets, possibly in a two-level structure: one for the record (event) and one for the persons. A solution could also be used to store individual source records downloaded from web services (this would require a standard download format) or simply records entered by the user. There are lots of alternatives.
Dependencies:	This is somewhat related to Data07
Approval status:
Proposer:	gthorud 9 May 2011 - as instructed by today's Developer meeting
Changes:
Discussion:	Discussion at Data09 - Collections of source data
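The table-based alternative in the Way forward (rows with user-assigned column headers, some linked to persons) can be sketched as follows. This is a hypothetical Python illustration, not a proposed format: the class names, column names and example rows are invented, and it shows only searching a collection, listing unlinked records, and refusing to assign the same record to two different persons, which is the check the Why? row describes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CollectionRecord:
    """One row of a transcribed source; column headers are user-assigned."""
    record_id: str
    columns: Dict[str, str]
    linked_person: Optional[str] = None


@dataclass
class SourceCollection:
    title: str
    records: List[CollectionRecord] = field(default_factory=list)

    def search(self, column: str, value: str) -> List[CollectionRecord]:
        """Records whose given column has exactly this value."""
        return [r for r in self.records if r.columns.get(column) == value]

    def unlinked(self) -> List[CollectionRecord]:
        """Records not yet linked to any person."""
        return [r for r in self.records if r.linked_person is None]

    def link(self, record_id: str, person_id: str) -> None:
        """Link a record to a person, refusing to reassign it to someone else."""
        for r in self.records:
            if r.record_id == record_id:
                if r.linked_person is not None and r.linked_person != person_id:
                    raise ValueError(f"{record_id} is already linked to {r.linked_person}")
                r.linked_person = person_id
                return
        raise KeyError(record_id)


births = SourceCollection("Births, church book (hypothetical)")
births.records.append(CollectionRecord("R1", {"given": "Ola", "surname": "Olsen", "year": "1850"}))
births.records.append(CollectionRecord("R2", {"given": "Kari", "surname": "Olsen", "year": "1852"}))
births.link("R1", "I1")
print([r.record_id for r in births.unlinked()])  # ['R2']
```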

Characteristic

Id:	Data-Char01 (Characteristic)
Title:
Description:	BetterGEDCOM must support the recording of the characteristics of persons, families, groups, places, "ships" etc.
Importance:	Mandatory
Why?:	Current GEDCOM allows the recording of attributes for individuals and families.
Source:	Various discussion pages.
Way forward?:	Create this in the Data Model.
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Id:	Data-Char02 (was Data09 for Characteristics)
Title:	Record locations for characteristics
Description:	BetterGEDCOM must support the recording of location values applicable to all characteristics of persons, families, groups, places, "ships" etc. (unless specifically agreed otherwise).
Importance:	Mandatory
Why?:	Current GEDCOM allows the recording of place for attributes. Note this does not imply that the recording of a location against any particular characteristic makes sense - e.g. recording of a location against someone's sex would seem pointless. On the other hand, recording of a location against someone's name might well be useful - if someone emigrated under an assumed name, it might be useful to record USA (e.g.) against their new name, and England against their old.
Source:	Various discussion pages.
Way forward?:	Create this in the Data Model.
Dependencies:
Approval status:
Proposer:	AdrianB38
Changes:	2011 Mar 03 - split off requirement that location goes down to address to make it more obvious - raise new Requirement Data-Place06 for it.
Discussion:

Id:	Data-Char03
Title:	Multiple Events & Characteristics etc.
Description:	BetterGEDCOM must allow multiple characteristics and events of the same type against each person, family, group, place, "ship" etc. In particular, it must be possible to record multiple birth and death dates against individuals.
Importance:	Mandatory
Why?:	Most applications allow multiple characteristics, occupations for instance, against an individual. Some applications allow multiple birth and death dates against an individual; the normal meaning of this is that these are alternatives. It must be possible to convert such data to BetterGEDCOM format. As GEDCOM v5.5 allows multiple events and multiple attributes, including multiple birth dates, this requirement is also mandated by the need to allow GEDCOM-compatible data to be represented in BetterGEDCOM form.
Source:	Various discussion pages.
Way forward?:	Identify any events and attributes in GEDCOM that are currently only allowed to have one occurrence and decide what to do about these - with the exception of SEX, a first glance at GEDCOM 5.5 suggests the single occurrence items for the Individual are internal to the GEDCOM structure, rather than relating to their family history and genealogy and thus it may be appropriate for them to remain as single occurrence items. Depending on the conclusions above, create this in the Data Model.
Dependencies:
Approval status:
Proposer:
Changes:	2011 March 05 AdrianB38 - split off sex-change to Data-Ind04, 2011 March 05 AdrianB38 - add clarification that this is also a compatibility requirement.
Discussion:

Id:	Data-Char04
Title:	Date all Characteristics
Description:	BetterGEDCOM must allow the recording of dates against all characteristics of each person, family, group, place, "ship". In particular, it must be possible to record dates against an individual's name characteristics.
Importance:	Mandatory
Why?:	While GEDCOM currently allows multiple names against individuals, there is no ability to record a date against each name, implying that the names are used at the same time. This may or may not be true. Allowing dating of names allows more precise description of married names, for instance.
Source:	Various discussion pages.
Way forward?:	Create this in the Data Model.
Dependencies:	Data-Char03
Approval status:
Proposer:
Changes:	2011 March 05 AdrianB38 - add title
Discussion:
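Data-Char02 to Data-Char04 together imply that each characteristic is a repeatable item with its own optional date and location. The Python sketch below assumes a single hypothetical `Characteristic` class (with events such as birth folded into it for brevity) and year-only dates. It uses the emigrant example from Data-Char02, with invented names, where an old and a new name apply at different times and places, and it records two alternative births side by side.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Characteristic:
    kind: str                         # e.g. "name", "occupation"; events folded in for brevity
    value: str
    date_from: Optional[int] = None   # Data-Char04: year only in this sketch
    date_to: Optional[int] = None
    place: Optional[str] = None       # Data-Char02


@dataclass
class Individual:
    person_id: str
    characteristics: List[Characteristic] = field(default_factory=list)  # Data-Char03: repeats allowed

    def of_kind(self, kind: str) -> List[Characteristic]:
        return [c for c in self.characteristics if c.kind == kind]

    def values_in_year(self, kind: str, year: int) -> List[str]:
        """Values of one kind that apply in a given year; a missing bound is open."""
        return [
            c.value
            for c in self.of_kind(kind)
            if (c.date_from is None or c.date_from <= year)
            and (c.date_to is None or year <= c.date_to)
        ]


john = Individual("I1", [
    Characteristic("name", "John Smith", date_to=1879, place="England"),
    Characteristic("name", "John Brown", date_from=1880, place="USA"),
    Characteristic("birth", "1850"),  # two alternative births, kept side by side
    Characteristic("birth", "1851"),
])
print(john.values_in_year("name", 1870))  # ['John Smith']
print(john.values_in_year("name", 1885))  # ['John Brown']
print(len(john.of_kind("birth")))         # 2
```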

Date

Id:	Data-Date01 (was date part of Data03)
Title:	Approximately known dates
Description:	BetterGEDCOM must allow the recording of approximately known dates. This requirement refers to approximation of single dates only. The date in question may be represented as a single month or a single year.
Importance:	Mandatory
Why?:	GEDCOM already allows dates to be "about yyyy". Note - this is not the same as assigning a probability to a value - e.g. "Probably 1812" is not the same as "About 1812", and this requirement is not intended to cover concepts like "Probably 1812". See also ConfAcc01 (was Data03).
Source:	Tom Wetmore's Goal and Requirements plus various discussion pages; DeadEnds Date Formats; Discussion of dates in the DeadEnds data model.
Way forward?:	Note that, for the purposes of clarity, date periods are covered by requirement "Data-Date04 Date Periods" and date ranges are covered by requirement "Data-Date05 Date Ranges". Include this in the Data Model. The existing GEDCOM options to be covered by this requirement are logically equivalent to the following phrases: "ABOUT date", "ESTIMATED date" and "CALCULATED date". See pp. 39-40 of GEDCOM Standard 5.5 for their meanings. No other options or meanings have yet been identified.
Dependencies:
Approval status:
Proposer:
Changes:	14 May 2011 - added rows to requirement table; added discussion and link (GeneJ), 15 May 2011 - explicitly exclude the GEDCOM usages of date period and date range from this. It is acknowledged that some interpretations of "approximate" could cover date ranges, so we make it clear what the interpretations are.
Discussion:	Discussion at Data-Date01 (was date part of Data03) / Approximately known dates
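The three GEDCOM 5.5 options listed above are written in a GEDCOM file with the keywords ABT, EST and CAL. A minimal Python sketch of separating the qualifier from the date follows; the function and constant names are hypothetical, and periods (FROM/TO) and ranges (BEF/AFT/BET) are deliberately left alone, matching the scope of this requirement.

```python
import re
from typing import Optional, Tuple

# GEDCOM 5.5 keywords for approximated dates and the phrases they stand for.
APPROXIMATION_KEYWORDS = {"ABT": "ABOUT", "EST": "ESTIMATED", "CAL": "CALCULATED"}

_APPROX = re.compile(r"^(ABT|EST|CAL)\s+(\S.*)$", re.IGNORECASE)


def split_approximation(gedcom_date: str) -> Tuple[Optional[str], str]:
    """Return (qualifier, date) for a GEDCOM 5.5 date value.

    qualifier is ABOUT, ESTIMATED or CALCULATED, or None when the value is not
    an approximated date (including periods and ranges, which are out of scope).
    """
    value = gedcom_date.strip()
    m = _APPROX.match(value)
    if m is None:
        return None, value
    return APPROXIMATION_KEYWORDS[m.group(1).upper()], m.group(2)


print(split_approximation("ABT 1812"))        # ('ABOUT', '1812')
print(split_approximation("EST 1 JAN 1812"))  # ('ESTIMATED', '1 JAN 1812')
print(split_approximation("BEF 1812"))        # (None, 'BEF 1812')
```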

Id:	Data-Date02
Title:	Calendars
Description:	A BetterGEDCOM file must define the calendar to be used for each date stored in the file. This definition should be accompanied by a definition of the ordering of the date items within the date (e.g. year/month/day or day/month/year or month/day/year or ...)
Importance:	Mandatory
Why?:	Dates may occur in source documents in all sorts of calendar representations. It is desirable that the codified representation should differ as little as possible from the written characters in the source, to reduce the scope for error in input or output. Therefore, BetterGEDCOM needs to accommodate Jewish, Muslim, Chinese etc. calendars; Julian or Gregorian calendars by country (e.g. between 1582 and 1752, when France used the Gregorian calendar and England the Julian, the two countries did not use the same day/month for "today"); French Revolutionary calendars, etc. Sometimes a date is just a text string.
Source:	Discussion of dates in the DeadEnds data model
Way forward?:	Create this in the Data Model. To be decided: Whether Data Model includes a facility for defining a default calendar and date-item ordering, or whether every date must be marked up with these items. If the latter option is chosen, this relies on intelligent application design to reduce user workload. Note also - there is an assumption here that dates will be stored in various calendars and not as (e.g.) number of days since an agreed event.
Dependencies:
Approval status:
Proposer:
Changes:	2011 Feb 22 15:22 CET - alter description to "must" to match "mandatory" importance. 2011 Feb 22 15:43 CET - add to "Way Forward" comments about possible default calendar and assumption that dates will be stored in calendar form 2011 Feb 22 21:05 CET - alter description to separate definition of calendar itself from the ordering of the date items as these are 2 concepts. Also attempt to clarify Way Forward re defaults.
Discussion:	Data-Date02 modified
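A minimal sketch of the "mark up every date" option in the Way forward, assuming Python and hypothetical names (MarkedUpDate, split_items and the calendar labels are illustrative only, not part of any agreed BetterGEDCOM model). It shows why the calendar and the item ordering are two separate pieces of information: the same written text yields different day and month values under different orderings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkedUpDate:
    # Hypothetical structure: each date carries its own calendar and item ordering.
    text: str       # the date as written in the source, e.g. "11/02/1731"
    calendar: str   # illustrative labels such as "julian", "gregorian", "hebrew"
    ordering: str   # order of the written items, e.g. "DMY", "MDY", "YMD"


def split_items(date: MarkedUpDate, sep: str = "/") -> dict:
    """Assign the written items to day/month/year using the declared ordering.

    The calendar is carried along unchanged; this sketch does not convert
    between calendars.
    """
    parts = [int(p) for p in date.text.split(sep)]
    if len(parts) != len(date.ordering):
        raise ValueError("number of date items does not match the ordering")
    names = {"D": "day", "M": "month", "Y": "year"}
    return {names[code]: value for code, value in zip(date.ordering, parts)}
```

With the text "11/02/1731", an ordering of "DMY" gives day 11 and month 2, whereas "MDY" gives month 11 and day 2; the text itself is stored exactly as written.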

Id:	Data-Date03
Title:	Date phrases
Description:	BetterGEDCOM must allow a "date" to be entered as a phrase where the values are not recognizable to a date parser, but which gives a human reader information about when an event occurred. It must allow such a phrase to have an optional date in parseable format that can be used to interpret the phrase.
Importance:	Mandatory
Why?:	1. GEDCOM Standard 5.5 includes these two as DATE_PHRASE and INT <DATE> (<DATE_PHRASE>). 2. A phrase may give time-relative information even if a date is not known, or not known well: e.g. "at the Battle of Brunanburh" is more informative than "between 934 and 939", and "on a Tuesday in the spring of 1873" can be interpreted as 1873, but the words themselves are informative.
Source:	GEDCOM Standard 5.5; Discussion of dates in the DeadEnds data model.
Way forward?:	Include this in the Data Model
Dependencies:
Approval status:
Proposer:	AdrianB38
Changes:
Discussion:	Data-Date03 Date Phrases
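A minimal sketch, assuming Python and a hypothetical DatePhrase structure, of a phrase carrying an optional parseable date. The rendering method targets the two GEDCOM 5.5 phrase forms named in the Why? field, (<DATE_PHRASE>) on its own and INT <DATE> (<DATE_PHRASE>), so the phrase survives conversion in either direction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatePhrase:
    phrase: str                        # free text for the human reader
    interpreted: Optional[str] = None  # optional parseable date, e.g. "1873"

    def to_gedcom55(self) -> str:
        """Render as a GEDCOM 5.5 date phrase, interpreted or not."""
        if self.interpreted is None:
            return f"({self.phrase})"
        return f"INT {self.interpreted} ({self.phrase})"


battle = DatePhrase("at the Battle of Brunanburh")
spring = DatePhrase("on a Tuesday in the spring of 1873", "1873")
```

A parser can sort or compare on the interpreted date while a report still prints the original words.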

Id:	Data-Date04
Title:	Date periods
Description:	BetterGEDCOM must allow the recording of periods of time, denoted by start and / or end dates. BetterGEDCOM must explicitly define whether or not the end or start date is included in the period of time.
Importance:	Mandatory
Why?:	GEDCOM already allows this. Failure to include will result in failure to convert the vast majority of GEDCOM based files.
Source:	GEDCOM Standard 5.5, page 41
Way forward?:	Include this in the data model. The GEDCOM options are logically equivalent to the following phrases: FROM date; TO date; FROM date-1 TO date-2; where date, date-1 and date-2 are known, unqualified dates, i.e. "FROM ABOUT 1066" is not included, as ABOUT is not permitted in this requirement. It is suggested that both the start and end dates be included in the period, as this is normal usage in English; e.g. in "The First World War lasted FROM 1914 TO 1918", 1914 and 1918 are included in the War's period. The start or end date may be expressed as a Date Phrase, e.g. "FROM The marriage of Fred and Gladys Pugh".
Dependencies:
Approval status:
Proposer:	AdrianB38
Changes:	2011 May 15 - include this as a separate requirement, to make it explicit what Data-Date01 is concerned about.
Discussion:	Data-Date04 - Date periods
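A minimal sketch of the suggested inclusive reading of FROM/TO periods, assuming Python's datetime.date (which uses the proleptic Gregorian calendar and so ignores the calendar issues of Data-Date02). Period, covers and from_years are hypothetical names; a year-only endpoint such as 1914 is expanded to its first or last day so that the whole year is included.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Period:
    start: Optional[date] = None  # FROM date; None means no start given
    end: Optional[date] = None    # TO date; None means no end given

    def covers(self, day: date) -> bool:
        """True if the day lies in the period, counting both endpoints as included."""
        if self.start is not None and day < self.start:
            return False
        if self.end is not None and day > self.end:
            return False
        return True


def from_years(first: int, last: int) -> Period:
    """FROM year TO year, with both years wholly included."""
    return Period(date(first, 1, 1), date(last, 12, 31))
```

For "FROM 1914 TO 1918", from_years(1914, 1918) covers every day of both 1914 and 1918 but not 1 January 1919.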

Id:	Data-Date05
Title:	Date ranges
Description:	BetterGEDCOM must allow the recording of ranges of time, denoted by start and / or end dates, within which an event takes place. That event may take place on a single day, or it may take place over a period of days. BetterGEDCOM must explicitly define whether or not the end or start date is included in the range of time.
Importance:	Mandatory
Why?:	GEDCOM already allows this. Failure to include will result in failure to convert the vast majority of GEDCOM based files.
Source:	GEDCOM Standard 5.5, page 41
Way forward?:	Include this in the data model. The GEDCOM options are logically equivalent to the following phrases: BEFORE date; AFTER date; BETWEEN date-1 AND date-2; where date, date-1 and date-2 are known, unqualified dates, i.e. "AFTER ABOUT 1066" is not included, as ABOUT is not permitted in this requirement. It is suggested that both the start and end dates be included in the range, as this is the clear implication of page 42 of GEDCOM Standard 5.5, which explicitly states that 1852 is equivalent and interchangeable with BETWEEN 1 JANUARY 1852 AND 31 DECEMBER 1852. The start or end date may be expressed as a Date Phrase, e.g. "AFTER The Fall of the Roman Empire".
Dependencies:
Approval status:
Proposer:	AdrianB38
Changes:	2011 May 15 - include this as a separate requirement, to make it explicit what Data-Date01 is concerned about.
Discussion:	Data-Date05 - Date Ranges
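A minimal sketch of the suggested inclusive reading of ranges, again assuming Python's proleptic Gregorian datetime.date; DateRange, admits, between and year_only are hypothetical names. Unlike a period, a range does not say the event lasted throughout; it only bounds when the event could have happened. The example checks the equivalence quoted above: a bare year 1852 and BETWEEN 1 JANUARY 1852 AND 31 DECEMBER 1852 produce the same range.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DateRange:
    """The event happened somewhere within [earliest, latest], bounds included."""
    earliest: Optional[date] = None  # AFTER date, or lower bound of BETWEEN
    latest: Optional[date] = None    # BEFORE date, or upper bound of BETWEEN

    def admits(self, day: date) -> bool:
        """Could the event have happened on this day?"""
        return ((self.earliest is None or day >= self.earliest)
                and (self.latest is None or day <= self.latest))


def between(first: date, last: date) -> DateRange:
    return DateRange(first, last)


def year_only(year: int) -> DateRange:
    """A bare year, read as BETWEEN 1 JANUARY year AND 31 DECEMBER year."""
    return DateRange(date(year, 1, 1), date(year, 12, 31))
```

BEFORE and AFTER map to a range with only one bound set; whether that bound itself is included follows the same suggestion.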

Id:	Data-Date06
Title:	Approximate Date periods and ranges
Description:	BetterGEDCOM could allow ranges and periods of time to be denoted by approximate start and / or end dates.
Importance:	Desirable
Why?:	Many events cannot be located precisely in time - even the start and end dates can be unclear. GEDCOM has no means of expressing this. For instance, "The Dark Ages lasted FROM ABOUT 410 TO ABOUT 1066 in England" (no arguments about the truth of that please!)
Source:
Way forward?:	Include this in the data model.
Dependencies:	Data-Date01 Approximately known dates; Data-Date04 Date periods; Data-Date05 Date ranges
Approval status:
Proposer:	AdrianB38
Changes:	2011 May 15 - new requirement
Discussion:	Data-Date06 - Approximate Date periods and ranges

Event

Id:	Data-Event01 (was Data10)
Title:	Events with multiple people, with roles
Description:	BetterGEDCOM must support the recording of events that affect multiple people. In particular, it must be possible to record the role of each person in the event.
Importance:	Mandatory
Why?:	Events do affect multiple people. Current GEDCOM has almost no ability to record multi-person events, except perhaps births and adoptions. However, the parents in a GEDCOM birth are usually implied by the parents of the appropriate family, which creates potential issues when that family is an adoptive one. It would be better to have a birth event involving three people (typically the child and the two biological parents), with this data kept separate from the family.
Source:	Various discussion pages. A typical item in many other post-GEDCOM 5.5 proposals.
Way forward?:	Create this in the Data Model.
Dependencies:
Approval status:
Proposer:
Changes:	AdrianB38 - 2011 Apr 17 - Add link to new discussion to record things we're liable to forget.
Discussion:	Discussion at Data-Event01 - Events with multiple people, with roles
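A minimal sketch, assuming Python and hypothetical names throughout (Event, Participant, the person ids and the role labels are illustrative), of an event that lists its participants with roles instead of deriving them from a family. The event_id field reflects the idea in Data-Event06 that other records may reference an event, and the optional recorded_name field reflects Data-Event07.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Participant:
    person_id: str                       # hypothetical reference to a person record
    role: str                            # e.g. "child", "mother", "father", "witness"
    recorded_name: Optional[str] = None  # name as recorded for this event, if any


@dataclass
class Event:
    event_id: str                        # lets other records refer to this event
    event_type: str
    participants: List[Participant] = field(default_factory=list)

    def people_in_role(self, role: str) -> List[str]:
        """Return the ids of everyone who played the given role."""
        return [p.person_id for p in self.participants if p.role == role]


# A birth involving three people, recorded without reference to any family.
birth = Event("E1", "birth", [
    Participant("I1", "child"),
    Participant("I2", "mother"),
    Participant("I3", "father"),
])
```

Because the biological parents are recorded on the birth itself, an adoptive family can be recorded separately without contradicting them.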

Id:	Data-Event02 (was Data09)
Title:	Multiple places per event
Description:	BetterGEDCOM should support the recording of multiple places for a single event.
Importance:	Very desirable
Why?:	Current GEDCOM allows the recording of only one place per event. Some applications have extensions to record more than one; e.g. Family Historian records two places for emigration, a "from" and a "to" place. Users may also define "Journey" events, where a "from" and a "to" location would seem natural.
Source:	Various discussion pages. Qualifying Locations for Events
Way forward?:	Analyse whether more than two places per event are needed, e.g. "from", "to", "via"; analyse whether location-roles are mandatory, optional or forbidden. (A location-role is the role that a location plays in an event, e.g. "from" or "to". Locations without roles would simply be listed, e.g. "The 1906 earthquake happened at X and Y".) If roles are needed, define what they are. Create this in the Data Model.
Dependencies:
Approval status:
Proposer:
Changes:	AdrianB38 - 2011 Mar 22 - make explicit this is multiple places for one event AdrianB38 - 2011 Mar 24 - Clarify options for roles or not in 2nd bullet of "Way Forward"; Remove "Way Forward" bullet "Should multiple place events be listed?" as this is ambiguous and covered by 2nd bullet
Discussion:	Discussion on Multiple Places per event

Id:	Data-Event03
Title:	Central registry of event types (and possibly other types)
Description:	BetterGEDCOM should create a central registry of event types that are not defined in the main standard. The registry shall be updated more frequently than the main standard. It could potentially also contain types used in structures containing non-standard type and value pairs. A procedure (rules) must be defined for maintenance of the registry. The information registered for event types (and other types) must be specified (e.g. type name, definition, roles, event value types).
Importance:
Why?:
Source:	Custom GEDCOM Tags
Way forward?:
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Id:	Data-Event04
Title:	Events over a time-period
Description:	BetterGEDCOM must define what an Event is and must allow an Event to take place over a time-period of more than one day.
Importance:	Mandatory
Why?:	Current GEDCOM allows an Event to last for more than 1 day. Hence there will be many GEDCOM files containing such Events. Page 35 of the (draft) GEDCOM v5.5.1 standard says: "As a general rule, events are things that happen on a specific date. Use the date form ‘BET date AND date’ to indicate that an event took place at some time between two dates. Resist the temptation to use a ‘FROM date TO date’ form in an event structure. If the subject of your recording occurred over a period of time, then it is probably not an event, but rather an attribute or fact." This can give the impression that events are only things that happen on a specific date. However, even this wording allows for events that occur over a period of time, since it says only that such a subject is "probably" not an event. For clarity, the BetterGEDCOM standard must state explicitly that events can occur over a period of time.
Source:	See discussion Syntax09 Define Event vs. Attribute. This discussion was primarily about distinguishing Events from Attributes, if necessary. It also contained various postings about the definition of an Event and whether an Event could last over several days. Those postings stand independently of the differences between events and attributes.
Way forward?:	Define the event entity thus in the Data Model. Note the proposed definition in Syntax09 Define Event vs. Attribute that an event is something leading to a change - this might be a useful definition.
Dependencies:	Data-Event01 "Events with multiple people, with roles" and Data-Event02 "Multiple places per event" will influence the way forward on this.
Approval status:
Proposer:	AdrianB38 2011 Mar 25
Changes:
Discussion:

Id:	Data-Event05
Title:	Event Classes
Description:	BetterGEDCOM should define Event Classes, each grouping similar events into one class and describing common rules for how programs should handle the data recorded about those events. One example is a "Marriage event class" (this name may be changed) that would contain events such as Marriage, Civil Marriage, Cohabitation start, Partnership and other events that describe a union between two persons; programs should treat all these events as they currently handle marriage, although with different terms.
Importance:
Why?:	The purpose is to allow new events to be defined in the standard, in a registry (see Data-Event03) or by users, and to be handled by applications according to the rules of the class the event belongs to. The rules may be simple: when a new type of event is defined to be in the same class as a well-established event type, they may just say that the new events shall be handled in the same way. Classes will be used when an event requires more specialised handling than a sentence template can provide, e.g. when marriage events are placed in special paragraphs in reports; or, depending on how data about families is recorded in BetterGEDCOM, the event could be the basis for establishing a data structure in the program that represents a family.
Source:	This has been discussed in Data-Fam02 and in ?? (earlier discussions?) "I Want My Genealogy Software And BetterGEDCOM To Do This" on Shortcomings of GEDCOM
Way forward?:	The possible types of classes should be identified and populated with an initial set of events. The initial reason to do this is to verify that there is a need for classes. Rules should be defined for each class.
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Id:	Data-Event06
Title:	Events as separate records
Description:	BetterGEDCOM must allow other records to reference events. Thus events should be recorded as separate records.
Importance:
Why?:	There is a need for other records to reference an event, for example from structures recording administrative information. Also, since we will have multiple persons participating in an event, the event should not be stored in the record of just one of those persons.
Source:
Way forward?:
Dependencies:
Approval status:
Proposer:	17 March 2011 gthorud
Changes:
Discussion:

Id:	Data-Event07
Title:	Person names per event
Description:	BetterGEDCOM must allow the recording of the name used by (recorded for) a person playing a role in an event.
Importance:	High
Why?:	A person may have used different names during his/her life, or one name may be recorded with different spellings in different documents. For example, in many countries people used the name of the farm where they lived as a surname, so they changed their name when they moved to a new farm. It is best to record the name in the context of an event so that it can be presented in the right context, rather than keeping a simple list of names as in current GEDCOM.
Source:
Way forward?:
Dependencies:	Data-Event01
Approval status:
Proposer:	8 June 2011 gthorud
Changes:
Discussion:

Again, acknowledgements are made to the Volere Requirements Shell template in "Mastering the Requirements Process" by Robertson & Robertson

Family

Id:	Data-Fam01 (was Data06)
Title:	Families independent of biological relations
Description:	BetterGEDCOM must support the recording of genealogy / family history data about the family as a (possibly informal) social grouping, independent of any biological relationship or legal adoptions.
Importance:	Mandatory
Why?:	Family units exist where there is no underlying biological relationship and no legal adoptions. Biological relationships exist where there is no family in any meaningful sense. Existing GEDCOM files may contain data (possibly user-defined tags) recorded about the social grouping of the family, which must be carried forward on conversion to BetterGEDCOM format. Note this requirement does not say anything about how that data will be represented on the file, specifically it does not say anything about how evidence and conclusions are represented.
Source:	GEDCOM does this. Various discussion pages.
Way forward?:	Create this in the Data Model.
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:	Data-Fam01 Family as a Social Grouping

Id:	Data-Fam02
Title:	Cohabitants
Description:	BetterGEDCOM must support the recording of information about cohabitants, with or without common children. Cohabitants should be treated in the same way as married couples, and there should be events for the start and end of cohabitation. Some couples may start out as cohabitants and then marry.
Importance:
Why?:	The percentage of couples who are cohabitants is increasing in the western world; in some countries it is as high as 25-30%. BetterGEDCOM should not discriminate against people in such relationships.
Source:
Way forward?:	This depends on how BetterGEDCOM implements relationships/families in general. Event types similar to marriage and divorce may be sufficient.
Dependencies:
Approval status:
Proposer:	26 March 2011 gthorud
Changes:
Discussion:	Discussion at Data-Fam02 - Cohabitants

Group

Id:	Data-Group01 (was Data05)
Title:	Data about groups of persons (e.g. organisations)
Description:	BetterGEDCOM must support the recording of historic data about groups of persons, such as organisations, companies, regiments.
Importance:	Mandatory
Why?:	Organisations, companies, regiments, etc, have a major impact on individuals, yet no mechanism currently exists in GEDCOM to record any of their details in a structured manner, nor to link organisation data to people. Note this requirement does not say anything about how that data will be represented on the file, specifically it does not say anything about how evidence and conclusions are represented.
Source:	Shortcomings of GEDCOM
Way forward?:	Create this in the Data Model.
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Person (was Individual)

Id:	Data-Ind01 (was Data04)
Title:	Data about persons
Description:	BetterGEDCOM must support the recording of genealogy / family history data about persons.
Importance:	Mandatory
Why?:	Statement of the obvious. Note this requirement does not say anything about how that data will be represented on the file, specifically it does not say anything about how evidence and conclusions are represented.
Source:	GEDCOM does this.
Way forward?:	Create this in the Data Model.
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Id:	Data-Ind02 (was Data07)
Title:	Biological relations independent of family
Description:	BetterGEDCOM must support the recording of biological relationships independent of any family grouping. Biological relationships must include surrogacy, etc.
Importance:	Mandatory
Why?:	Biological relationships can exist where there is no family in any meaningful sense. Existing GEDCOM files create a family for biological relationships. This is not always appropriate. Note this requirement does not say anything about how that data will be represented on the file, specifically it does not say anything about how evidence and conclusions are represented.
Source:	Various discussion pages.
Way forward?:	Create this in the Data Model.
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:	Data-Ind02 Biological rel'ns indep of family

Id:	Data-Ind03
Title:	Non-biological, non-family relationships
Description:	BetterGEDCOM must provide a means to document relationships between individuals that are not based on biology or family, e.g. "X is the friend of Y".
Importance:	Mandatory
Why?:	GEDCOM has the ASSO tag in the ASSOCIATION_STRUCTURE (see GEDCOM 5.5), which may be used to document relationships such as godparent, friendship, etc. The ALIA tag can also be used to link individuals' records when the individuals are suspected to be the same person.
Source:	GEDCOM Standard version 5.5 (ASSOCIATION_STRUCTURE and ALIA tag); Tom Wetmore, 2011 Feb 27, in the Syntax09 Define Event vs. Attribute discussion
Way forward?:	Comments on possible ways forward
Dependencies:
Approval status:
Proposer:	AdrianB38 2011 Feb 27
Changes:
Discussion:	Discussion

Id:	Data-Ind04
Title:	Sex-change individuals
Description:	BetterGEDCOM should support the recording of sex-changes for individuals.
Importance:	Very desirable
Why?:	There are individuals who have gone through a sex-change. BetterGEDCOM should be able to describe their history accurately, as it does anyone else.
Source:	Various discussion pages.
Way forward?:	Need to agree on what values are required - is male / female enough? Is there a need to consider not just sex (the biological and physiological characteristics) but also gender (the social construct)?
Dependencies:
Approval status:
Proposer:	AdrianB38 2011 March 05
Changes:
Discussion:

Person Names

Id:	Data-PersonNames01
Title:	Sorting on multiple given names and surnames
Description:	BetterGEDCOM shall provide a way to identify the parts of names (whole words or parts of words) that shall be used for sorting, and whether each part should sort as a given name or as a surname. It shall allow several such surname parts and could allow several given name parts. A priority could be assigned to the name parts that sort as surnames. All of this sorting information is a suggestion to the recipient for how the name parts should be sorted.
Importance:	Very desirable
Why?:	Many cultures use several surnames. It should be possible to sort on those names in indexes, etc. The same applies to given names (forenames), because a person may be known by any one of them. Some words in a name (e.g. prefixes) are not used for sorting, and often the beginning of a name is not used for sorting (e.g. the d’ in d’Hondt, so Honda should sort before d'Hondt), or one “word” may sort as two names, e.g. both Berg and Olsen in Berg-Olsen. When there are several surnames, some countries consider the last surname to be the most "significant" while others consider the first to be the most significant. Identifying these parts has no influence on how a name is printed in reports or charts. The need to sort on several given names could be discussed, as could the priority of surnames. Important: a middle name, for example, could be indicated to sort as a given name or as a surname, but that does not imply that it is classified as a given name or surname in other contexts, and this proposal implies nothing about any need to classify name parts as middle name, patronymic, etc. (for which there may not be a need).
Source:	Page: Person-Name+Elements; Discussion: message/view/Person-Name+Elements/30777083; External Gramps page: http://gramps-project.org/wiki/index.php?title=GEPS_021:_Additional_Name_Fields
Way forward?:	A program could offer separate fields for the entry of these parts or use special notation.
Dependencies:
Approval status:
Proposer:	gthorud 25 Feb
Changes:
Discussion:	Discussion at Data-PersonNames01 - Sorting on multiple given names and surnames
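A minimal sketch of the sorting information described above, assuming Python and hypothetical names (SortPart, surname_keys, the given names in the examples). Each part is marked as sorting as a surname, a given name or neither, may carry a separate sort text (so "d'Hondt" sorts on "Hondt"), and surname parts carry a priority. The printed form of the name is assumed to be stored separately and is not affected.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SortPart:
    text: str                        # the part as it appears in the name
    sort_as: Optional[str] = None    # "surname", "given", or None if not used for sorting
    sort_text: Optional[str] = None  # what to sort on, if different from text
    priority: int = 0                # lower number = more significant surname


def surname_keys(parts: List[SortPart]) -> List[str]:
    """Sort keys for all surname parts, most significant first (case-insensitive)."""
    surnames = sorted((p for p in parts if p.sort_as == "surname"),
                      key=lambda p: p.priority)
    return [(p.sort_text or p.text).casefold() for p in surnames]


dhondt = [SortPart("Jan", "given"), SortPart("d'Hondt", "surname", sort_text="Hondt")]
honda = [SortPart("Ichiro", "given"), SortPart("Honda", "surname")]
# "Berg-Olsen" is one printed word but two surname sort parts.
berg_olsen = [SortPart("Kari", "given"),
              SortPart("Olsen", "surname", priority=1),
              SortPart("Berg", "surname", priority=0)]
```

Sorting on these keys puts Honda before d'Hondt, and an index could list Berg-Olsen under both Berg and Olsen; a recipient is free to ignore the suggestion.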

Place

Id:	Data-Place01 (was location part of Data03)
Title:	Approximately known locations
Description:	BetterGEDCOM must allow the recording of approximately known locations.
Importance:	Mandatory
Why?:	GEDCOM already allows dates to be "about yyyy". Locations may be equally inexact, e.g. "at sea between England and Australia". Note: this is not the same as assigning a probability to a value; "Probably London" is not the same as "Near London", and this requirement is not intended to cover concepts like "Probably London". See also Data-Date01 (was date part of Data03).
Source:	Tom Wetmore's Goal and Requirements plus various discussion pages.
Way forward?:
Dependencies:
Approval status:
Proposer:	AdrianB38
Changes:	2011 Mar 22 AdrianB38 - rename from "Approximate known approximate locations" to "Approximately known locations"
Discussion:

Id:	Data-Place02 (was Data08)
Title:	Recording of structured data about locations
Description:	BetterGEDCOM should support the recording of structured, historic data about locations, for example multiple names, default prepositions for names, photos, maps, sources and links for access to geographic information services.
Importance:	Very desirable
Why?:	Current GEDCOM does not even recognise "Place" as an entity - there is a rich amount of information about places over time, much of which will affect people.
Source:	"GEDCOM Won't Transfer This" on Shortcomings of GEDCOM
Way forward?:	Create this in the Data Model.
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Id:	Data-Place03
Title:	A place can be member of several place hierarchies
Description:	BetterGEDCOM should support the recording of places of various types as members of several hierarchies of places (locations), possibly changing hierarchies over time, and possibly with surety assigned to the relation to a higher place, in such a way that the path through the hierarchy to the top is unambiguously identified for each place name.
Importance:	Very desirable
Why?:	GEDCOM supports hierarchies of names in events, but does not link these names and hierarchies unambiguously to place entities. This is not sufficient to describe the historical facts related to a place.
Source:	“tracking land changes idea” discussion and the Location entity page
Way forward?:	Create this in the Data Model.
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:
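A minimal sketch, assuming Python and hypothetical names (PlaceLink, path_to_top), of a place belonging to several hierarchies whose links change over time. Each link names its hierarchy and the years for which it is valid, and may carry a surety value (unused here). Following the links for one hierarchy and year must give exactly one path to the top; the function rejects ambiguous or circular paths. Place ids and years in the example are invented, and whole-year validity is a simplification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PlaceLink:
    child: str                    # id of the lower place
    parent: str                   # id of the higher place
    hierarchy: str                # e.g. "civil" or "ecclesiastical" (illustrative)
    first_year: int               # first year the link is valid (inclusive)
    last_year: int                # last year the link is valid (inclusive)
    surety: Optional[int] = None  # optional confidence in the link; not used below


def path_to_top(place: str, hierarchy: str, year: int, links: List[PlaceLink]) -> List[str]:
    """Follow parent links upwards, refusing ambiguous or circular paths."""
    path = [place]
    while True:
        parents = [link.parent for link in links
                   if link.child == path[-1]
                   and link.hierarchy == hierarchy
                   and link.first_year <= year <= link.last_year]
        if not parents:
            return path  # reached the top of this hierarchy
        if len(parents) > 1:
            raise ValueError(f"{path[-1]} has more than one parent in {hierarchy} in {year}")
        if parents[0] in path:
            raise ValueError(f"cycle detected at {parents[0]}")
        path.append(parents[0])


links = [
    PlaceLink("VillageA", "CountyY", "civil", 1800, 1899),
    PlaceLink("VillageA", "CountyZ", "civil", 1900, 2000),  # boundary change in 1900
    PlaceLink("VillageA", "ParishP", "ecclesiastical", 1800, 2000),
    PlaceLink("ParishP", "DioceseD", "ecclesiastical", 1800, 2000),
]
```

The same village then has different civil paths in 1850 and 1950 and an ecclesiastical path that is unaffected by the civil change.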

Id:	Data-Place04
Title:	Merging and/or splitting of places/locations
Description:	BetterGEDCOM shall be able to record identifiers of the place(s) that were split and/or merged when a place (location/property/region) was created.
Importance:
Why?:	The origin of a place is important information about it, and may in many cases provide evidence about relationships between persons.
Source:	message/view/Location+entity/30888227; message/view/Location+entity/30668879?o=40
Way forward?:	The information should preferably be recorded by an event referencing the places involved, also giving the date and source but possibly no persons.
Dependencies:
Approval status:
Proposer:	22 Feb gthorud
Changes:
Discussion:

Id:	Data-Place05
Title:	Place identifiers
Description:	BetterGEDCOM shall be able to record identifiers for a place, possibly multipart/hierarchical, such as those used in land records, map databases, property owner databases and statistics. The identifier type should accompany each identifier part, i.e. an identifier is a sequence of type/value pairs.
Importance:
Why?:	The identifier can be used to locate and look up the place in various paper sources, and is also in itself a historic fact. An identifier is often unique where a name is not. Several identifiers may have been used over time.
Source:	message/view/Location+entity/30668879?o=40
Way forward?:
Dependencies:
Approval status:
Proposer:	22 Feb gthorud
Changes:
Discussion:
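A minimal sketch, assuming Python and hypothetical names (IdentifierPart, PlaceIdentifier, the "land register" scheme, the part types and the values), of a multipart place identifier stored as an ordered sequence of type/value pairs. Values are kept as text because identifiers may have leading zeros; a place would hold a list of such identifiers, since several may have been used over time.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IdentifierPart:
    id_type: str   # what this part identifies, e.g. "municipality"
    value: str     # kept as text: identifiers may have leading zeros


@dataclass(frozen=True)
class PlaceIdentifier:
    scheme: str                        # hypothetical name of the issuing register
    parts: Tuple[IdentifierPart, ...]  # ordered from the most general part

    def as_text(self, sep: str = "/") -> str:
        """Join the values, e.g. for looking the place up in a paper register."""
        return sep.join(p.value for p in self.parts)

    def value_of(self, id_type: str) -> str:
        """Return the value of the first part with the given type."""
        for part in self.parts:
            if part.id_type == id_type:
                return part.value
        raise KeyError(id_type)


farm = PlaceIdentifier("land register", (
    IdentifierPart("municipality", "0301"),
    IdentifierPart("farm number", "12"),
    IdentifierPart("holding number", "4"),
))
```

Keeping the type with each part means a recipient can interpret "12" as a farm number even if it does not know the scheme's usual layout.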

Id:	Data-Place06
Title:	Location to include address
Description:	The location in BetterGEDCOM should be able to specify an individual address.
Importance:	Very desirable
Why?:	Current GEDCOM 5.5 defines a PLACE as a "jurisdictional name to identify the place or location of an event". The address of an individual building is generally not regarded as a PLACE under this definition. Since many events are known to have occurred at precise addresses, the address details are kept separately in the ADDRESS_STRUCTURE. This structure, however, repeats items like city, state and country. To avoid duplication, and the consequent danger of values not being duplicated correctly, the successor to PLACE should include the ability to specify an individual address.
Source:	Various discussion pages.
Way forward?:	Create this in the Data Model. To be decided - whether a location's details in BetterGEDCOM should include Postal Code or Phone Number, which are also part of ADDRESS_STRUCTURE of GEDCOM 5.5, but appear to have dubious relevance to historical events or characteristics. Note this does not mean that the ADDRESS_STRUCTURE of GEDCOM 5.5 has no future in BetterGEDCOM, since the address of a repository, for instance, does not need to have the same structure as a location for historic events or characteristics.
Dependencies:
Approval status:
Proposer:	AdrianB38
Changes:
Discussion:	2011 Mar 03 - Created. Split off from Data-Char02 the requirement that location goes down to address to make it more obvious

"Ship"

Id:	Data-Ship01 (was Data11)
Title:	Data about miscellaneous entities
Description:	BetterGEDCOM could support the recording of historic data about miscellaneous entities or artefacts such as ships, locomotive types, etc.
Importance:	Desirable
Why?:	Individuals, organisations, etc., are usually involved with many physical artefacts, yet no mechanism currently exists in GEDCOM to record any of the artefact's details in a structured manner, nor to link these things to people, etc. Examples could include a summary of the history of a ship used for several cross-Atlantic journeys by different people. These details could be entered in one place, not against each person. Note this requirement does not say anything about how that data will be represented on the file, specifically it does not say anything about how evidence and conclusions are represented.
Source:	Shortcomings of GEDCOM
Way forward?:	Create this in the Data Model.
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:	Discussion of Data-Ship01 Data about miscellaneous entities

DNA

Id:	DNA01
Title:	Results from DNA tests
Description:	BetterGEDCOM should be able to record results of DNA tests.
Importance:
Why?:	Many genealogy programs allow recording of such data.
Source:
Way forward?:
Dependencies:
Approval status:
Proposer:	19 March 2011 GeneJ
Changes:
Discussion:

To do - add sources, repositories, etc.

Evidence

Id:	Evidence01
Title:	Evidence & Conclusion Model
Description:	BetterGEDCOM could handle evidence and not just conclusions.
Importance:	Desirable
Why?:	Current GEDCOM is structured so that data about an individual or family is always the "latest working hypothesis". It is therefore difficult to identify the actual evidence, particularly when the "latest working hypothesis" is a composite of various bits of evidence. Also, when an error is discovered, it can be difficult to (a) identify subsequent issues and (b) revert to an acceptable set of working hypotheses. This is because adding new or revised conclusions to current GEDCOM is generally a destructive process, resulting in the replacement or deletion of superseded conclusions. To overcome this, it appears necessary, as a minimum, to record evidence and conclusions separately. This allows new or revised conclusions to be added non-destructively. See Evidence and Conclusion Process. Note that this requirement is effectively the same as adopting (possibly in part) the "Evidence and Conclusion Model", which is linked to, but not the same as, the "Evidence and Conclusion Process". See Glossary.
Source:	"I Want My Genealogy Software And BetterGEDCOM To Do This" on Shortcomings of GEDCOM
Way forward?:	Establish a first cut at a comprehensive set of genealogical processes that cover both Research Administration and the recording of Evidence & Conclusions. Define which parts of the processes are in the scope of Research Administration and which are in that of Evidence & Conclusions. Consider how the model and processes support "roll-back" to an acceptable state after discovery of an error. Consider the feasibility, and therefore the priorities, of documenting (a) requirements and (b) the data model relating to Research Administration and Evidence & Conclusions, and establish what is do-able in relation to timescales.
Dependencies:
Approval status:
Proposer:
Changes:	2011 Feb 22 17:45 CET - attempt to clarify this is about the "Evidence and Conclusion Model", which is linked to, but not the same as, the "Evidence and Conclusion Process". 2011 April 11 17:00 CET - adjust "Way Forward" in light of discussions. Add distinction between the destructive process for adding new information in current GEDCOM (conclusion-only) and the non-destructive process in Evidence & Conclusion.
Discussion:	Evidence01 and Evidence 01 (please use the latter). See also Defining E&C for BetterGEDCOM.

Id:	Evidence02
Title:	Proof Argument and/or Process
Description:	BetterGEDCOM should support users' need to record and share proof arguments that support, and/or are supported by, the evidence and conclusions recorded or shared in it.
Importance:	Very desirable
Why?:	Supports faithful recording of research status and results.
Source:	http://www.bcgcertification.org/skillbuilders/skbld091.html
Way forward?:
Dependencies:
Approval status:
Proposer:	GeneJ
Changes:	2011 Feb 21 - created. 2011 Feb 22 - Fixed URL for link to discussion (GJ). 2011 Feb 22 - Fixed keyboard witch's duplication in the description field above.
Discussion:	message/view/Better+GEDCOM+Requirements+Catalog/34594682

International

Id:	International01
Title:	Support for international character sets
Description:	BetterGEDCOM must be able to handle text expressed in most of the world's writing systems.
Importance:	Mandatory
Why?:	Genealogy is not confined to countries that use the 26-letter alphabet of American English.
Source:
Way forward?:	Unicode UTF-8
Dependencies:
Approval status:	See International02
Proposer:
Changes:
Discussion:

Id:	International02
Title:	Unicode
Description:	BetterGEDCOM must use Unicode, and only Unicode, to represent text.
Importance:	Mandatory
Why?:	Unicode is the universally accepted solution for handling the multitude of modern, historical and ancient character sets used by all human cultures. UTF-8 is the most common byte encoding of Unicode and is supported by all modern software development environments.
Source:
Way forward?:	Unicode UTF-8
Dependencies:	International01
Approval status:	Developers Meeting 17 Jan 2011 approved "Use Unicode (only) for the consistent encoding, representation and handling of text expressed in most of the world's writing systems" This is International01 plus International02 expressed in one sentence. Developers Meeting 31 Jan 2011 approved "Unicode character set in UTF-8 encoding, and optionally support other encoding schemes of Unicode "
Proposer:
Changes:
Discussion:
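As a minimal illustration of International01 and International02 (a Python sketch, not a proposed BetterGEDCOM API; the sample names are invented), text from several writing systems survives a UTF-8 round trip unchanged, while plain ASCII text is byte-for-byte identical in UTF-8:

```python
# Invented sample names in Latin (with diacritics), Cyrillic, Greek and Han scripts.
names = ["Bjørn Ødegård", "Żółkiewski", "Николай", "Μαρία", "李小龍"]

def roundtrip_utf8(text: str) -> str:
    """Encode text as UTF-8 bytes, then decode it back to a string."""
    return text.encode("utf-8").decode("utf-8")

for name in names:
    encoded = name.encode("utf-8")
    # Characters outside ASCII take 2 to 4 bytes each in UTF-8.
    print(f"{name}: {len(name)} characters, {len(encoded)} bytes")
```

Because ASCII text is unchanged by UTF-8 encoding, existing ASCII-only data needs no conversion to meet International02.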

Id:	International03
Title:	Support for the requirements of many cultures, countries, time periods and belief systems
Description:	BetterGEDCOM must support recording of information about real life in an open-ended set of cultures, countries, time periods and belief systems. It must not be biased towards any one of these.
Importance:	Mandatory
Why?:	BG must support (directly or indirectly) different calendars, events from different religions and cultures, etc.
Source:	Discussion topic "Goal 5 (Internationalization)"
Way forward?:	The BetterGEDCOM project cannot possibly understand all possible calendars, religions, etc. Therefore while we may be able to directly support the best known of them, we will have to cater for the rest indirectly by allowing software companies or users to extend BG to cope with them.
Dependencies:	This depends on Syntax04 and Syntax05 re extensibility
Approval status:	The 31st Jan 2011 Developers Meeting passed this: "Goal 5 BetterGEDCOM supports recording of information about real life in an open-ended set of cultures, countries, time periods and belief systems. It should not be biased towards any one of these."
Proposer:
Changes:
Discussion:

Multimedia

Id:	Multimedia01 (was Syntax02)
Title:	Multimedia container
Description:	BetterGEDCOM must use a container specification to hold separate supporting files, such as multimedia, accompanying the genealogical data. By multimedia we mean digital resources that may represent photos, scanned images, video, sound, documents, web pages, diagrams, maps, (database?) etc.
Importance:	Mandatory
Why?:	1. Embedded files within the genealogical data are generally viewed as a bad idea - they would have been rejected by GEDCOM in the next version after 5.5. 2. A weakness of current GEDCOM is that there is no standard method of transferring linked multimedia with the GEDCOM file, nor of maintaining the links to them after transfer.
Source:	Original Goal 2 bullet 3 Multimedia inclusion and referencing issues Importing Data
Way forward?:	Zip is probably in there somewhere
Dependencies:
Approval status:	Developers Meeting 17 Jan 2011 approved this
Proposer:
Changes:
Discussion:

Id:	Multimedia02
Title:	Information about multimedia objects
Description:	BetterGEDCOM must support the recording of information describing each multimedia object. Possible types of information include object encoding type (MIME?), origin/creator/author/publisher, (file) size, title, description, caption, creation time, identification of e.g. persons shown, type of “objects” shown in media (e.g. persons, landscapes, houses), copyright, informal/short identifier/name, setting (type of circumstances/event when created), user defined attributes and attribute types/flags, quality classification, creating program name&version, tags (incl. geo tags), research notes, duration – and more – or less.
Importance:	Mandatory
Why?:	This information is needed to select, organise and manipulate multimedia objects in genealogy programs and to provide information about the object when included in e.g. reports.
Source:	Multimedia inclusion and referencing issues
Way forward?:	The various types of information could be split into new requirements. The information should be held in a multimedia entity / top-level record, possibly with supporting structures.
Dependencies:
Approval status:
Proposer:	March 4 2011 gthorud
Changes:
Discussion:

Id:	Multimedia03
Title:	References to Multimedia
Description:	BetterGEDCOM should allow information recorded about persons, families, groups, places, sources, events etc. to reference multimedia objects. The reference could contain information about the media's relevance in the referencing context. It would also be useful to classify the media in the referring context, e.g. whether it is a preferred media, or one or more classifications that could e.g. be used to affect its location in reports.
Importance:
Why?:	Information about the relevance in the referencing context could say, for example, "This is a photo of Peter together with his classmates in 1955". It could override similar information recorded about the photo for general use. The classification could allow some media to be printed above the text about a person and other media below, or in a scrapbook, etc., but it could also be used for other purposes. This is useful when data is transferred between programs used by the same user.
Source:	Multimedia inclusion and referencing issues
Way forward?:	The reference should most likely be to a multimedia entity/record containing information about the multimedia, see Multimedia02. It must be possible to reference multimedia in notes and excerpts.
Dependencies:
Approval status:
Proposer:	March 6 2011 gthorud.
Changes:
Discussion:
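A minimal sketch of how Multimedia02 and Multimedia03 might fit together, assuming a top-level media record plus a separate reference object that carries context-specific information. All class and field names are hypothetical. The context caption on a reference overrides the general caption on the record, as Multimedia03 suggests:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MediaRecord:
    """Hypothetical top-level record describing one multimedia object (Multimedia02)."""
    media_id: str
    path: str          # path inside the container
    mime_type: str     # e.g. "image/jpeg"
    title: str
    caption: str = ""  # general-purpose caption
    creator: str = ""

@dataclass
class MediaReference:
    """Hypothetical link from a person, family, place, etc. to a MediaRecord (Multimedia03)."""
    media_id: str
    context_caption: Optional[str] = None  # relevance in this referencing context
    preferred: bool = False
    placement: str = "default"             # e.g. "above-text", "below-text", "scrapbook"

def effective_caption(ref: MediaReference, records: dict) -> str:
    """Use the context caption if present, otherwise the record's general caption."""
    record = records[ref.media_id]
    return ref.context_caption if ref.context_caption is not None else record.caption

records = {
    "M1": MediaRecord("M1", "photos/school/class_1955.jpg", "image/jpeg",
                      "School class photo", caption="Class photo, 1955"),
}
peter = MediaReference("M1", preferred=True, placement="above-text",
                       context_caption="This is a photo of Peter together with his classmates in 1955")
school = MediaReference("M1")
print(effective_caption(peter, records))
print(effective_caption(school, records))
```

Keeping the context on the reference rather than on the media record lets the same photo be described differently for each person it is attached to.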

Id:	Multimedia04
Title:	Grouping of multimedia in a container
Description:	A container (see Multimedia01) shall be able to group the media in a tree structure possibly reflecting the directory structure on the exporting program's computer.
Importance:
Why?:	The structure is most likely useful to the receiver of the media.
Source:
Way forward?:
Dependencies:
Approval status:
Proposer:	6 March 2011 gthorud
Changes:
Discussion:
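Multimedia01 suggests Zip, and Multimedia04 asks for a tree structure inside the container. A minimal Python sketch using the standard zipfile module, assuming a hypothetical layout of one data file (data.bg), a media/ folder that mirrors the exporter's directories, and a JSON manifest; none of these names are part of any agreed format:

```python
import io
import json
import zipfile

def build_container(genealogy_data: str, media: dict) -> bytes:
    """Pack the data file, media files and a manifest into an in-memory ZIP archive.

    `media` maps a relative path (e.g. "photos/1955/class.jpg") to file bytes;
    the path keeps the exporter's directory tree (Multimedia04).
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("data.bg", genealogy_data)  # hypothetical main data file name
        for path, content in media.items():
            zf.writestr("media/" + path, content)
        manifest = {"data": "data.bg", "media": sorted("media/" + p for p in media)}
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()

def list_media(container: bytes) -> list:
    """Read the manifest back from the archive and return the media paths."""
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
    return manifest["media"]

container = build_container("<bg/>", {"photos/1955/class.jpg": b"\xff\xd8",
                                      "docs/will.pdf": b"%PDF"})
print(list_media(container))
```

Because the data file would refer to media by their paths inside the archive, the links survive transfer, which addresses the weakness noted under Multimedia01.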

Source

Id:	Source01
Title:	Information, Source and Evidence Type
Description:	BetterGEDCOM should record separately whether a Source is, for a given event or characteristic: (1) Primary or Secondary Information (the latter includes tertiary); (2) Original or derivative source (e.g. paper or copy/digital image; document or compiled summary; document or transcribed version); (3) Direct, indirect or negative evidence.
Importance:	Very Desirable
Why?:	GEDCOM only has QUAY (quality) for this; QUAY is not a substitute for the specifics, as herein described.
Source:	Discussion page on Shortcomings of GEDCOM
Way forward?:	Include data items
Dependencies:
Approval status:
Proposer:	AdrianB38
Changes:	11 Mar 2011: Added Title (GJ)
Discussion:	Source01
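Source01 asks for three separate classifications rather than GEDCOM's single QUAY number. A minimal Python sketch with hypothetical type and field names keeps them as three independent fields. The example follows the usual evidence-analysis reading of a death certificate used as evidence of birth:

```python
from dataclasses import dataclass
from enum import Enum

class InformationType(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"    # Source01: secondary includes tertiary

class SourceType(Enum):
    ORIGINAL = "original"
    DERIVATIVE = "derivative"  # copy, image, transcription, compiled summary

class EvidenceType(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"
    NEGATIVE = "negative"

@dataclass(frozen=True)
class SourceAssessment:
    """How one source bears on one event or characteristic (hypothetical names)."""
    source_id: str
    assertion_id: str  # the event or characteristic being supported
    information: InformationType
    source: SourceType
    evidence: EvidenceType

# An original death certificate stating an age at death: for the person's birth
# it is an original source, secondary information, and indirect evidence of the year.
birth_from_death_cert = SourceAssessment(
    "S12", "I1-BIRT", InformationType.SECONDARY, SourceType.ORIGINAL, EvidenceType.INDIRECT)
print(birth_from_death_cert)
```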

Id:	Source02
Title:	Certainty Assessment (QUAY)
Description:	BetterGEDCOM should record the qualitative degree of likelihood that a source is true for a given event or characteristic.
Importance:	Very Desirable
Why?:	GEDCOM has QUAY (quality) for this, but the GEDCOM Standard is not clear about what QUAY value should be assigned to a primary source of questionable accuracy.
Source:	Discussion page on Shortcomings of GEDCOM
Way forward?:	Include data items
Dependencies:
Approval status:
Proposer:
Changes:	12 Mar 2011: Added Title (GJ)
Discussion:	Source 02-Certainty Assessment (QUAY)

Id:	Source03
Title:	Sourcing of child / parent relationships
Description:	BetterGEDCOM must provide the ability to record the sources and citations to justify why a child is believed to be in a particular relationship with its (birth or whatever) parents
Importance:	Mandatory
Why?:	GEDCOM has no ability to do this. The current citations and sources are either for a family as a whole or for individual birth (or whatever) events that only mention the child.
Source:	GEDCOM Messes This Up on Shortcomings of GEDCOM
Way forward?:	Include data items. Note that the way forward may vary depending on the solutions chosen for Data-Fam01 and Data-Ind02 "Biological relations independent of family".
Dependencies:
Approval status:
Proposer:	AdrianB38
Changes:
Discussion:
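Source03 needs somewhere to attach citations to the link between one child and one parent, rather than to the family or to the birth event. A minimal sketch, assuming a hypothetical ParentChildRelationship object that carries its own citations (the IDs and citation texts are invented):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Citation:
    source_id: str
    detail: str

@dataclass
class ParentChildRelationship:
    """One child linked to one parent, with citations justifying that link."""
    child_id: str
    parent_id: str
    relationship_type: str = "biological"  # or "adoptive", "step", ...
    citations: List[Citation] = field(default_factory=list)

def citations_for(relationships, child_id, parent_id):
    """Collect the citations that justify linking child_id to parent_id."""
    return [c for r in relationships
            if r.child_id == child_id and r.parent_id == parent_id
            for c in r.citations]

rels = [
    ParentChildRelationship("I3", "I1", citations=[Citation("S4", "Father's will names 'my son'")]),
    ParentChildRelationship("I3", "I2", citations=[Citation("S7", "Baptism register entry")]),
]
print(citations_for(rels, "I3", "I1"))
```

Each parent link can then be sourced, and challenged, independently of the other parent and of the family as a whole.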

Id:	Source04
Title:	Length of citations
Description:	There must be no limit in BetterGEDCOM on the length of a citation, whether that citation applies to a source (often expressed as part of a bibliography entry) or an event, attribute, person, relationship, etc, etc (often expressed as a footnote or end-note).
Importance:	Mandatory
Why?:	The majority of citations will be short. However, some users may wish to record a Proof Argument inside the citation. Any limit on the length of such a citation would be arbitrary and could be exceeded, so should not be permitted. See also requirement Syntax10 "No restrictions on item length or value", which is a generalised version of this requirement.
Source:	See discussion of "The Missing Link - a new entity type or a new type of source?" and specifically the discussion of the options for citations in there.
Way forward?:	While many users would never wish to use lengthy citations, there seems no good reason to forbid their use.
Dependencies:
Approval status:
Proposer:	AdrianB38
Changes:	Created 2011 April 17 15:50 CET
Discussion:	Discussion

Id:	Source05
Title:	Citations in notes
Description:	BetterGEDCOM should allow citations to be entered anywhere in the text of notes.
Importance:
Why?:	For the same reason as footnotes are used in many texts to cite sources.
Source:
Way forward?:	One way to do it is to have separate records for citations.
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Support for the standard

Id:	Support01
Title:	Support for multiple birth and death characteristics
Description:	Programs claiming support for BetterGEDCOM must support multiple birth and death characteristics for a person. Support means that the program must be able to display the facts related to several occurrences of the characteristic, and allow the user to record several such occurrences.
Importance:
Why?:	See Char03
Source:
Way forward?:	The same requirement should be considered for other basic characteristics.
Dependencies:
Approval status:
Proposer:	6 March 2011 gthorud
Changes:
Discussion:

Syntax

Id:	Syntax01
Title:	Underlying syntax
Description:	BetterGEDCOM's underlying syntax must be an existing, non-proprietary syntax
Importance:	Mandatory
Why?:	We do not want to reinvent the wheel
Source:	Original Goal 2 bullet 1
Way forward?:	Options include XML, JSON, GEDCOM, Google Protocol Buffers or any combination thereof.
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Syntax02 has been moved to Multimedia01

Id:	Syntax03
Title:	Content scope
Description:	The BetterGEDCOM file format must define data relating to the study of genealogy / family history.
Importance:	Mandatory
Why?:	Raison d'etre of the format - statement of the obvious. The coverage of BetterGEDCOM must be wider than existing formats in order to provide a reason for its adoption.
Source:	Original Goal 3
Way forward?:	Define the data in a Data Model etc.
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Id:	Syntax04
Title:	Extensibility by software companies
Description:	The BetterGEDCOM file format must be capable of extension by software companies. Extensions must be kept permanently separate from any later definitions in BetterGEDCOM format.
Importance:	Mandatory
Why?:	1. GEDCOM can be extended, so removing the facility would be a step backwards. 2. Many GEDCOM files exist with extensions.
Source:	Original Goal 3
Way forward?:	Note that extensions in GEDCOM are identified by an underscore, which applies only to extensions. Any new GEDCOM tags will not have the underscore so will not be confused with extensions. An equivalent mechanism needs to be used for BetterGEDCOM.
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:

Id:	Syntax05
Title:	User Extensibility of events and characteristics
Description:	The list of events, properties, characteristics, etc, of individuals, etc, in the BetterGEDCOM file format must be capable of extension by users. Extensions must be kept permanently separate from any later definitions in BetterGEDCOM format.
Importance:	Mandatory
Why?:	1. GEDCOM can be extended, so removing the facility would be a step backwards. 2. Many GEDCOM files exist with user-defined events.
Source:	Original Goal 3
Way forward?:	Note that user defined events and attributes in GEDCOM are identified by an underscore, which applies only to them. Any new GEDCOM tags will not have the underscore so will not be confused with user defined events, etc. An equivalent mechanism needs to be used for BetterGEDCOM.
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:
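Syntax04 and Syntax05 both rely on the GEDCOM convention that extension tags begin with an underscore. A minimal Python sketch of that convention: the tag set is a small illustrative subset, and _MILT merely stands in for a user-defined tag. The BetterGEDCOM equivalent (for example a namespace prefix) is still to be decided:

```python
# Small illustrative subset of standard GEDCOM tags.
STANDARD_TAGS = {"INDI", "NAME", "BIRT", "DEAT", "MARR", "SOUR"}

def line_tag(line: str) -> str:
    """Return the tag of a GEDCOM line of the form 'level [@xref@] TAG [value]'."""
    tokens = line.split()
    return tokens[2] if tokens[1].startswith("@") else tokens[1]

def classify_tag(tag: str) -> str:
    """A leading underscore marks an extension; standard tags never start with one."""
    if tag.startswith("_"):
        return "extension"
    return "standard" if tag in STANDARD_TAGS else "unknown"

lines = ["0 @I1@ INDI", "1 NAME John /Smith/", "1 BIRT", "1 _MILT Served in the militia"]
for line in lines:
    print(line_tag(line), classify_tag(line_tag(line)))
```

Because the standard never defines tags with a leading underscore, a later version cannot collide with an existing extension. That is the permanent separation these two requirements ask for.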

Id:	Syntax06
Title:	Define one way of doing a thing
Description:	BetterGEDCOM should define just one way of doing one thing.
Importance:	Very Desirable
Why?:	More than one way may cause ambiguity and extra work for programmers.
Source:	Original Goal 7
Way forward?:	It may be sensible to agree specific exclusions to this requirement, e.g. for in-line notes and separate note records, where the extra programming work is trivial and does not create ambiguity.
Dependencies:	Issue 1: It is not always possible to agree that two things are, in reality, the same thing. For instance, whether or not in-line notes and separate note-records are, in practical terms, the same thing, has been the topic of debate. Issue 2: If two separate methods in GEDCOM type formats are merged into one, then it will not be possible to round-trip data from a GEDCOM type format to BG and back again coming up with the same data.
Approval status:
Proposer:
Changes:	2011 Feb 22 - Updated template format to add rows for title, proposer and discussion; added title, added link to discussion (also added discussion topic) (GJ)
Discussion:	Discussion at Syntax06 - Define one way of doing a thing Former discussion at Single way (current goal 7) [Please do not add comments to the former discussion]

Id:	Syntax07
Title:	URIs (URLs) for external information
Description:	BetterGEDCOM format files must be able to contain URI (URL) addresses for external information
Importance:	Mandatory
Why?:	Users need to record references to sources, etc., on the Internet. Part of that data will be the URL.
Source:	Tom Wetmore's Goal and Requirements
Way forward?:
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:	Syntax07 URIs (URLs) for external information

Id:	Syntax08
Title:	Feature inheritance from previous event etc. types
Description:	It should be possible for user-defined events, properties, characteristics, etc, of individuals, etc, to inherit features from previously defined events, properties, characteristics, etc.
Importance:	Very Desirable
Why?:	Events, properties, characteristics, etc. known to the application software may have logic built into the application to recognise them and process the data from them in certain ways. For instance, the "Marriage" event might be used by the application to propose a family to the user. User-defined events, properties, characteristics, etc., will not normally be recognised by the application so cannot have logic built into the application to recognise them. However, if the user-defined event, property, characteristic, etc., could inherit features belonging to one known to the application, then it would inherit that built-in logic. For instance, "Marriage - civil" might be a user-defined event that inherits details from "Marriage" and so would also be used by the application to propose a family to the user.
Source:	"I Want My Genealogy Software And BetterGEDCOM To Do This" on Shortcomings of GEDCOM
Way forward?:	If events etc are given a type and sub-type, then it would be possible for the user to create a user-defined subtype of an application defined type, and thus inherit the processing done for that type. For instance, an event "Marriage - civil" might have a type of "Marriage" and a subtype of "civil", thus automatically doing all processing created for the event-type of "Marriage"
Dependencies:	Syntax05. We depend on the application developers to create any processing that recognises events.
Approval status:
Proposer:
Changes:
Discussion:
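The type/sub-type idea in Syntax08 can be sketched as a parent link between event types. In this minimal Python sketch (all names hypothetical), the application's built-in "propose a family" logic is keyed on Marriage, and a user-defined "Marriage - civil" inherits it by naming Marriage as its parent type:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EventType:
    name: str
    parent: Optional["EventType"] = None  # the known type this one refines

    def is_a(self, other: "EventType") -> bool:
        """True if this type is `other` or inherits from it, directly or indirectly."""
        current = self
        while current is not None:
            if current is other:
                return True
            current = current.parent
        return False

MARRIAGE = EventType("Marriage")                                 # known to the application
CIVIL_MARRIAGE = EventType("Marriage - civil", parent=MARRIAGE)  # user-defined

def proposes_family(event_type: EventType) -> bool:
    """Built-in logic written for Marriage also applies to its subtypes."""
    return event_type.is_a(MARRIAGE)

print(proposes_family(CIVIL_MARRIAGE))
```

The application only ever needs to know about Marriage; any user-defined subtype picks up the same processing without new code.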

Id:	Syntax09
Title:	Define Event vs. Attribute
Description:	Assuming that the BetterGEDCOM project distinguishes events from properties / facts / attributes / characteristics, then BetterGEDCOM must define and publish a clear definition of the difference between the two concepts that does not rely on a list of each. In particular, the definition must be clear enough for competent software suppliers and users to understand whether a new item is an event or a property / fact / attribute / characteristic.
Importance:	Mandatory
Why?:	There is no clear definition in the GEDCOM 5.5 specification of the difference between the two, only a list of events and a list of attributes. This means that a software supplier or user does not always know whether to create an event or attribute. As a result, the same concept can appear as both, resulting in difficulty of exchange of information.
Source:	Discussion on Custom GEDCOM tags Discussion: Eliminate Facts Discussion: Events, Properties, Characteristics and Facts
Way forward?:	If and when it becomes necessary to distinguish the two concepts, then the Data Model should be updated to record the definition.
Dependencies:
Approval status:
Proposer:	AdrianB38 2011 Feb 25 22:35
Changes:
Discussion:	Syntax09 Define Event vs. Attribute

Id:	Syntax10
Title:	No restrictions on item length or value
Description:	Data items should have no length restriction in BetterGEDCOM, except as deemed necessary during design. Data items should have no restrictions on value in BetterGEDCOM, except as deemed necessary during design.
Importance:	Very Desirable
Why?:
Source:	Original Goal 2 bullet 5 Also Tom Wetmore's Goal and Requirements
Way forward?:	Compare TextHandling02 "No restrictions on line length", which refers to the overall length of a line.
Dependencies:
Approval status:	Subject of Survey Monkey - relevance? result?
Proposer:
Changes:
Discussion:

Id:	Syntax11
Title:	Unique Identifiers
Description:	BetterGEDCOM should assign unique identifiers (UIDs) to records, BG files and "data sets". A data set (the term could be changed) is a collection of data that may hold information about e.g. "The Olsen family", "Persons in parish X" or "Our genealogy project", and that will be updated over time and exported in a BG file at (ir)regular intervals. The data set will have a unique identifier, and so will each BG file containing a snapshot of the data set.
Importance:
Why?:	The various purposes that UIDs could serve must be more precisely defined, as must the procedures for their assignment and use.
Source:	This has been discussed in Data08 and UUIDs - No thanks and Please lets use UUIDS ... and several other discussions (search for UUID).
Way forward?:
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:
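Syntax11 distinguishes a long-lived data set from each exported BG file. A minimal Python sketch, assuming random (version 4) UUIDs from the standard uuid module; whether BetterGEDCOM should use UUIDs at all is still under discussion (see "UUIDs - No thanks"), and the class and function names are hypothetical:

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DataSet:
    """A long-lived collection such as "The Olsen family"; its UID never changes."""
    name: str
    uid: uuid.UUID = field(default_factory=uuid.uuid4)

@dataclass
class FileSnapshot:
    """One exported BG file: a new UID per export, plus the UID of its data set."""
    dataset_uid: uuid.UUID
    uid: uuid.UUID = field(default_factory=uuid.uuid4)
    exported_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def export_snapshot(data_set: DataSet) -> FileSnapshot:
    """Create the identity of a new BG file holding a snapshot of the data set."""
    return FileSnapshot(dataset_uid=data_set.uid)

olsen = DataSet("The Olsen family")
first, second = export_snapshot(olsen), export_snapshot(olsen)
print(first.dataset_uid == second.dataset_uid, first.uid != second.uid)
```

A receiver can then tell that two files are successive snapshots of the same data set, while still distinguishing the files themselves.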

Task01 has been moved to Admin02.

Test

Id:	TestSuite01
Title:	Suite of test data
Description:	BetterGEDCOM should provide a test suite of data that will allow software suppliers to assess the compliance of their software, help them to diagnose issues, and assist them to resolve issues.
Importance:	Very Desirable
Why?:	If we can't do it, others will - and probably get it wrong. This will also meet developers halfway.
Source:	Original Goal 4 (I need to check up subsequent discussions on this)
Way forward?:
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:	Test Data Format

Text Handling

Id:	TextHandling01
Title:	Formatting mark-up for text
Description:	BetterGEDCOM should define a method of marking up text with formatting information. It should be available in all appropriate fields
Importance:	Very Desirable
Why?:	This is a consistent request - the ability to format notes with italics, bold, etc.
Source:	Original Goal 2 bullet 4
Way forward?:	Allowing selected HTML or HTML-style tags?
Dependencies:
Approval status:
Proposer:
Changes:
Discussion:
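One reading of the "selected HTML or HTML-style tags" suggestion in TextHandling01 is a whitelist. A minimal sketch using Python's standard html.parser, with an assumed whitelist, that reports any tag in a note that is not on the list:

```python
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "u", "sup", "sub", "p", "br"}  # assumed whitelist

class MarkupChecker(HTMLParser):
    """Collects the names of tags in a note that are not on the allowed list."""

    def __init__(self):
        super().__init__()
        self.disallowed = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.disallowed.append(tag)

    def handle_endtag(self, tag):
        # Also catch stray closing tags, without reporting a pair twice.
        if tag not in ALLOWED_TAGS and tag not in self.disallowed:
            self.disallowed.append(tag)

def disallowed_tags(note: str) -> list:
    checker = MarkupChecker()
    checker.feed(note)
    checker.close()
    return checker.disallowed

print(disallowed_tags("Baptised at <i>St Mary's</i>, <b>Whitby</b>"))
print(disallowed_tags("<font color='red'>Note</font>"))
```

A whitelist keeps formatting portable: a receiving program only has to render a small, known set of tags.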

Id:	TextHandling02
Title:	No restriction on line length
Description:	Lines should have no length restriction in BetterGEDCOM, except as deemed necessary during design.
Importance:	Very Desirable
Why?:
Source:	Original Goal 2 bullet 5 Also Tom Wetmore's Goal and Requirements
Way forward?:	Compare Syntax10 (was TextHandling03) "No restrictions on item length", which refers to the length of an individual item.
Dependencies:
Approval status:	Subject of Survey Monkey - result?
Proposer:
Changes:
Discussion:

Id:	TextHandling03
Title:	Footnotes/endnotes in notes
Description:	BetterGEDCOM should allow references to footnotes or endnotes that contain just text (not a source citation).
Importance:	Very Desirable
Why?:	Such footnotes/endnotes could contain comments or other text that may not be considered important enough to be entered in the note itself. See also Citations in Notes - Source05.
Source:
Way forward?:	The text could be surrounded by special codes in the note text, or be contained in a separate structure.
Dependencies:
Approval status:
Proposer:	17 April 2011 gthorud
Changes:
Discussion:

Id:	TextHandling04
Title:	Semantic Mark-up in Text
Description:	TextHandling01 describes 'presentational mark-up' in text. This item supplements that with 'semantic mark-up' in text. This allows references to other entities (e.g. Persons, Places, Events, etc) to be embedded in text. Reference should be made to STEMMA's research document on Structured Narrative, which discusses the need for both types of mark-up, plus citation references and general reference notes.
Importance:	Very Desirable
Why?:	Semantic mark-up provides machine-readable references that can be used for automatic linking, and generation of hyperlinks for the UI
Source:	Structured Narrative
Way forward?:	Need to widen the scope of "Notes" to include general narrative, and hence "Structured Narrative". This is a neglected area that STEMMA is pushing alone. It is essential for the representation of family history data as opposed to mere genealogical data
Dependencies:
Approval status:
Proposer:	23 May 2012 ACProctor
Changes:
Discussion:
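A minimal sketch of semantic mark-up for TextHandling04, assuming a hypothetical XML-style <ref type="..." id="..."> element; this is not STEMMA's actual syntax. The references can be extracted from the narrative for automatic linking, or turned into hyperlink targets using an assumed URL scheme:

```python
import xml.etree.ElementTree as ET

def extract_references(narrative: str):
    """Return (entity type, entity id, displayed text) for each <ref> element, in order."""
    root = ET.fromstring("<narrative>" + narrative + "</narrative>")
    return [(el.get("type"), el.get("id"), el.text) for el in root.iter("ref")]

def link_targets(narrative: str, base_url: str = "/entity/"):
    """Build hyperlink targets for a UI (hypothetical URL scheme)."""
    return [f"{base_url}{kind}/{ident}" for kind, ident, _ in extract_references(narrative)]

text = ('On 3 May 1850 <ref type="person" id="P1">John Smith</ref> married '
        '<ref type="person" id="P2">Mary Brown</ref> at <ref type="place" id="L7">Whitby</ref>.')
print(extract_references(text))
print(link_targets(text))
```

As in any XML, a literal "&" or "<" in the narrative would need escaping before parsing.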

Timelines

Id:	Timeline01
Title:	Timelines
Description:	This is just a placeholder so far.
Importance:
Why?:
Source:
Way forward?:
Dependencies:
Approval status:
Proposer:	20 March 2011 gthorud
Changes:
Discussion:

Snapshot of the page:
Better GEDCOM Requirements Catalog snapshot 2may2011.pdf

Comments

brianjd 2011-02-21T09:02:31-08:00

Data-Date02 -modified

I changed Data-Date02 from desirable to Mandatory. The reasoning is that any standard that we develop needs to accommodate every common calendar. At the least, we need to support Julian, French, Gregorian, and any common format used on records from 1400 on. And we should explicitly allow all. It is the task of coders to support what they will, but we MUST allow all, or at least use the 80/20 rule.

gthorud 2011-02-22T10:40:33-08:00

Adrian

Well, if I deleted the text, it was not intended.

In 90% of the cases, all dates in a file will be using the same calendar, so if that calendar is identified in the file, I don’t understand the need to see the date encoded in text in order to identify the calendar. But I have proposed to have a sort date accompanied by a string, so what you want is possible even with a numerically encoded date.

The issue with errors is something that is handled by applications.

“Data Model could include a facility for a default calendar, etc, or this could be left to data entry or conversion in applications” I have no idea what the last half of this statement means.

The order of day, month and year is not dependent on the calendar; there are a huge number of ways to write a date in the Gregorian calendar.

GeneJ 2011-02-22T11:00:15-08:00

Myrt has given some great lectures on dating issues.

I have only a handful of date entries (mostly my Norwegian baptisms) that are not either modern or limited location O/S dates.

To complicate the issue of my experience, I use a program that has fields for both "date" and "sort date."

I am not familiar with all the genealogical programs on the market, but I do recall GENBOX presenting a calendar-like step when I entered an O/S date in that program.

Would sure like to see comment on this discussion from wide range of users. --GJ

AdrianB38 2011-02-22T13:21:00-08:00

Geir - a "sort date accompanied by a string, so what you want is possible even with a numerically encoded date". That sounds a useful way forward.

"order of day, month and year is not dependent on the calendar, there is a huge number of ways to write a date in the Gregorian calendar" - good point. The description does include "This [calendar] definition should include the ordering of the date items within the date" but perhaps I will tweak it to say "This [calendar] definition should be accompanied by a definition of the ordering of the date items within the date" to make it clear that these are 2 separate concepts.

"Data Model could include a facility for a default calendar, etc, or this could be left to data entry or conversion in applications" Yes.... It's not wholly clear on reflection.

Let me see if I can get it better - I'm trying to allay fears that people may have about suddenly going into their family history files and needing to add a calendar "code" to all their dates. This can be done by explicitly adding a default into the BetterGEDCOM file itself or by getting the application to update every date automatically. This latter would be done when converting from GEDCOM-compatible to BG-compatible data. I'm probably getting too deep into solutions here so I think I'll put something like
"To be decided: whether Data Model includes facility for a default calendar, etc, or whether every date must be marked up by calendar code, etc. Intelligent application design should reduce the workload for the user in either case."

PS - Geir - a minute or two ago, you were editing at the same time as me. Wikispaces told me what it thought your changes were and I think it has managed to include both our edits.

When I went back in to change a spelling mistake it offered me an unsaved draft to work from. I've no idea what was in that draft but wonder if it might have contained only my changes - or only yours - so such oddities might explain why Wikispaces' idea of your amendments might not match your memory. At any rate, I prefer to work on this before the US wakes up to reduce the risk of double changes.

AdrianB38 2011-02-22T13:29:51-08:00

Gene - we may need to brush up on Myrt's lectures! So far as I know, all Europe is now on the Gregorian calendar (willing to be proven wrong there...) So it seems logical to have one calendar code for that.

FYI - Before 1752, the calendar for England used the Julian calendar AND the year started on Lady Day, 25 March. I have an idea that Scotland used the same Julian calendar but started the year on 1 January. Indeed, there are English parish registers that sometimes started the year on 1 January! So how many codes do we need? Gulp!
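Adrian's Lady Day point can be made concrete. A minimal Python sketch (function names hypothetical) computes the year number counted from 1 January for an English date written with the year starting on 25 March, and prints it in the "OS / NS" style used in this discussion. It adjusts only the year number; the separate 10- or 11-day Julian/Gregorian shift is not handled:

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def ns_year(day: int, month: int, os_year: int) -> int:
    """Year number counted from 1 January, for a date whose written year began on 25 March."""
    before_lady_day = month < 3 or (month == 3 and day < 25)
    return os_year + 1 if before_lady_day else os_year

def dual_date(day: int, month: int, os_year: int) -> str:
    """Show both year numbers when they differ; day and month are left as written."""
    new_year = ns_year(day, month, os_year)
    text = f"{day} {MONTHS[month - 1]} {os_year}"
    return text if new_year == os_year else f"{text} OS / {new_year} NS"

print(dual_date(1, 1, 1666))
print(dual_date(25, 3, 1666))
```

Only dates from 1 January to 24 March differ, which is why a single "year starts on" attribute per calendar code might be enough to cover the English, Scottish and parish-register variants.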

GeneJ 2011-02-22T13:39:23-08:00

And... if we default to the Gregorian, would you then only need a code when entering the old style date (vs the indirect Gregorian equivalent)?

Isn't that rather like existing GEDCOM, only expanded? (For some reason, I thought existing GEDCOM recognized the OS command following an old style date entry.)

P.S. Myrt's lecture is just packed with info. As to just that "1752 change" though, see Wiki for Gregorian Calendar. In particular, "Pope Gregory XIII, after whom the calendar was named, by a decree signed on 24 February 1582, a papal bull known by its opening words Inter gravissimas.[4] The reformed calendar was adopted later that year by a handful of countries, with other countries adopting it over the following centuries." [With emphasis on the word CENTURIES.]

brianjd 2011-02-22T21:23:19-08:00

On the default date and date format subject.

There is no need to worry about this. Everyone has already chosen a date format, or at least accepted the pre-chosen one.

Possibly the application the user is using has it set. If not, I guarantee the Operating System does.
The obvious solution is to simply presume the OS default unless there is an explicit override. Naturally, this will cause no end of problems, because there will be plenty of cases where the dates are in a format other than the expected one. But there is no solution we could come up with that won't cause issues. We just have to accept that.

AdrianB38 2011-02-23T12:23:05-08:00

Brian, certainly no problem with there being a default on the PC, though while the O/S may well have set a calendar and date format, we need it out of there and into the file when we transmit it. No particular problem with that, of course.

And it might be the wrong format, of course. Ubuntu may well be wonderful but I somehow doubt it carries a calendar to give dates like 1 Jan 1666 OS / 1667 NS <grin>

(Gene - I enter these OS/NS dates into Family Historian quite happily so GEDCOM may well take them by default)

gthorud 2011-02-23T16:28:10-08:00

Brian,

There are problems with editing the page, the only solution I see is to save frequently - and cancel your edits since last save if you get a message that someone else is editing. Saved drafts - I am not sure - but tend to avoid them.

Isn't it possible to add a Country code in addition to the calendar value, or are there several dates for transition to Gregorian within a country - well Scotland - but maybe there is some ISO standard with a code for Scotland.

ACProctor 2011-11-30T08:18:40-08:00

I agree that Date values must have an associated Calendar, although there should be a default calendar for, say, Gregorian - or whatever your most common Date value is relative to.

I have worked in the area of "globalisation" for many years and I feel strongly that BG should not get hung up on date-ordering issues, or use any day-names or month-names in date values - both of which would create a locale dependency. ISO 8601 was created as a Date standard for exchange and storage of date values, and the numeric format in particular (yyyy-mm-dd) is ideally suited to locale-neutral data values such as in XML.

All issues of date-entry and date-formatting are for the software loading or generating a BG file. The file itself must remain independent of those processes. Date-parsing should not be a consideration for BG as a locale-neutral and culture-neutral textual data format.

WesleyJohnston 2011-11-30T23:39:23-08:00

I am seeing again an issue that I have seen in a number of discussions. There are two very important perspectives that I believe have to be kept in awareness for BetterGEDCOM.

The first perspective is that of the researcher using BetterGEDCOM to create from sources a database that reflects the evidence and conclusions the researcher has used. This perspective includes both the storage of that database and the sharing of it, either with others or with merging with other databases of one's own. The vast majority of discussion posts come from this perspective.

The second perspective is that of a repository that seeks to make its records available in BetterGEDCOM format. This is the perspective that I believe is being left out of most of the discussions.

And in this discussion of Date02, the second perspective is very important to retain in our awareness as we develop a standard. I think it is retained in the "Why?" statement of the requirement: "Dates may occur in source documents in all sorts of calendar representations. It is desirable that the codified representation of that should differ as little as possible from the written characters in the source, to reduce the scope for error in input or output."

If I am a curator of a repository, seeking to offer my repository information online in BetterGEDCOM format, then I do not want to be forced to convert dates in the source material into standard modern Gregorian dates (or any other dates either). If a record says that the event happened on some ecclesiastically named day of a year which is also likely to be an ecclesiastical year (e.g. not beginning on January 1), then I want to record the date as it is written.

I see an issue here very similar to place hierarchies: there is a need for standardized accepted databases of place hierarchies over time, and there is a need for standardized accepted conversion software to convert dates from all the different standards used into modern standard Gregorian dates, so that comparison of dates can be done on an apples to apples basis.

In both cases, this means some standards organization being responsible for the creation of those standards.

BetterGEDCOM must, in my view, support robust apples-to-apples comparison (which inherently includes ordering) of dates. This does mean parsing of dates and not merely carrying them as text strings. Certainly the early versions of BetterGEDCOM can cut corners on this, for both dates and places. But we should not lose the long-term vision of having these standard place and date aspects robustly supported, which does mean that ultimately someone has to do a whole lot of work to create and maintain those standards for places and dates.

ACProctor 2011-12-01T09:42:48-08:00

I'm not sure I catch your drift Wesley.

Thinking just of Gregorian dates for a second, there might be 3 versions of a date in a typical case: the image or original document in which there is a written date, the transcribed date string, and the date value which is the interpreted version of that date. It is the latter of these - the machine-readable, searchable, sortable date value - that I am talking of. I appreciate that the other versions will need support but the date value must be unambiguous (assuming you can decipher the written version) and ISO 8601 is designed specifically for that purpose.
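A minimal sketch of the three versions of a date described above, assuming a hypothetical `RecordedDate` structure (not a BetterGEDCOM definition; the image paths are invented). Only the interpreted value takes part in ordering, and it is emitted as an ISO 8601 calendar date. Python's `date` type uses the proleptic Gregorian calendar, which matches ISO 8601.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RecordedDate:
    """One date held in three forms (hypothetical structure, not a BG spec)."""
    image_ref: Optional[str]   # pointer to the image of the original document
    transcribed: str           # the characters as written, errors and all
    value: Optional[date]      # the interpreted, machine-readable value

    def iso(self) -> Optional[str]:
        # ISO 8601 calendar date (YYYY-MM-DD), suitable for sorting and searching
        return self.value.isoformat() if self.value else None


entries = [
    RecordedDate("img/0042.jpg", "3rd Apl 1851", date(1851, 4, 3)),
    RecordedDate("img/0017.jpg", "Jany 12th 1849", date(1849, 1, 12)),
    RecordedDate(None, "illegible", None),
]

# Only interpreted values are ordered; transcriptions are kept verbatim.
dated = sorted((e for e in entries if e.value), key=lambda e: e.value)
print([e.iso() for e in dated])
```

The transcription is never rewritten; an entry with no interpretation simply has no sortable value.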

As you rightly point out, this is similar to place hierarchies in that you may have the original, a transcribed version (incl. any spelling errors and informality), and the normalised machine-readable, searchable, sortable version.

Any scheme that has to depend on some magic date parser is doomed to failure because no such beast exists. You might be able to get one that works for the US, or for the UK, but a globalised bullet-proof one is just not possible, and I wouldn't trust any software that claims it can do it, because I know there is no well-defined grammar and in the worst case the date is ambiguous.
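The ambiguity can be shown with a small sketch that lists every valid reading of a purely numeric date under just two field orders (day-month-year and month-day-year). The function name `interpretations` is hypothetical, and real-world dates use far more conventions than these two.

```python
from datetime import date
from typing import List


def interpretations(text: str) -> List[date]:
    """Return every valid reading of a numeric date like '03/04/2001'.

    Only two field orders are tried (DMY, then MDY); a real parser
    would face many more conventions than this.
    """
    a, b, year = (int(part) for part in text.split("/"))
    readings: List[date] = []
    for day, month in ((a, b), (b, a)):    # DMY first, then MDY
        try:
            d = date(year, month, day)
        except ValueError:                 # e.g. there is no month 25
            continue
        if d not in readings:              # '05/05/2001' reads the same both ways
            readings.append(d)
    return readings


print(interpretations("03/04/2001"))  # two readings: 3 April or 4 March
print(interpretations("25/12/2001"))  # one reading: 25 December
```

Without context from outside the string itself, no parser can choose between the two readings of the first example.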

WesleyJohnston 2011-12-01T10:34:15-08:00

re ACProctor

The different dates must be retained, and I see that your post was in regard to the unambiguous interpreted date. And it is good that you have raised the ISO 8601 standard.

I do think we need to have a long-term view of bringing about just such a "magic" date parser, which is never going to be bullet-proof but could probably handle the vast majority of what it sees.

The fact that the reality of today is not close to that should not take it off the table for the long-term as something BetterGEDCOM as an organization should foster, while at the same time acknowledging that we have to build BetterGEDCOM version 1 within the constraints of where the technology is today.

AdrianB38 2011-02-21T12:29:12-08:00

I think it's reasonable that BG should mandate saying what the calendar is. (I've altered "should" to "must" to match).

AdrianB38 2011-02-21T12:39:38-08:00

I'm not sure I quite approve of the removal of the comments in the "Way Forward" section.

The comment about "It would probably seem sensible to define a default calendar and date ordering for each file" was intended to stop people moaning that they didn't want to mark every date in their file with a calendar. Yes it's a solution-comment not a requirement-comment but I felt it useful.

Also removed is the sentence "An alternative method would be to encode each date into the same representation - e.g. number of days since some agreed event." I put that in because it was mentioned as one possibility for "normalising" dates to allow translation. It's not a solution I liked because it's hard to deal with dates like "About January 1866" (as distinct from "About 1 January 1866") so I did reject it but felt it useful to record the rejection.
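A sketch of the rejected "number of days since some agreed event" alternative, assuming Python's `date.toordinal()` (day 1 is 1 January of year 1, proleptic Gregorian) stands in for the agreed day. It shows why the scheme struggles: a full date becomes one integer, but a month-precision date such as "January 1866" needs a range of integers. `day_number` and `month_span` are hypothetical names.

```python
from datetime import date
from typing import Tuple


def day_number(d: date) -> int:
    # Python's ordinal: day 1 is 1 January of year 1 (proleptic Gregorian)
    return d.toordinal()


def month_span(year: int, month: int) -> Tuple[int, int]:
    """A month-precision date such as 'January 1866' is a range of day numbers."""
    first = date(year, month, 1)
    if month == 12:
        next_first = date(year + 1, 1, 1)
    else:
        next_first = date(year, month + 1, 1)
    return day_number(first), day_number(next_first) - 1


print(day_number(date(1866, 1, 1)))  # one integer: fine for 'About 1 January 1866'
print(month_span(1866, 1))           # two integers: 'About January 1866' needs a range
```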

Comments anyone on whether the so-called clarifications add anything?

AdrianB38 2011-02-21T14:28:16-08:00

Geir - if I understand the outputs from this Wiki, you've altered "A BetterGEDCOM file must define the calendar" to "A BetterGEDCOM file should define the calendar", while leaving the importance as mandatory.

Perhaps I should have put into the template:
"must" goes with "mandatory" and
"should" goes with "very desirable".
The meanings of the words and phrases are intended to match, because "should" has an element of "should but it might not".

Do you agree with "mandatory" as the importance of the calendar?

gthorud 2011-02-21T16:14:02-08:00

Hm, my memory may not be the best, but I can't remember having changed it.

But since you mention it, what is the intended use of Importance?

Also, I think that Way forward could include technical solution things - one may need to discuss that. The important thing is to describe what we want so people can understand it.

I understand the difference between should and shall. ISO editor guidelines have taken should out of the vocabulary. But it is difficult to focus on this all the time.

I agree with mandatory, and that the header should give the default.

I am not sure about the implications of "It is desirable that the codified representation of that should differ as little as possible from the written characters in the source, to reduce the scope for error in input or output." - does this mean that we should have text representation and not numeric - maybe not.

You are right about the calendar being dependent on country.

I think we need to discuss before we edit other people's text.

brianjd 2011-02-21T21:14:45-08:00

Adrian,

Sorry about removing that sentence; I was just trying to keep it in sync with your other comments. But as you say, it's a solution comment. I thought it was there because you wanted to explain the Desirable status. Since it's naturally a software implementation issue, it won't be dictated by us. One thing I would like to suggest is that a calendar can also be "text only", meaning the user entered some funky date that the program can't decipher. Setting Gregorian as the default calendar is probably going to be correct 66% of the time.

Plus I thought this page was for the group. I also made a few minor edits fixing grammar in places. I am probably remembering wrong, but I thought I saw a posting asking for help with this page. I keep pretty busy and get a ton of communications, so I sometimes mix up messages.

AdrianB38 2011-02-22T07:33:25-08:00

Geir - the history seemed to say to me that it was your id that made the change but I have my doubts that this Wiki always tells the truth if 2 people are updating at the same time. I note that you "agree with mandatory" so I will set it to "must" in the description.

Importance is there to help prioritise items.

Re "It is desirable that the codified representation of that should differ as little as possible from the written" - my thought there is that if the user enters "3 April 2001" (say) and then looks at the GEDCOM, it seems more useful if they recognise the value as something like they entered. It's just something to think about when evaluating how the date should be stored on the file. Personally, I don't like the idea of the numeric date partly because it gives no clue about the calendar in use - e.g. is 2 = February or Brumaire? Though if we explicitly mark each date with a calendar code, maybe it doesn't matter.
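The "is 2 = February or Brumaire?" point can be made concrete: a bare month number only gains a meaning once a calendar code is attached. The table and the calendar codes below are illustrative assumptions, not a BetterGEDCOM code list, and the French Republican complementary days are not modelled.

```python
# Month names for two calendars; illustrative tables, not a BG code list.
# The French Republican complementary days are deliberately left out.
MONTHS = {
    "gregorian": ["January", "February", "March", "April", "May", "June",
                  "July", "August", "September", "October", "November", "December"],
    "french_republican": ["Vendémiaire", "Brumaire", "Frimaire", "Nivôse",
                          "Pluviôse", "Ventôse", "Germinal", "Floréal",
                          "Prairial", "Messidor", "Thermidor", "Fructidor"],
}


def month_name(calendar: str, month: int) -> str:
    """Interpret a bare month number, which is meaningless without a calendar code."""
    return MONTHS[calendar][month - 1]


print(month_name("gregorian", 2))          # February
print(month_name("french_republican", 2))  # Brumaire
```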

AdrianB38 2011-02-22T07:47:55-08:00

Brian - please don't feel you shouldn't update the page. In other places we have gone through loads of discussions but no-one ever updated the main page - hence when I was trawling for the first cut of requirements I nearly lost the will to live after a while!

I've put back a bit about possibly having a default calendar (modified to take account of your comment about it possibly being an application issue)

Plus I've highlighted the assumption that we are going to work with calendar dates and not days elapsed since something.
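One way the default-calendar idea and Brian's "text only" suggestion could fit together is sketched below: each date may carry its own calendar code, otherwise the file-level default applies, and a text-only date is carried verbatim but never computed on. `FileHeader`, `DateEntry` and the `"text-only"` code are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

TEXT_ONLY = "text-only"   # hypothetical code for dates the program cannot decipher


@dataclass
class FileHeader:
    default_calendar: str = "gregorian"   # assumed header field, not a BG spec


@dataclass
class DateEntry:
    text: str
    calendar: Optional[str] = None        # None means "use the file default"

    def effective_calendar(self, header: FileHeader) -> str:
        return self.calendar if self.calendar is not None else header.default_calendar

    def is_computable(self, header: FileHeader) -> bool:
        # Text-only dates are carried verbatim and never parsed or compared.
        return self.effective_calendar(header) != TEXT_ONLY


header = FileHeader()
dates = [
    DateEntry("3 April 1801"),
    DateEntry("12 Brumaire An IX", calendar="french_republican"),
    DateEntry("the Tuesday after the great flood", calendar=TEXT_ONLY),
]
for d in dates:
    print(d.text, "->", d.effective_calendar(header), d.is_computable(header))
```

Only dates that differ from the file default need to be marked, which addresses the worry about tagging every date individually.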

GeneJ 2011-02-21T13:12:44-08:00

Data05 - Universal Qualifier Symbol ("?")

Id: Data05
Title: Universal Qualifier Symbol ("?")

Description: BetterGEDCOM should incorporate methods allowing users to apply the universal qualifier "?" before dates (or parts of dates), locations, names, etc.

Importance: Very Desirable

Why?: Supports faithful recording of research status and results.

Source: Hoff and Leclerc, _Genealogical Writing in the 21st Century_ (2006), p. 115, "Commonly used Symbols," for "?" as, "uncertain interpretation of original text."

Way forward?:

Dependencies:

Approval status:

Proposer: GeneJ

See also the first part of the discussion at:
http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138

GeneJ 2011-07-08T18:07:03-07:00

(1) References from the original discussion.

Hoff and Leclerc, _Genealogical Writing in the 21st Century_ (2006), p. 2, "The process of expressing our findings in writing--including proper use of terms such as probably, possibly, likely, and maybe--is the most valuable tool in our research kits. Unfortunately, it is also the most neglected."

Same source, p. 115, "Commonly used Symbols," for "?" as, "uncertain interpretation of original text."

The discussion morphed into other topics, but the link is below.

ref: http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138

(2) I have seen the qualifier before information and after, but context is important here.

(a) Before ...
(i) _New England Historical and Genealogical Register_ 165 (Jan 2011): 41, for child list entry about "vii. ?DAUGHTER, bp. Copford 8 Feb. 169/20"; same treatment, also in the child list at p. 48 for a ?SON and a ?DAUGHTER.
(ii) Also before in NEHGR July 2010, p. 187 for the child list entries about ?Ebenezer Handy and ?Hannah Handy. ... "ii. ?HANNAH HANDY. As mentioned above, it is not clear whether Hannah Handy, one of the grantors in the deed of 1 July 1802 given above, was the sister of Silvanus Handy or his mother (or stepmother)."
(iii) I will have to track down the third; it had the question symbol in front of the child's number in the child list.

(b) ...and also following information, but when it follows, appears in editorial brackets (the brackets also protect the symbol from being confused with punctuation).
(i) _New England Historical and Genealogical Register_ 165 (Jan 2011): 30, for "...and another daughter Phebe was born to them on 4 February 1732[3?] ..."
(ii) Quite frequently in NEHGR to indicate that whether a date should have been a double-date is unknown. For example, NEHGR 163 (April 2009):107, twice, "1680[/1?]" and "1740[/1?]" ... NEHGR 164 (October 2010): 283, 290 for "9 March 1677 [1676/7?]," and "15 Feb. 1742[/3?]"
(iii) For part of a date ...NEHGR 163 (April 2009):139, "400 acres in Killingly on 5[?] January 1718/9" ...

Note: I think an unbracketed question mark symbol that follows an entry would be too easy to confuse with punctuation.

(3) NGS has transcription standards. I haven't looked at those in a little while. I will have to check the BCG Genealogical Standards Manual first.

(4) Again not directly related to this requirement ... NGS Quarterly uses dashes to signify a missing name, and NEHGR uses underscores. If I'm not mistaken, four of each (dashes or underscores). NEHGR uses the same underscores for "missing information" [Hoff and Leclerc (2006), p. 116 (Appendix B)]

(5) Similar but different, see requirement "ConfAcc02 (was Data04)":

April 2011 NEHGR, page 125 has entry in the child list for "ii. _probably_ JOHN PYLE, infant, bur ..." and the same term, same treatment for four children in the child list on page 130; again on p. 133.

In the earlier October 2010 issue (NEHGR), p. 264 has entry "(perhaps)" in the child list, as "iv. (perhaps) DAVID WATERBURY ..." same style but "(probably)" on the next page for an "unidentified daughter."

AdrianB38 2011-07-09T10:05:42-07:00

Oh Gene - is that sound I hear, the sound of a can of worms being opened?

I'm not sure how many different combinations of characters and words we seem to be creating that describe uncertainty. And if the learned societies can't agree or be consistent (or am I being pessimistic?) I'm not sure what we do...

But I'm not sure how much we need to do.

If someone enters ?Ebenezer Handy into their genealogy pgm, then I _assume_ it enters the database as
first name ?Ebenezer
surname Handy
This will then get processed as if ?Ebenezer were a normal name - BUT it will not get sorted next to Ebenezer. If you want that to happen, then you either swap to use Ebenezer? Handy or you store it as
uncertainty-prefix ?
first name Ebenezer
surname Handy
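A sketch of the second option above, with the uncertainty prefix held in its own field. It assumes a hypothetical `PersonName` record; display re-attaches the "?", but the sort key ignores it, so ?Ebenezer files next to Ebenezer.

```python
from dataclasses import dataclass


@dataclass
class PersonName:
    given: str
    surname: str
    uncertainty_prefix: str = ""   # e.g. "?"; stored apart from the name itself

    def display(self) -> str:
        return f"{self.uncertainty_prefix}{self.given} {self.surname}"

    def sort_key(self):
        # Sorting ignores the prefix, so ?Ebenezer files next to Ebenezer.
        return (self.surname.lower(), self.given.lower())


names = [
    PersonName("Hannah", "Handy", "?"),
    PersonName("Ebenezer", "Handy", "?"),
    PersonName("Silvanus", "Handy"),
    PersonName("Ebenezer", "Handy"),
]
for n in sorted(names, key=PersonName.sort_key):
    print(n.display())
```

By contrast, sorting the displayed strings directly would push every "?" name to the front, because "?" sorts before the letters in ASCII.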

The issue of dubious dates is more complex since a lot of software expects dates to look like, err, dates. Certainly in the s/ware I use, if I write "15 Feb. 1742[/3?]", then it won't get processed as a date but as a piece of text with no date meaning. If you want it to appear as a date according to some arbitrary decision, then there are 2 options:
- enter it as string "15 Feb. 1742[/3?]" but tell the database to store an interpreted date of 15 Feb. 1742. Or 15 Feb. 1743 if you wanted.
- create some new uncertainty rating of "Unsure if OS or NS date"

Personally, I'd go for the minimum change and use the facility found in some s/ware of a text string (i.e. "15 Feb. 1742[/3?]") having a date interpretation added to it.
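The "text string with a date interpretation" facility might look like the sketch below. `DatePhrase` is a hypothetical name; the interpretation is a plain (year, month, day) triple chosen by the user, and no calendar conversion is attempted. Phrases without an interpretation are carried along but left out of the ordering.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DatePhrase:
    text: str                                    # exactly as entered
    interpreted: Optional[Tuple[int, int, int]]  # (year, month, day) chosen by the user

    def __str__(self) -> str:
        return self.text                         # display always shows the original string


events = [
    DatePhrase("15 Feb. 1742[/3?]", (1742, 2, 15)),   # user chose the 1742 reading
    DatePhrase("9 March 1677 [1676/7?]", (1677, 3, 9)),
    DatePhrase("sometime before the fire", None),     # no interpretation at all
]

# No calendar conversion is attempted; tuples simply compare year, then month, then day.
ordered = sorted((e for e in events if e.interpreted), key=lambda e: e.interpreted)
print([str(e) for e in ordered])
```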

A linked issue is that of transcription markings. I always use square brackets [] round unclear text and angle brackets <> round inserted text. But I've no recollection where I got that practice from. Worse, we have people using [] for both purposes, so I've no real idea whether when they write "[sig] John Doe", do they mean John Doe signed this? Or do they mean the three letters "sig" probably appear in front of the words "John Doe"?

HOWEVER - I think we just ignore this debate in BG and pass forward whatever we get. I don't _think_ we need worry about these transcription standards. Do we?

GeneJ 2011-07-10T18:17:29-07:00

Hi there Adrian:

(1) My personal preference is to allow either the symbol (?) preceding the entry or the symbol in editorial brackets that sets the entry off with specificity [/12?]

(2) As in the original thread ... I was hoping genealogical technology standards could bring us the form of a preceding symbol that would be neutral for sorting purposes.

Generally, the requirement here (universal qualifier) is part of the broader discussion in the original thread, all of which recognizes the need to move away from absolutes and into the real world of genealogy.

Original thread suggests we all want this, so perhaps we keep collecting ideas and input for now. --GJ

redmanvan 2012-11-05T12:54:10-08:00

I wonder if this is more about the software than about the data. A program that displays the information to the end user can put a ? at the start or the end of a piece of data. It could also display the information in a different font, a different colour or whatever.
Likewise for data entry - a program could use whatever interface it wished to permit a user to specify that information is uncertain.
But the way that information is stored in a file could be quite different. While a ? might suit the end user in many cases, that isn't really a matter for BG to specify.

GeneJ 2012-11-05T13:28:45-08:00

Hi Redmanvan,

The question might be, do genealogists need a standard way of communicating "I'm not quite sure, but I think ..."

If you can't make out a record, you might enter an underscore, "_", but what if you think you can read the name.

This morning I quoted from a text in which the bride's name had been published in 1894 as "Mary Witherton [?]"; so the practice of not being able to decipher a name but not being quite certain.

What do you think?

GeneJ 2012-11-05T13:30:15-08:00

*"so the practice of being not quite certain, but practically certain, is not so new."

redmanvan 2012-11-06T08:31:00-08:00

I agree that a standard way of communicating uncertainty is useful - whether a simple question mark is sufficient for every case is arguable.

But my point is, the meaning ("This name is uncertain") can be expressed in so many ways, that we should not force BG to use a ? mark in its internal representation of that meaning.

In fact, in the example you quote, it may convey uncertainty, but it certainly doesn't convey indecipherability.

If BG is to convey uncertainty, it should also convey the reason for that uncertainty, and if there is more than one possible reason (indecipherability, inconsistency between records, unlikely spelling, etc.), then a ? is not enough.

Alyn
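Alyn's point that the stored meaning need not be a literal "?" can be sketched as an uncertainty flag plus an optional reason, with the "?" produced only at display time. The `Reason` values come from the reasons named in the discussion; the structure and the example names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Reason(Enum):
    # Reasons named in the discussion; illustrative, not a complete code list.
    INDECIPHERABLE = "indecipherable"
    INCONSISTENT = "inconsistent between records"
    UNLIKELY_SPELLING = "unlikely spelling"


@dataclass
class Qualified:
    value: str
    uncertain: bool = False
    reason: Optional[Reason] = None

    def render(self) -> str:
        # One possible display; what is stored is the fields, not a "?" character.
        if not self.uncertain:
            return self.value
        if self.reason is None:
            return f"{self.value} [?]"
        return f"{self.value} [?: {self.reason.value}]"


print(Qualified("Smyth", uncertain=True, reason=Reason.UNLIKELY_SPELLING).render())
print(Qualified("Witherton", uncertain=True).render())
print(Qualified("Preston").render())
```

A program could equally render the same fields as a different font or colour, as suggested earlier in the thread.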

GeneJ 2012-11-06T09:39:53-08:00

Hi Alyn,

Aside from the need to provide the rationale, is there a reason you would not want to provide some symbolism? (Whether it was a question mark or a smiley face.)

I agree that there are various ways to convey uncertainty and that indecipherability is only one reason for uncertainty.

"If BG is to convey ... it should also convey the reason ..."*

I didn't mean to suggest that the whole would not explain the reasoning, but there should be a way to convey what has been separately explained.

In my case, that explanation would usually be found in a citation, but I still have the other databits to be concerned about. Another example follows.

Today, I confidently write the name of one of my ancestors as Elizabeth (Clark) Preston. For many years, though, the best evidence I had about her maiden name was a historical marriage record containing modifications. Her married surname had been clearly entered in the record. By some different pen and/or hand, that surname had been crossed out and the surname "Clark" had been written below. (It's possible that the changes had been entered and then erased.) The record I'm referring to can be viewed here:
https://familysearch.org/pal:/MM9.3.1/TH-266-11124-173590-21?cc=1520640&wc=7131654

I can easily record a citation to that record in which I explain the issues; I have even spoken to the town clerk who holds the record and learned more about it.

During the time when that marriage record was the best evidence of her name, I might have entered Clark[?], associated with the citation I described. For my purpose, that entry would have been preferable to leaving the surname blank.

How the uncertain information is handled involves judgement. In other words, what I might feel is worthy of entry, albeit uncertain and cited, another might see as unworthy of entry at all, only commented upon in the citation. This requirement wasn't intended to set a recording threshold, just to provide a way forward when the genealogist feels it has been met.

(I suspect many genealogists already use the symbol ?, so, as Adrian suggests above, we pass it on. I'm suggesting there is probably a benefit to recognizing it and encouraging its use when folks hit that threshold. As long as BG is not stripping the symbol off or relegating the associated data to "irregular" status, there are probably other ways of promoting the use of the symbolism.)

*Recognizing that BG can only convey that which it has been "fed."

ACProctor 2012-11-12T10:13:40-08:00

This is an area where formalisation may need to be justified. I know a lot of transcription groups have a syntax for representing unknown characters or characters that may be one of a set. www.freebmd.org.uk is probably the one I am most familiar with. Those syntaxes - which are generally some form of "regular expression" - are easy for software to read, and to work from, but many users would have to look in a book to see what it means.
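To illustrate why such syntaxes are easy for software, the sketch below translates a transcription pattern into a Python regular expression for name matching. The syntax itself is an assumption for illustration only and is not claimed to be FreeBMD's: `_` is one unreadable character, `*` is any run of unreadable characters, and `[iy]` is one character that may be any of those listed. The function name is hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a hypothetical transcription syntax into a regular expression.

    Assumed syntax (for illustration only):
      _      exactly one unreadable character
      *      any run of unreadable characters
      [abc]  one character that may be a, b or c
    An unclosed '[' raises ValueError.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "_":
            out.append(".")
        elif ch == "*":
            out.append(".*")
        elif ch == "[":
            end = pattern.index("]", i)
            out.append("[" + re.escape(pattern[i + 1:end]) + "]")
            i = end
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.IGNORECASE)


rx = pattern_to_regex("Sm[iy]th_")
print([n for n in ["Smith", "Smythe", "Smithe", "Smyth"] if rx.fullmatch(n)])
```

The same pattern that a matcher reads easily ("Sm[iy]th_") is exactly the kind of notation an end user might need to look up.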

My question is therefore whether this formalisation is for the benefit of software or end-users.

I thought about this when designing STEMMA and decided to defer any decision. At the moment, I am recording the transcription as best I can (i.e. most obvious and/or most likely), and using STEMMA's narrative feature to record why or how it may be uncertain. That actually works well from an end-user point of view, but any name matching algorithm would have nothing to work from.

louiskessler 2012-11-12T17:38:53-08:00

Tony,

It's probably useful in the "data" field of source references. But I don't see it needed anywhere else.

Louis

GeneJ 2012-11-12T17:42:42-08:00

Hi Louis,

The purpose of this request was to have better support in the pfact data fields. Those fields are used for other than just tree matching. In the programs I use, they are part of the sentence and narrative structure.

ACProctor 2012-11-13T06:26:06-08:00

Narrative should be searchable too. This is why I have tried to push STEMMA's concept of "structured narrative". In principle, its mark-up language could record the "regular expression" syntax in such a way that it does not detract from the visible text being read by the end user (...in the same way that a URL link shows you the link title without the link syntax).

testuser42 2011-02-23T06:18:08-08:00

Agree.
I've never read any Genealogy standards book (They seem to be much more common and important in the US), but I kind of came up with the exact same use for the "?". Though I also used it for showing "uncertainty" anywhere my software doesn't allow a "surety", e.g. with dates. Which in turn makes my software complain about invalid dates...

GeneJ 2011-02-23T07:19:09-08:00

I should finish my morning wake up before trying to respond.

The trick is we want software to be able to recognize that symbol but also include it and ignore it for the purpose of generating lists or setting things in date order. So, 11 ?Jan 1837 would sort just after 10 Jan 1837, and before 12 Jan 1837; John ?Williams would sort before Williams, Johnny, but after Williams, Jan.

testuser42 2011-02-23T08:16:00-08:00

Yes, exactly. My current software is too stupid for that ;)

AdrianB38 2011-02-23T12:14:18-08:00

"11 ?Jan 1837 would sort just after 10 Jan 1837" - that's probably not a problem. At least two ways of doing it that I can see:
- store both "11 ?Jan 1837" and "11 Jan 1837", the first as the real date, the second as the date to be used for sorting, arithmetic, etc
- store "11 ?Jan 1837" in bits (i.e. "11", "?", "Jan", "1837") and recreate the date for sorting and arithmetic from the "11", "Jan" and "1837" bits.
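A sketch of the second approach, storing the date in parts with the "?" as a separate flag and rebuilding the sort key from the numeric parts only. `DateParts` and `month_doubtful` are hypothetical names, and only the month can be flagged here; as the discussion that follows shows, the flag could just as well attach to the day or year.

```python
from dataclasses import dataclass

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


@dataclass
class DateParts:
    day: int
    month: int                     # 1-12
    year: int
    month_doubtful: bool = False   # the "?" is a flag, not part of the month

    def text(self) -> str:
        q = "?" if self.month_doubtful else ""
        return f"{self.day} {q}{MONTHS[self.month - 1]} {self.year}"

    def sort_key(self):
        # Rebuilt from the numeric parts only, so the "?" never affects ordering.
        return (self.year, self.month, self.day)


dates = [DateParts(12, 1, 1837), DateParts(11, 1, 1837, month_doubtful=True),
         DateParts(10, 1, 1837)]
print([d.text() for d in sorted(dates, key=DateParts.sort_key)])
```

This reproduces GeneJ's requirement that 11 ?Jan 1837 sort between 10 Jan 1837 and 12 Jan 1837.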

GeneJ 2011-02-23T12:38:25-08:00

That is great news! tyty

theKiwi 2011-02-23T14:58:24-08:00

Adrian wrote:

"11 ?Jan 1837 would sort just after 10 Jan 1837"

to me this is saying

I know it was the 11th
I know it was 1837
It was possibly (or I think it was) January

so I'm not sure why it would be made to sort after 10 Jan 1837.

If there is to be a qualifier like this, it should be attached to the element it's qualifying, which in Adrian's example would be the 11 not the Jan part of it, so

?11 Jan 1837

GeneJ 2011-02-23T15:33:49-08:00

@Kiwi,

See the first presentation of the example (third entry in the discussion), where "Jan" is the element questioned.

gthorud 2011-02-23T17:15:26-08:00

I am not sure that I see any advantage in encoding this as a separate bit along with bits for day, month etc. I think I would like to see the ? in a string encoded date, most likely accompanied by an optional sort date.

But, I would suggest that a "surety" value, a separate bit, meaning ?, could be attached to the whole date, possibly to one of the two dates defining an interval or a single complete date.

I have already suggested to have a surety value attached to the link between levels in a place/location hierarchy, and there should be surety attached to the link from an event to a place name (thus possibly putting a question mark against the whole path from the bottom level place name to the one on the top level).

And it should be possible to attach ?s against all sorts of relations - between persons, names, even media. I would love to see a ? against a line between persons in a chart.

In the same way as ?, I would in some cases want to see a character indicating disapproval.

I should also mention that in national standards here for transcription of church records and census records, double question marks, ??, are used to indicate that the source is difficult to read, so instead of January it could say Jan??y. ?? are used because a single ? may appear in the source. I have seen many dates in church records containing a question mark. (If there are two possible interpretations of a word we write both with @ between, !! marks missing data or an obvious error in the data, and %xx% means xx has been crossed out.)

Christine_E 2011-07-08T11:35:12-07:00

Why not put the "?" after the part that you can't read, then the sort will come out close to where you want it without any extra work?

This not only applies to dates which have a small range of possible values, but names and locations that are hard to decipher or were spelled phonetically.

gthorud makes a good point here: "I should also mention that in national standards here for transcription of church records and census records, double question marks, ??, are used to indicate that the source is difficult to read, so instead of January it could say Jan??y. ?? are used because a single ? may appear in the source. I have seen many dates in church records containing a question mark. (If there are two possible interpretations of a word we write both with @ between, !! marks missing data or an obvious error in the data, and %xx% means xx has been crossed out.)" I think the "Way Forward?:" involves research to see what else is commonly done when text is unreadable.

AdrianB38 2011-02-25T14:37:12-08:00

Syntax09 Define Event vs. Attribute

Initial creation:
Assuming that the BetterGEDCOM project distinguishes events from properties / facts / attributes / characteristics, then BetterGEDCOM must define and publish a clear definition of the difference between the two concepts that does not rely on a list of each. In particular, the definition must be clear enough for competent software suppliers and users to understand whether a new item is an event or a property / fact / attribute / characteristic.

There is no clear definition in the GEDCOM 5.5 specification of the difference between the two, only a list of events and a list of attributes. This means that a software supplier or user does not always know whether to create an event or attribute. As a result, the same concept can appear as both, resulting in difficulty of exchange of information.

ttwetmore 2011-02-26T17:27:53-08:00

Geir says, "I don’t understand why there must be Level One Vital Events, why can’t they all be Level Zero True Events?"

They could be all level zero. But think of it this way. You get birth data from many sources. Think about getting an age on a census record. This is a great example. You subtract the age from the date of the census and you get an estimated birth year for the person. The census might also list the birth place of the person. So from the census record you have an estimated birth year and a possible birth place. BUT, BUT, BUT, you never really got an actual birth event for the person, did you? You didn't find real hard evidence. It's all secondary information extracted from the census event.
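The subtraction Tom describes actually yields a one-year window rather than a single year. The sketch below assumes the stated age is exact in completed years (census ages often are not) and uses a hypothetical census day; the function names are invented for illustration. A 29 February anniversary falls back to 28 February.

```python
from datetime import date, timedelta
from typing import Tuple


def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier; 29 Feb falls back to 28 Feb."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def birth_range_from_age(census_day: date, age: int) -> Tuple[date, date]:
    """Earliest and latest birth dates for someone exactly `age` on `census_day`."""
    latest = years_before(census_day, age)                          # just turned `age`
    earliest = years_before(census_day, age + 1) + timedelta(days=1)  # about to turn age+1
    return earliest, latest


# Hypothetical census day and stated age
print(birth_range_from_age(date(1850, 6, 1), 34))
```

The result is inferred information about the person, which is why Tom prefers to hold it inside the person record rather than as a free-standing birth event.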

So, one of my principles about genealogy is that you create event records from the evidence you find about events, and you create person records for all persons mentioned in the events, and you add to those person records everything you learn about the persons from the event evidence. In my mind it is much better to think about that inferred birth information as something you learned about the person from the event, so it is something that should be kept inside that person. That's my principle again. It's a compromise of course, as all things are.

See, I don't mind having two ways to do something, if it makes sense to me to have those two ways. In the case of the vital event and the multi-role event I definitely see a difference, and definitely feel it's okay to treat them differently. But every vital event could of course be transferred into a level zero event record if there were a rule that said that was the only way events could exist in the file format.

So for me a vital event, a level one PFACT-like thing inside a person, is just the right thing for these secondary, inferred, not quite events, that we often learn about people offhand through records that were really created for an entirely different reason.

Hope this isn't too confusing.

And of course, there is a very practical answer to the question as well. When converting GEDCOM data to Better GEDCOM format, if there were no level 1 vital events, we would probably triple the size of the resulting file in terms of records and maybe size!

Tom W.

AdrianB38 2011-02-27T04:53:11-08:00

Geir - Re "the structure that some people currently use to transfer hair color, caste, eye colour, nationality etc. although some of this info may change over time – something the Trabant can’t transport"
In GEDCOM 5.5 (at least) INDIVIDUAL_ATTRIBUTE_STRUCTURE (which includes physical descriptions, so I hope that's what we're talking about) includes an EVENT_DETAIL and that in turn includes a DATE_VALUE - so the GEDCOM attribute (the Trabant (grin)) should be able to transport dates from 1 person to another.

A person's name, however, does not have a date in GEDCOM 5.5, giving the impression (to me at least) that all names on file for this person, can apply at once. Name (and sex) are not, in GEDCOM 5.5, attributes in the formal GEDCOM sense of the term.

It is pretty clear to me that Name (and Sex) should be included in the attributes, or whatever we end up calling them, in BG, and that all "attributes" in BG should have dates (or date ranges) in BG (a) because they need them and (b) because most of them could have them now.
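A sketch of Adrian's proposal, assuming a hypothetical generic `Attribute` record in which Name and Sex are ordinary attributes and every attribute may carry an optional date range. The names and dates are invented; a name-change event (for example a marriage) could sit chronologically between the two NAME values, but modelling it is optional.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Attribute:
    kind: str                      # e.g. "NAME", "SEX", "OCCUPATION"
    value: str
    start: Optional[date] = None   # None means open-ended or unknown
    end: Optional[date] = None

    def applies_on(self, d: date) -> bool:
        return ((self.start is None or self.start <= d)
                and (self.end is None or d <= self.end))


def value_on(attrs: List[Attribute], kind: str, d: date) -> List[str]:
    """All values of one attribute kind in force on a given date."""
    return [a.value for a in attrs if a.kind == kind and a.applies_on(d)]


person = [
    Attribute("NAME", "Mary Smith", end=date(1880, 5, 3)),
    Attribute("NAME", "Mary Jones", start=date(1880, 5, 4)),
    Attribute("SEX", "F"),                     # undated: applies throughout
]
print(value_on(person, "NAME", date(1875, 1, 1)))
print(value_on(person, "NAME", date(1890, 1, 1)))
```

Leaving both dates empty reproduces the GEDCOM 5.5 situation, where every name appears to apply at once.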

Now whether people want to use the dates, whether (if they do) they want to stick a NAME-CHANGE event (say) chronologically between 2 NAME attribute values, is entirely up to them. We just have to provide the facility for those who want to do so.

ttwetmore 2011-02-27T06:08:59-08:00

Adrian,

Your latest about regularizing name and sex and every other PFACT type of thing seems to me to be the approach Better GEDCOM should take towards a generalized attribute concept.

I think the vital event, if we decide to keep it, fits in this category also.

Okay then, there is now a NEXT CONCEPT that needs discussion in the attribute vs event context.

Gedcom has the ASSO tag and DeadEnds has the relation structure and I'm sure other formats have other constructs for the same concept. I have even seen the tags FATH and MOTH in some GEDCOM files.

What I am talking about is an "attribute" that is basically a pointer to another record with a label on it to define the type of the relationship. Note that in the flurry to get rid of the GEDCOM FAM record, such relationship pointers, or a concept that allows the recording of the same information, becomes paramount -- without them genealogical databases would hold no relationships.

It boils down to this question -- IN BETTER GEDCOM HOW DO WE WANT TO EXPRESS THE FACT THAT PERSON A IS THE FATHER OF PERSON B?? I'm assuming that the anti-FAM people have their way and we have no FAM record to help us out. Without a FAM record what will our world be like? Please realize something very important -- in today's world the FAM RECORD IN GEDCOM HOLDS 99.99999% of all relationship information between people. With no FAM entity ALL THIS RELATIONSHIP INFO HAS TO MOVE SOMEWHERE ELSE.

If you check out all the models and what people have written you'll find these three answers:

1. Relationship objects -- this is a common answer. Create a new record type (if you love relational databases you'd call it a new table type). A relationship has a type (parent/child, brother/sister, etc) and then pointers to the two persons in the relationship. Lots of similarities to the concept of the multi-role person, but not identical. As a RDBMS table it's three columns, a type and two foreign keys to a person table.

2. Assertion objects -- this is the scary answer -- a la GenTech, where EVERY relationship between EVERY RECORD TYPE must be mediated by a different assertion object. Assertion objects were also created because they have a simple direct implementation as a RDBMS table. In reality an assertion is nothing more than a generalized relationship -- this is required in the GenTech model because EVERY RELATIONSHIP BETWEEN EVERY KIND OF OBJECT MUST BE IMPLEMENTED AS AN ASSERTION (EVEN RELATIONSHIPS BETWEEN ASSERTIONS!!) -- to go off onto the anti-GenTech tirade for a moment, I hope you realize that in the GenTech model, EVERY PFACT (BE SURE YOU UNDERSTAND WHAT THIS MEANS -- EVERY, EVERY PFACT) IS ALSO ITS OWN RECORD AND FOR YOU TO ADD A PFACT TO A PERSON YOU HAVE TO CREATE THAT PFACT AS A SEPARATE RECORD AND THEN CREATE A NEW ASSERTION RECORD TO BIND THAT PFACT TO ITS PERSON -- "Help, help, I've fallen and I can't get up!"

3. Direct references -- person B points to person A with a pointer that implies "You're my daddy", and maybe vice versa.
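To make the three answers concrete, here is a minimal Python sketch that records "person A is the father of person B" each way and then asks the question back. The class and field names (Person, Relationship, Assertion, father_id, "father/child", "father-of") are hypothetical and are not taken from OpenGen, SoRD, GenTech, or DeadEnds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    pid: str
    name: str
    # Approach 3 (direct reference): the child's own record points at the father.
    father_id: Optional[str] = None


@dataclass
class Relationship:
    # Approach 1: a separate record holding a type and two person keys.
    rel_type: str
    from_id: str  # the parent
    to_id: str    # the child


@dataclass
class Assertion:
    # Approach 2: a generalized link that could join any two records.
    subject_id: str
    predicate: str
    object_id: str


def father_via_relationships(child_id, persons, relationships):
    """Scan relationship records for a father/child link ending at child_id."""
    for rel in relationships:
        if rel.rel_type == "father/child" and rel.to_id == child_id:
            return persons[rel.from_id]
    return None


def father_via_assertions(child_id, persons, assertions):
    """Scan assertion records for a 'father-of' link whose object is child_id."""
    for a in assertions:
        if a.predicate == "father-of" and a.object_id == child_id:
            return persons[a.subject_id]
    return None


def father_via_direct_reference(child_id, persons):
    """Follow the pointer inside the child's record; no extra record type needed."""
    ref = persons[child_id].father_id
    return persons[ref] if ref else None


# Person A (I1) is the father of person B (I2). For brevity the same two person
# records are shared by all three approaches.
persons = {
    "I1": Person("I1", "John Smith"),
    "I2": Person("I2", "Mary Smith", father_id="I1"),
}
relationships = [Relationship("father/child", "I1", "I2")]
assertions = [Assertion("I1", "father-of", "I2")]

print(father_via_relationships("I2", persons, relationships).name)  # John Smith
print(father_via_assertions("I2", persons, assertions).name)        # John Smith
print(father_via_direct_reference("I2", persons).name)              # John Smith
```

The first two approaches need one extra record per relationship (here three records in total instead of two), while the direct reference needs none; that is the trade-off weighed in the next paragraph.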

We're going to have to pick one of these for Better GEDCOM. OpenGen has been sniffing around the relationship approach. My bet is that SoRD is sniffing around the assertion approach. In DeadEnds I've opted for the direct reference approach. My reasoning has been that the relationship and assertion approaches both require the addition of (at least one) new entity type to the model as well as a large increase in the number of records in actual external files. In the direct reference approach no new entity type is required as each person in a relationship simply points to the other (it could even be others).

So what's the relationship between this discussion and the discussion of what is an attribute? It's pretty simple, really. These direct references that could be used to establish relationships between persons also LOOK, ACT, SMELL, and so on, like all other things in our "extended" view of what an attribute is.

Okay, where can this discussion go after we resolve this one?

Well, we could discuss whether a source is an attribute of a person record.

And my favorite: how should we treat the evidence persons that provide the grist for the conclusion mill, as components of a conclusion person? That is, ARE THE EVIDENCE PERSONS THAT PROVIDE THE DATA BEHIND A CONCLUSION PERSON ATTRIBUTES OF THAT CONCLUSION PERSON? If you consult the DeadEnds model you will see that each person record can contain an unbounded number of references to other person records. In the DeadEnds model this is my implementation of the evidence and conclusion objects. So it's very easy simply to view these evidence person references as "just another" type of attribute.
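As a rough illustration of treating evidence-person references as "just another" attribute, the Python sketch below gives a conclusion person a list of references to evidence persons and walks them to gather the birth years they assert. The names PersonRecord, evidence_refs and evidence_birth_years are hypothetical; this is not the actual DeadEnds record format, and the sample data is invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PersonRecord:
    pid: str
    name: str
    birth_year: Optional[int] = None
    # References to the evidence persons this record is built from,
    # held like any other field of the record.
    evidence_refs: List[str] = field(default_factory=list)


def evidence_birth_years(conclusion: PersonRecord,
                         records: Dict[str, PersonRecord]) -> List[int]:
    """Collect the birth years asserted by each evidence person behind a conclusion."""
    years = []
    for ref in conclusion.evidence_refs:
        year = records[ref].birth_year
        if year is not None:
            years.append(year)
    return years


records = {
    "E1": PersonRecord("E1", "John Smith", birth_year=1841),  # e.g. from a census entry
    "E2": PersonRecord("E2", "Jno. Smith", birth_year=1842),  # e.g. from a burial entry
}
conclusion = PersonRecord("C1", "John Smith", evidence_refs=["E1", "E2"])
records["C1"] = conclusion

print(evidence_birth_years(conclusion, records))  # [1841, 1842]
```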

I take a basic, data structure view on the whole thing. A data structure is a tuple of information. Each element of the tuple can be simple or a self-contained data structure of its own, or it can be a pointer outside of the current data structure to another data structure. All our entity types in the Better GEDCOM model and in every other model can be thought of as one of these data structures. It is this view that is the lowest common denominator for all other views.

Tom

louiskessler 2011-02-27T09:20:34-08:00

Excellent analysis about relationships, Tom. That info deserves to be promoted to some place on the site it will not get lost.

One consideration for deciding what to do with this in BetterGEDCOM: we will want developers to adopt BetterGEDCOM. I expect all 500-plus programs out there now use the FAM relationship object. They will NOT change their internal data structures to accommodate BetterGEDCOM, so they will need an easy way to translate from whatever BG has to their internal structure. If that is going to be too much work on their part, they either won't do it, or they'll do it while raising a big stink about it, and that won't be pretty, because when asked, they'll tell people that BG sucks.

gthorud 2011-02-27T10:00:40-08:00

It appears that it is difficult to agree on what we are trying to find a term for. It appears that Tom is defining an Attribute as something much wider than my Trabant, and than what I understand was meant to be defined at the beginning of this topic. Tom's definition is more like an attribute as used in a data model. Back to square one.

I will come back to other issues, but can someone explain why there can be an event subordinate to what I would call an attribute in INDIVIDUAL_ATTRIBUTE_STRUCTURE in Gedcom 5.5(.1)?

gthorud 2011-02-27T10:04:22-08:00

Sorry about the last question, I should have read the previous entries.

gthorud 2011-02-27T11:53:51-08:00

Tom,

Regarding Level zero and Vital Level one events.

I understand what you are saying, but I still don’t see why there could not be only level zero events. And I don’t understand why that would triple the size of the file.

Adrian,

I have a problem understanding the purpose of the INDIVIDUAL_ATTRIBUTE_STRUCTURE containing an event. I can't find anything about why in the Gedcom spec; have I missed anything? WHY NOT JUST USE AN EVENT – is this a construct that has appeared after someone discovered that hair colour can change over time? Maybe someone can explain why there is an event inside this structure?

Also, I am sceptical about having a date attached to a name. Isn't a reference to an event where the name is used enough, without having the date within the same structure as the name? Where would the data come from if not from a source that could be identified as an event?

ttwetmore 2011-02-27T12:25:42-08:00

Geir,

I didn't mean to throw a spanner into your works. There may be a few more vehicles out in the parking lot!!

I am really a very one-dimensional person. I see data models in very simple terms, and those terms are nearly 100% simple computer record structures with a little bit of object-oriented frosting thrown in.

I see every entity as a computer record structure, where a computer record structure is a tuple of named fields. There are only a few kinds of fields, and maybe these fields are the things that Geir is equating to automobiles. Here are some of the possibilities:

1. A field whose value is a small number of discrete values from a special set (sex has m and f; event type tags come from an agreed upon set, and so on; other examples left to the reader).

2. A field with an infinite number of values, but whose values still obey strict syntactic rules -- e.g., names, places, dates.

3. A field whose value is a generic string -- maybe the description of something, a note, a free-format description of a source.

4. A field that has its own internal record structure -- vital events are like this -- a BIRT in an INDI is a "sub-record" of the INDI record -- once you look at the sub-record in its own right, it's just another record structure inside the INDI record structure. Of course, this is the beauty of both the GEDCOM level structure and the XML element structure; you can carry the sub-record structure as deeply as you'd like to go. Note that JSON is the same; it's just that JSON, since it comes directly from the need to transport the values of JavaScript objects, is closely aligned to the idea of record structures. (Once one realizes that the expressive power of GEDCOM, XML, and JSON is the same, it should be no surprise that all WE EVER TALK ABOUT IN GENEALOGICAL MODELS is very simple data structuring stuff.) It's just clearer in JSON that you really are transporting a computer data structure. And the newfangled things that people are calling "protocol buffers" these days are really nothing more than these record structures in a binary form that makes them more efficient for moving around the internet through various service APIs, even though they suffer from the fact that they are not human readable.

5. A field that refers to another record -- source pointers are like this -- relationship pointers (if adopted) would be like this -- all this means is that the value of the field is a REFERENCE to ANOTHER RECORD STRUCTURE that is outside of this record structure -- note an interesting point that is rarely mentioned in the genealogical context -- pointers of this type are always assumed to point to the top level of another record structure, that is, to a full record object, what I call a first class citizen, but in fact there are times when you might want to think in terms of a pointer in one record as pointing SOMEWHERE INSIDE another record.
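The five kinds of field above can be seen side by side in one record. The following Python sketch is an illustration under stated assumptions: the field names (sex, name, note, birth, source_refs), the helper functions, and the sample data are hypothetical, and a plain dict stands in for a record structure.

```python
import json

# One person record showing the five kinds of field (field names are hypothetical).
person = {
    "id": "I1",
    "sex": "M",                         # 1. value from a small discrete set
    "name": "John /Smith/",             # 2. unbounded, but syntax-constrained
    "note": "Moved to Leeds c. 1850.",  # 3. free-form string
    "birth": {                          # 4. nested sub-record, like a BIRT in an INDI
        "date": "12 MAR 1821",
        "place": "Manchester, Lancashire, England",
    },
    "source_refs": ["S1"],              # 5. references to other top-level records
}

# GEDCOM 5.5 uses M, F and U (undetermined) for sex.
ALLOWED_SEX = {"M", "F", "U"}


def check_discrete(record):
    """Kind 1 fields can be validated against their fixed value set."""
    return record["sex"] in ALLOWED_SEX


def resolve_refs(record, sources):
    """Kind 5 fields are resolved by looking them up in another record collection."""
    return [sources[ref] for ref in record["source_refs"]]


sources = {"S1": {"id": "S1", "title": "1851 census, Leeds"}}

print(check_discrete(person))                     # True
print(resolve_refs(person, sources)[0]["title"])  # 1851 census, Leeds
# All five kinds map directly onto JSON, including the nested sub-record.
print(json.dumps(person["birth"]))
```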

Someone else might break these things up using a different hierarchy, but there really isn't any other major way to see the data structuring world.

From my point of view every one of these five kinds of things can be called an attribute because it provides some specific bit of information about the structure that contains it. Should we have a different vehicle for each of these five?

In my view a PERSON and a multi-role EVENT are top level record structures, that is, they are never found as sub-structures inside other structures. They stand alone; they are their own thing. You can think of them as attributes of the record that points to them even though they are stand-alone objects. This is what pointers are usually used for in computer programs anyway.

However, VITAL EVENTS, as I've defined them, and as they are defined within GEDCOM 5.5[.1], are always internal sub-record structures and become attributes that way.

Sorry, but my posts seem to be slowly turning into "Computer Data Structures, 101"

Tom W.

AdrianB38 2011-02-27T13:55:41-08:00

Geir - re "INDIVIDUAL_ATTRIBUTE_STRUCTURE containing an event". I think I understand your question but please bear with me if I got it wrong.

In summary - it isn't an event inside the INDIVIDUAL_ATTRIBUTE_STRUCTURE, it's just a load of data that looks exactly like the nearly-corresponding bit of the event does, so rather than define it twice, they used the same "label".

Perhaps if I write it out more completely (I'm quoting from GEDCOM 5.5.1 because the copy action keeps failing when I try to copy bits out of 5.5)

INDIVIDUAL_RECORD is defined in GEDCOM 5.5.1 (page 25 in my copy) as
n @XREF:INDI@ INDI {1:1}
...
+1 <<INDIVIDUAL_EVENT_STRUCTURE>> {0:M}
+1 <<INDIVIDUAL_ATTRIBUTE_STRUCTURE>> {0:M}
...
(where ... as usual means omitted stuff)
So - the Individual record contains 0, 1 or more event structures and 0, 1 or more attribute structures.

INDIVIDUAL_EVENT_DETAIL is defined as
n <<EVENT_DETAIL>> {1:1}
n AGE <AGE_AT_EVENT> {0:1}
i.e. it's made up of a single EVENT_DETAIL structure, followed by an optional AGE.

The EVENT_DETAIL is defined as (I'm not going to copy the formal GEDCOM out because it turns out I can't copy to the clipboard from 5.5.1 either!)
- an optional TYPE line
- an optional DATE line
- an optional PLACE structure (itself being multiple lines)
- an optional ADDRESS structure (itself being multiple lines)
...

INDIVIDUAL_ATTRIBUTE_STRUCTURE is defined as a whole list of options, virtually all of which follow the same pattern, viz.:
n OCCU <OCCUPATION> {1:1}
+1 <<INDIVIDUAL_EVENT_DETAIL>> {1:1}

Yes - the INDIVIDUAL_ATTRIBUTE_STRUCTURE is defined as a line relevant to the attribute plus one INDIVIDUAL_EVENT_DETAIL. The latter is already defined above as
n <<EVENT_DETAIL>> {1:1}
n AGE <AGE_AT_EVENT> {0:1}
and
EVENT_DETAIL is defined as
- an optional TYPE line
- an optional DATE line
- an optional PLACE structure (itself being multiple lines)
- an optional ADDRESS structure (itself being multiple lines)
...

Or in other words, although the standard says INDIVIDUAL_EVENT_DETAIL, it's short-hand for the group of lines consisting of
- an optional TYPE line
- an optional DATE line
- an optional PLACE structure (itself being multiple lines)
- an optional ADDRESS structure (itself being multiple lines)
...
All of which are what you see against an attribute.

So it's not an event "contained in" the attribute, it's a set of lines having exactly the same format as a set of lines that happened to be defined for the event first.
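Adrian's point can be checked mechanically. Below is a minimal Python sketch (the function names and sample lines are hypothetical) that reads GEDCOM lines into a tree using the level numbers and then compares the lines subordinate to a BIRT with those subordinate to an OCCU. It handles an optional @XREF@ but ignores CONC/CONT continuation lines and character-set issues.

```python
def parse_gedcom(lines):
    """Build a tree of nodes from GEDCOM lines, using the level numbers.

    Minimal sketch: handles an optional @XREF@ after the level but ignores
    CONC/CONT continuation lines and character-set issues.
    """
    root = {"tag": None, "xref": None, "value": "", "children": []}
    stack = [(-1, root)]  # (level, node); the root sits above level 0
    for raw in lines:
        level_str, rest = raw.strip().split(" ", 1)
        level = int(level_str)
        xref = None
        if rest.startswith("@"):
            xref, rest = rest.split(" ", 1)
        tag, _, value = rest.partition(" ")
        node = {"tag": tag, "xref": xref, "value": value, "children": []}
        # Pop until the top of the stack has a lower level: that node is the parent.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]


def detail_tags(node):
    """The tags of the lines subordinate to an event or attribute line."""
    return {child["tag"] for child in node["children"]}


indi = parse_gedcom([
    "0 @I1@ INDI",
    "1 BIRT",
    "2 DATE 1881",
    "2 PLAC Leeds, Yorkshire, England",
    "1 OCCU Farmer",
    "2 DATE 1901",
    "2 PLAC Leeds, Yorkshire, England",
])[0]
birt, occu = indi["children"]

print(detail_tags(birt) == detail_tags(occu))  # True: the same detail lines
print(occu["value"])                           # Farmer: the attribute's own value
```

The only structural difference is the value on the OCCU line itself; the DATE and PLAC lines beneath it have the same shape as those under BIRT.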

Err - sorry if my short-hand misled. Does this make it clearer?

ttwetmore 2011-02-28T05:36:41-08:00

Geir says, "Regarding Level zero and Vital Level one events...I understand what you are saying, but I still don’t see why there could not be only level zero events. And I don’t understand why that would triple the size of the file."

Yes, all level one, "vital events" substructures in person and other records can be converted to level zero "event records." They wouldn't triple the size of the file in pure character count, but would probably at least triple the size of the file in terms of number of records. The files would get larger in character counts simply because of all the extra record "boiler plate" and inter-record references that would have to be added.
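A rough Python sketch of the conversion Tom describes, assuming a simplified in-memory shape for persons and vitals (the function name promote_vitals, the dict keys, and the "principal" role are hypothetical). It shows why the record count grows: a person with two vital events becomes one person record plus two event records.

```python
import itertools


def promote_vitals(persons):
    """Turn each level-one vital substructure into a separate level-zero event record.

    persons maps a person id to {"name": ..., "vitals": [{"tag", "date", "place"}, ...]}.
    Returns (new_persons, events): each event names its single role player, and
    each person keeps only references to its events.
    """
    counter = itertools.count(1)
    new_persons, events = {}, {}
    for pid, person in persons.items():
        event_refs = []
        for vital in person["vitals"]:
            eid = f"E{next(counter)}"
            events[eid] = {
                "type": vital["tag"],
                "date": vital.get("date"),
                "place": vital.get("place"),
                "roles": [{"person": pid, "role": "principal"}],
            }
            event_refs.append(eid)
        new_persons[pid] = {"name": person["name"], "event_refs": event_refs}
    return new_persons, events


persons = {
    "I1": {
        "name": "John Smith",
        "vitals": [{"tag": "BIRT", "date": "1821"}, {"tag": "DEAT", "date": "1890"}],
    }
}
new_persons, events = promote_vitals(persons)
print(len(persons), "->", len(new_persons) + len(events))  # 1 -> 3
```

Each new event also carries its own identifier and a back-reference to the person, which is the extra "boiler plate" that grows the character count.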

The vital events as done in GEDCOM now (eg, BIRT, DEAT, MARR, ...) are much more attribute-like than they are event-like. Some might not agree with that. Each vital event describes one vital fact about a person or a family. There are no other role-players. There is obviously an event hidden away behind the vital, but the details of that event were not of concern when the fact was recorded (or if it were, the genealogical application proved inadequate for the task). A vital event is very much like an abstract of a real event, extracting only limited information (generally just date and place) pertinent to the primary role player.

I sense the real concern over dealing with vital events as substructures in other records versus multi-role events in their own right, is that it might seem like we would be sanctioning two ways of doing the same thing. You can see it that way, but I also think that's an incomplete view. Instead of just rejecting the idea of having two ways to do certain things, isn't it better to stop and rationally consider why there are good reasons for having two ways of doing things?

Think about the very practical problem of converting millions of GEDCOM records into Better GEDCOM records. Is there really a compelling reason to convert every BIRT, every DEAT, every RESI, every BURI, every CHR, every MARR substructure in the GEDCOM files into separate single-role event records? There is no technical reason why you can't do it, and if Better GEDCOM does away with the vital concept idea, we would have to do it, but what does it gain?

I've been a software developer for 45 years and a software professional for 40. If there were anybody around who could give advice for or against the idea of whether it's bad to have more than one way to do certain things, it would be me. I find nothing uncomfortable in the notion of having the two kinds of events. My hope is that having the multi-person event record will encourage genealogists to always collect all the info that is available from the evidence. But for the cases where a full-bodied event is not warranted, or there is no real evidence about the event yet, the vital event works well.

Here's another thought. You've just talked to your grandmother to get information about her grandparents. She gave you some birth dates of her grandparents that aren't yet in your database. Are you going to add those birth dates to the records for her grandparents simply as vital events, as would be done in GEDCOM, or are you going to create separate birth records for each of those grandparents? Don't you think it's kind of overkill to create birth events from such distant and secondary data? OK, you can do it.

Also remember that we are discussing getting rid of the family record. If that happens we have to find a new way to indicate relationships between people. Relationships between people carry some of the same implied event information that vital events and regular events do. As soon as we face this we will likely encounter a third way that some information can be implied.

My bottom line is that I PREFER a world with both vital event structures and multi-role event records. We CAN'T live in a world with just vital event sub-structures. We COULD live in a world of just multi-role event records. From that it's up to the Better GEDCOM collective wisdom to choose the official path.

Tom W.

gthorud 2011-03-01T13:37:50-08:00

First, Thanks to Adrian !!!! for making me look much closer at the Gedcom definitions of INDIVIDUAL_ATTRIBUTE_STRUCTURE and INDIVIDUAL_EVENT_STRUCTURE.
It appears that my Trabant has no motor and must always be accompanied by a Rolls Royce. Based on how Attributes are presented in the user interface of some programs, I thought I knew what an attribute is, just a type-value pair; I didn't check the Gedcom. So after checking the Gedcom for these two structures it appears, as has been stated earlier in the discussion, that the I_ATTRIBUTE_S is just an I_EVENT_S plus a value (with at least one minor exception, BIRTH). This changes my thinking a lot, and I am sorry for the unnecessary discussion I have caused – and it must have been very difficult for those trying to understand what my Trabant was.

(When reading the BNF in the Gedcom standard I note that it might not be necessary to define event/attribute tags with BNF in BG, it should rather be done in a list.)

So, an Attribute is an Event plus a value, and in the BG context, where an event can have multiple participants (people/groups), it becomes necessary to say that an attribute applies to only one person.
The other differences (time/date) seem to have been agreed to be no reason for a difference. I think a BG event structure should be allowed to contain at least one value (actually a type-value pair). If so, the only difference between an event and an attribute will be that the attribute can apply to only one person. So, if we assume that we will have lists of event types with definitions etc. as part of the standard, and as a central list that can be updated, it would be possible to define in that list whether an event can apply to one or several persons/groups. The list could also define roles and possible types for the value.

Summed up: This way we do not need to have separate structures for attribute, we only need event. I guess this is no surprise to the rest of you.
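Geir's proposal can be sketched as a single event structure whose rules come from a list of event types. In the minimal Python sketch below, the EventType fields, the REGISTRY entries and the validate function are all hypothetical; a real list would come from the standard or a central registry, not from this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EventType:
    tag: str
    single_person: bool         # True for what GEDCOM calls an attribute
    roles: Tuple[str, ...]
    value_type: Optional[type]  # None if this type carries no value


# Hypothetical entries for illustration only.
REGISTRY = {
    "OCCU": EventType("OCCU", True, ("principal",), str),
    "BIRT": EventType("BIRT", False, ("child", "mother", "father", "witness"), None),
    "MARR": EventType("MARR", False, ("bride", "groom", "witness", "officiant"), None),
}


def validate(tag, participants, value=None):
    """Check one event against its registry entry; return a list of problems."""
    spec = REGISTRY[tag]
    problems = []
    if spec.single_person and len(participants) != 1:
        problems.append(f"{tag} applies to exactly one person")
    for _, role in participants:
        if role not in spec.roles:
            problems.append(f"role {role!r} not defined for {tag}")
    if spec.value_type is None and value is not None:
        problems.append(f"{tag} does not carry a value")
    if spec.value_type is not None and not isinstance(value, spec.value_type):
        problems.append(f"{tag} value must be {spec.value_type.__name__}")
    return problems


print(validate("OCCU", [("I1", "principal")], "Farmer"))  # []
print(validate("OCCU", [("I1", "principal"), ("I2", "principal")], "Farmer"))
print(validate("MARR", [("I1", "groom"), ("I2", "bride")]))  # []
```

With this design there is only one structure; "attribute-ness" is simply the single_person flag on the type's entry.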

I then started to look at various programs and it seems to me that at least three of the major ones have no distinction between event and attribute in the user interface, so we might be in good company.

As has been pointed out above, Gedcom 5.5.1 proposes an extension called "Event description" (following the EVEN tag), which does not seem to be the same as an Attribute value (as some programs seem to think). Depending on the interpretation of 5.5.1, Event description may or may not be used for a person. To me this description seems to be some sort of a summary of the event; it is an event value thing, not a type thing, and it seems not to be an attribute in the same sense as e.g. a caste name. Since it follows the EVEN tag, it seems to apply only to user-defined events. How would this event description appear in a sentence in a report?

5.5.1 talks, in EVENT_OR_FACT_CLASSIFICATION, about subtypes of ANY event type, using TYPE. The question is if this is used by anyone? Is it needed? If so, the standard and the central list can define subtypes. Separate requirement?

Adrian, the name and sex issues should perhaps be discussed separately? A name will most likely, in my mind, be a more complex thing than just a string.

About Tom's vital events. It is clear that you need to separate one Burial event from another, but that can easily be done in a level zero event. I see no problem with creating an event record even if the info comes from my grandmother. Regarding extra overhead, I am not very concerned about that; I guess a few photos transferred together with the BG file will in many cases be much bigger than the BG file. But this is an encoding issue, and it will probably be affected by a solution to the evidence-conclusion issue, so I suggest that we wait on this. I agree that the BIRT event may be a special case.

Geir

AdrianB38 2011-03-01T14:03:34-08:00

Geir - "5.5.1 talks, in EVENT_OR_FACT_CLASSIFICATION, about subtypes of ANY event type, using TYPE. The question is if this is used by anyone?"

It's certainly the way that user-defined events or attributes are supposed to be written - see discussion about Custom tags on Developer's meeting page.

I already added Requirement Syntax08 "It should be possible for user-defined events, properties, characteristics, etc, of individuals, etc, to inherit features from previously defined events, properties, characteristics, etc." and speculated / suggested "If events etc are given a type and sub-type, then it would be possible for the user to create a user-defined subtype of an application defined type, and thus inherit the processing done for that type.
For instance, an event "Marriage - civil" might have a type of "Marriage" and a subtype of "civil", thus automatically doing all processing created for the event-type of "Marriage" "

So a sub-type solution is already there in a fashion.
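The inheritance idea in Syntax08 can be sketched in a few lines of Python. Everything here is hypothetical (describe_marriage, the HANDLERS table keyed by (type, subtype), handler_for): a user-defined "Marriage - civil" is stored as type "Marriage" with subtype "civil", and falls back to whatever processing the application defines for "Marriage".

```python
# Hypothetical processing defined by an application for the "Marriage" type.
def describe_marriage(event):
    return f"{event['principals'][0]} married {event['principals'][1]}"


# Handlers keyed by (type, subtype); None means "any subtype of this type".
HANDLERS = {("Marriage", None): describe_marriage}


def handler_for(event_type, subtype=None):
    """Use a subtype-specific handler if one exists, else inherit the type's handler."""
    return HANDLERS.get((event_type, subtype)) or HANDLERS[(event_type, None)]


# A user-defined "Marriage - civil": type "Marriage", subtype "civil".
event = {"type": "Marriage", "subtype": "civil",
         "principals": ["John Smith", "Mary Jones"]}
handler = handler_for(event["type"], event["subtype"])
print(handler(event))  # John Smith married Mary Jones
```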

And yes, Name will require more complexity in structuring than the usual "attribute".

As for your looking in more detail at the GEDCOM Spec'n and confounding our expectations - yes, I think we're all doing some of that! Old saying: "If you think you understand what's going on here, you obviously don't..."

PS - I shall be sorry to lose your Trabant!

louiskessler 2011-02-25T20:48:16-08:00

... and GEDCOM is just suggesting that if something takes longer than a day, then it is *probably* a fact rather than an event. They do not impose it as a rule.

louiskessler 2011-02-25T21:11:33-08:00

Adrian said Today 2:57 pm:
re: Custom GEDCOM tags

Louis - re your statements "Events also having descriptions, e.g.:
"1 EVEN Appointed Zoning Committee Chairperson
"2 TYPE Civic Appointments"
and
"events can have descriptions (i.e. attributes), the presence or absence of an attribute cannot be used to define the difference between events and facts"

That's, ahem, "interesting". I just double checked GEDCOM 5.5 and the INDIVIDUAL_EVENT_STRUCTURE in that copy seems clear to me that it does not allow a description (i.e. attribute) for an event - not even the EVEN generic event that you quote. TYPE, yes, no problem with that.

Do you know if previous (or post 5.5) versions of GEDCOM relaxed this? Or is it simply a case of software suppliers trampling over the standard again? In a sense, it doesn't really matter either way because if there are files out there with that construction, we need to deal with them. But I'd still like to understand what's going on (my pedantic brain again).

Adrian:

The example:

"1 EVEN Appointed Zoning Committee Chairperson
"2 TYPE Civic Appointments"

was taken right from GEDCOM 5.5.1 page 48 in the definition of EVENT_DESCRIPTOR. It says:

EVENT_DESCRIPTOR:= {Size=1:90}
Text describing a particular event pertaining to the individual or family. This event value is usually
assigned to the EVEN tag. The classification as to the difference between this specific event and other
occurrences of the EVENt tag is indicated by the use of a subordinate TYPE tag selected from the
EVENT_DETAIL structure. For example;
1 EVEN Appointed Zoning Committee Chairperson
2 TYPE Civic Appointments
2 DATE FROM JAN 1952 TO JAN 1956
2 PLAC Cove, Cache, Utah
2 AGNC Cove City Redevelopment

Now go to the FAMILY_EVENT_STRUCTURE on page 32, and you'll see under it is:

n EVEN [<EVENT_DESCRIPTOR> | <NULL>] {1:1} p.48
+1 <<FAMILY_EVENT_DETAIL>> {0:1} p.32

But, if you look at INDIVIDUAL_EVENT_STRUCTURE on page 34-35, you'll see:

n EVEN {1:1}
+1 <<INDIVIDUAL_EVENT_DETAIL>> {0:1}* p.34

which does *not* have the EVENT_DESCRIPTOR. I am sure this is what you looked at.

However, the latter MUST be a mistake in the GEDCOM definition, because their own example they gave of being "Appointed Zoning Committee Chairperson" is not a family event, but it is rather an individual event.

I believe the proper interpretation should be that it was GEDCOM's intention to have the event descriptor on both INDI events and FAM events, because it simply doesn't make sense to have them on only FAM events.

Louis

ttwetmore 2011-02-26T07:10:27-08:00

I hesitate to point this out once again, but it seems that I may be the only one stressing the point.

The discussion above covers the LEVEL ONE GEDCOM events, what I call VITAL EVENTS to separate them from the TRUE EVENTS, which stand alone as evidence and may refer to multiple persons as role players.

Since vital events are level one entities within level zero INDI records they are easily viewed simply as a more structured kind of PFACT than the simpler PFACTs like name, age, occupation. I can see that there is interest in defining exactly what is the distinction to be made between vital events AS PFACTs and other simpler PFACTs, but I fear this entire discussion simply takes away from the important question of LEVEL ZERO TRUE EVENTS.

I really don't think it matters whether LEVEL ONE VITAL EVENTS are considered just another kind of PFACT or not. I can't think of any major reason. If you decide you can give dates to all PFACTs (e.g., occupation, name, sex [considering the possibility of sex changes -- joke, joke, joke, please]) then the differences seem to be pretty moot. Would there be any real difference in the model, the file formats, the information content, or the application software if this distinction wasn't considered to be important? Couldn't we just say that persons have lots of different kinds of PFACTs, describe their different natures, and simply fit VITAL EVENTS in as one of those sub-types?

What must distinguish Better GEDCOM is the work to extend beyond the GEDCOM model into areas that make the new model compelling for the next generation of genealogical applications. In my view we have identified the two major areas where the model must be extended, to LEVEL ZERO EVENTS as first class citizens, and to full support for persons and events at the EVIDENCE and CONCLUSION/HYPOTHESIS levels.

Anyway, I guess it's interesting to discuss the differences between LEVEL ONE VITAL EVENTS and PFACTs in general, but please don't lose sight of where the true work lies, in the definition of the LEVEL ZERO TRUE EVENTS.

Tom W.

hrworth 2011-02-26T07:44:35-08:00

Tom,

Might I suggest that some of this information, mostly what is in Caps, be Documented on this Wiki. YOU know what all of these means, but I am not sure that others know, specifically, what you mean. That may be why you are answering the same questions frequently.

What is: Level One Vital Events
What is: PFACTs
What is: Level Zero True Events.
What is: Evidence Level
What is: Conclusion Level
What is: Hypothesis Level

Seems to me, that these definitions need to be at an Overview of the BetterGEDCOM level on this wiki. I don't know exactly where, but at a high enough level on the wiki, so that we can see what they mean.

I know, I am asking as a simple End User, but don't know how they would show up in a GEDCOM file (today). If I could, then I would be able to find it in the software that I use.

If they are not in a current GEDCOM file, that's OK too, but how would I see them in a BetterGEDCOM file?

Thank you,

Russ

AdrianB38 2011-02-26T09:13:30-08:00

Louis
I surrender! (grin) In fact, I surrendered sometime last night when thinking about how I had entered people's inheritances. I did like the idea of an event being a change but couldn't square it with the bit about "values" needing attributes (to use old GEDCOM terms). Coming into an inheritance was, from various viewpoints,
- an event when considering the English definition of the word event;
- an event when viewed as a change of state (in monetary value or possessions);
- an event because it involves at least 2 people (the recipient and the deceased) and therefore needs to be what Tom refers to as a Level zero event (referring to what the GEDCOM _would_ look like if it allowed the concept);
- an attribute because I needed to access the field in my application program to put the description of the inheritance in.

3-1 to events. And then Louis comes up with the text from 5.5.1 that would release that field to an event as well as an attribute. 4-0 to events.

(Louis - thanks for guiding me to the relevant pages. It's all quite different from 5.5 and I'd made the mistake of believing the section on "Modifications in Version 5.5.1", where it mentions nothing of these changes.)

So - where does that leave things?

Pretty much as Tom contends, I think.

We have an English definition of "Event = a change in state" (sorry if that's a bit like a physicist's speak). This implies a definition of Attribute (to use the old GEDCOM term) as an ongoing state-description, but everything gets a bit vague when you consider short term attributes - for instance, you could consider Occupation as an attribute but only be in a job for a day.

Fundamentally, I see no reason to worry about the distinction between an attribute for a person and an event that applies only to one person - there's no serious distinction that I can envisage in the future BG Data Model. Which I think both Louis and Tom said above - but I needed to convince myself.

The most crucial feature is what Tom has stressed above (and sorry, Tom, I was taking it as read so much that I didn't mention it) - the need to accommodate Events that have more than 1 person. How do we identify these? In Data Modelling terms, it's easy - there is a many-many relation between these Events and People (I'm talking people but it applies equally to Families / Groups / Locations etc), whereas between the other sort of events OR attributes there is only a many-to-one relation from them to People etc. But I'm not sure how to explain it in English - or whether we need to if we can communicate with the software suppliers.

I'm half tempted to say that all things known as events should be regarded as the many-to-many type of events. Even the death event, for instance, which might be thought of as an event applying to 1 person only, might include a 2nd person (in the role of a doctor, e.g.).

I'm also now half tempted to revert to a position I held some years ago and say that in BetterGEDCOM events and attributes are the same thing after all, and the only thing we need to worry about is whether they are many-to-many or many-to-one to people etc.
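The many-to-many versus many-to-one distinction maps directly onto relational tables. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table and column names are hypothetical. An attribute row carries a single person key, while an event's participants live in a separate link table with a role column, which is what lets the death event include the doctor as a second person.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT);

-- Many-to-one: each attribute row belongs to exactly one person.
CREATE TABLE attribute (
    id INTEGER PRIMARY KEY,
    person_id TEXT REFERENCES person(id),
    type TEXT,
    value TEXT
);

-- Many-to-many: an event can have many participants and a person many events,
-- so the link lives in its own table, with a role column.
CREATE TABLE event (id INTEGER PRIMARY KEY, type TEXT, date TEXT);
CREATE TABLE participant (
    event_id INTEGER REFERENCES event(id),
    person_id TEXT REFERENCES person(id),
    role TEXT
);
""")

con.executemany("INSERT INTO person VALUES (?, ?)",
                [("I1", "John Smith"), ("I3", "Dr Brown")])
con.execute("INSERT INTO attribute (person_id, type, value) VALUES ('I1', 'OCCU', 'Farmer')")
con.execute("INSERT INTO event (id, type, date) VALUES (1, 'DEAT', '1890')")
con.executemany("INSERT INTO participant VALUES (1, ?, ?)",
                [("I1", "deceased"), ("I3", "doctor")])

rows = con.execute("""
    SELECT p.name, pt.role FROM participant pt
    JOIN person p ON p.id = pt.person_id
    WHERE pt.event_id = 1 ORDER BY pt.role
""").fetchall()
print(rows)  # [('John Smith', 'deceased'), ('Dr Brown', 'doctor')]
```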

louiskessler 2011-02-26T09:49:33-08:00

Adrian,

I've always used GEDCOM 5.5.1, and I also didn't realize the big change they made here from 5.5.

To sum this up so that everyone sees what changed: in GEDCOM 5.5 the EVENT_DESCRIPTOR is on the TYPE tag, and nothing is on the EVEN tag, i.e.:

1 EVEN
2 TYPE <EVENT_DESCRIPTOR>

GEDCOM 5.5.1 changed the actual meaning of the EVENT_DESCRIPTOR and moved it to the EVEN tag, and added an EVENT_OR_FACT_CLASSIFICATION to the TYPE tag, i.e.:

1 EVEN <EVENT_DESCRIPTOR>
2 TYPE <EVENT_OR_FACT_CLASSIFICATION>

with the mistake I earlier pointed out of GEDCOM 5.5.1 leaving the EVENT_DESCRIPTOR off of INDI events.
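Since files of both flavours exist, an importer has to read the descriptor from different places depending on the version. The Python sketch below follows Louis's summary; the function name read_even and the tuple it returns are hypothetical, and it accepts a descriptor on individual EVEN lines as well, on Louis's reading that omitting it there was a mistake.

```python
def read_even(lines, version):
    """Return (descriptor, classification) for one EVEN structure.

    lines: the EVEN line plus its subordinate lines.
    version: "5.5" or "5.5.1", e.g. as declared in the file header.
    """
    head = lines[0].split(" ", 2)
    even_level = int(head[0])
    even_value = head[2] if len(head) > 2 else None
    type_value = None
    for line in lines[1:]:
        level, tag, *rest = line.split(" ", 2)
        if int(level) == even_level + 1 and tag == "TYPE":
            type_value = rest[0] if rest else None
    if version == "5.5":
        # 5.5: descriptor on TYPE, nothing on EVEN.
        descriptor, classification = type_value, None
    else:
        # 5.5.1: descriptor on EVEN, classification on TYPE. Accepted for INDI
        # events too, even though the 5.5.1 grammar omits it there.
        descriptor, classification = even_value, type_value
    # EVENT_DESCRIPTOR is {Size=1:90} in the 5.5.1 text; applied to both versions here.
    if descriptor is not None and not 1 <= len(descriptor) <= 90:
        raise ValueError("EVENT_DESCRIPTOR must be 1 to 90 characters")
    return descriptor, classification


print(read_even(["1 EVEN Appointed Zoning Committee Chairperson",
                 "2 TYPE Civic Appointments"], "5.5.1"))
print(read_even(["1 EVEN", "2 TYPE Civic Appointments"], "5.5"))
```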

But this leads us to another minor issue, which should be agreed on (maybe in another thread):

Should we mostly refer to GEDCOM 5.5 (the standard) or GEDCOM 5.5.1 (the beta) or enumerate both as the starting point of BetterGEDCOM? As we're pointing out, there were many subtle changes between the two.

Personally, I prefer GEDCOM 5.5.1, because that was the intended direction that GEDCOM was going, and many programs did adopt it. For my program I take 5.5.1 and extend it so that dropped features from previous GEDCOM versions can also be handled.

Louis

gthorud 2011-02-26T11:48:43-08:00

Seems like a consensus is emerging.

The following will be repeating what has already been said, but just to sum up my understanding.

We have two types of vehicle that can transfer info
- One small and simple one that has traditionally been called attributes, which are single predefined values or (user-defined) type and value pairs. They apply to one record (person, place, group etc.) and there may be one or more of the same type.

- One big and complex one that has traditionally been called events, that should be a top level record, that can carry info that refers to zero or more person names/group names with various roles, zero or a few dates or a period, zero or more place names (possibly also identifying an address/contact structure), zero or more values (preferably predefined types), zero or more notes of various types, zero or more citation references (not determined at what levels in the event structure), zero or more multimedia references, and has administrative info (references to research process info, selection flags/marks etc.) The conclusion variant of this vehicle should have several pieces with surety info at various places in the vehicle.

There is no rule for what types of info can be transported by these vehicles, except that types that may AT SOME (FUTURE) TIME not fit into the small one must use the big one.

There are no rules about time (except that the small vehicle can not transfer dates– the time/period associated with the types of values in the small one is undefined or implied by the type).

The info types that can be transferred are defined in the standard or in a central registry, or can be user defined. For each info type, the definition specifies the ONLY vehicle type to be used, together with the roles, subtypes/classes and value type(s) (and possibly more). User defined types should be described in the file header for both vehicle types.
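To make the two vehicles concrete, here is a minimal Python sketch. All class, field and function names (Attribute, Event, Participant, InfoTypeDefinition, check_vehicle) are hypothetical and do not come from any BetterGEDCOM draft. The sketch shows the one hard rule above: each info type definition names exactly one vehicle. Surety info, administrative info and file-header declarations are left out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Vehicle(Enum):
    SMALL = "attribute"  # the "Trabant": a type/value pair on one record
    BIG = "event"        # the "Rolls Royce": a top-level, multi-role record


@dataclass
class InfoTypeDefinition:
    """Hypothetical registry entry: each info type names the ONLY vehicle it may use."""
    name: str
    vehicle: Vehicle
    roles: list[str] = field(default_factory=list)
    value_type: Optional[str] = None


@dataclass
class Attribute:
    """Small vehicle: belongs to exactly one record and carries no dates."""
    owner_id: str   # person, place, group, ...
    info_type: str
    value: str


@dataclass
class Participant:
    record_id: str  # reference to a person or group record
    role: str


@dataclass
class Event:
    """Big vehicle: a top-level record with its own identifier."""
    event_id: str
    info_type: str
    participants: list[Participant] = field(default_factory=list)
    dates: list[str] = field(default_factory=list)    # zero or a few dates, or a period
    places: list[str] = field(default_factory=list)
    values: dict[str, str] = field(default_factory=dict)
    notes: list[str] = field(default_factory=list)
    citations: list[str] = field(default_factory=list)
    media: list[str] = field(default_factory=list)


def check_vehicle(definition: InfoTypeDefinition, item: object) -> bool:
    """True if the item travels in the vehicle its info type prescribes."""
    expected = Attribute if definition.vehicle is Vehicle.SMALL else Event
    return isinstance(item, expected) and item.info_type == definition.name
```

With this layout, a Birth defined as a BIG info type can only arrive as an Event carrying roles such as child and mother; sending it as an Attribute would fail the check.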

The problem seems to be what we should call them (perhaps Trabant and Rolls Royce). The names should not imply anything about what type of info they can transport.

The evidence-conclusion issue is orthogonal to this discussion, i.e. it should be another dimension, I hope.

Many of the lists of info types described above, carried by the big vehicle, are not central to the discussion in this topic.

Re. Gedcom 5.5 or 5.5.1: I agree that there is a problem, but I think that, when needed, a reference should specify 5.5 or 5.5.1, since it is difficult to choose. The best thing would be to copy the relevant text so that readers don’t have to look it up.

AdrianB38 2011-02-26T12:26:22-08:00

Re GEDCOM 5.5 or 5.5.1 - I wouldn't get hung up on it. We are, after all, trying to define our own data model and standard. If we say "GEDCOM says / doesn't say X", then we should qualify which version of the standard says it. Otherwise, our work should stand alone.

Geir - After getting my head round your vehicle / Trabant / Rolls Royce analogy - which I rather like! - I am slightly worried about what you mean by "There are no rules about time (except that the small vehicle can not transfer dates– the time/period associated with the types of values in the small one is undefined or implied by the type)." Our small 'construction' can transfer genealogical data containing dates. Someone's name is a classic instance of the simple data that fits into that simple 'construction'. For most men in the UK, it will be the same value through their life - an implied time period "GivenName FamilyName From Birth to Death". But there will be lots of women whose name changes during their life, so will be (e.g.)
"GivenName FamilyName From Birth to Marriage1"
"GivenName AnotherFamilyName From Marriage1 to Marriage2" etc
(where Birth, Marriage1 etc represent dates.)

ttwetmore 2011-02-26T13:52:09-08:00

What is: Level One Vital Events -- level one GEDCOM structures commonly called events today (e.g., BIRT, DEAT, BURI, BAPM, ...).
What is: PFACTs -- level one GEDCOM lines or structures that hold properties, facts, attributes, characteristics or traits.
What is: Level Zero True Events -- level zero GEDCOM records (see Event GEDCOM for examples) that hold event info with references to multiple INDI records.
What is: Evidence Level -- Records that hold evidence information.
What is: Conclusion Level -- Records that hold conclusions/hypotheses.
What is: Hypothesis Level -- Records that hold conclusions/hypotheses.

gthorud 2011-02-26T13:58:14-08:00

Adrian,

I should perhaps change the first sentence in my design of a Trabant to: “One small and simple one that has traditionally been called attributes, which are predefined standardized, centrally registered or user defined type and value pairs.”

I am thinking about the structure that some people currently use to transfer hair color, caste, eye colour, nationality etc., although some of this info may change over time, which is something the Trabant can’t transport.

There are other structures that carry names, and I am not sure if there should be dates in that structure – maybe better handled by events when there is a change.

Tom,

I don’t understand why there must be Level One Vital Events, why can’t they all be Level Zero True Events?

gthorud 2011-02-26T15:04:12-08:00

Why can't we simply keep the words used in Gedcom, Attribute and Event, for the Trabant and the Rolls Royce, and define them the way we want by modifying any Gedcom definitions to suit us? Choosing other words will just create confusion. And I would like to get rid of the term PFACT; it is creating confusion.

ttwetmore 2011-02-26T17:15:03-08:00

If there is one thing I've learned in over 45 years of technical work, it is that no matter what we do there will be confusion over terms, and there will always be new people getting involved who want to start all terminology discussions over again. It's part of the posturing inherent whenever human beings come together to get work done.

The term PFACT came about because different people in this Better GEDCOM effort were using the terms property, fact, attribute, characteristic and trait to mean the same thing, without realizing that everyone was using their pet terms for exactly the same thing. I came up with the term PFACT as a preemptive strike to forestall months of confusion and argument over terminology. It was a cute way to help avoid problems. If we are all mature enough now that we can retire PFACT and replace it with one of those synonyms (I sense that attribute might be the winner), that would be great. But I guarantee that as soon as more people join this effort all those other terms will crop back up and confusion will reign once more.

Tom

AdrianB38 2011-02-25T14:50:16-08:00

This arises from the discussion on the page for Custom GEDCOM Tags. (Oops - just checked - that's a discussion of that name on the page for the Developers Meeting.) It seems sensible to give this important topic its own discussion.

Firstly note that the Requirement does NOT propose a definition of the difference - just proposes that there should be one.

Secondly note that it is not sufficient just to say Events are (one list of things) and Properties / Attributes etc are (another list of things) since this does not help when creating a user-defined "thing".

Thirdly, using the English language as our definition (e.g. "Oh, we know what we mean by the word 'event'") is not helpful to those of us whose first language is not English.

Finally, the two concepts are not that far apart - if I were creating an object oriented program to update some family history related objects (not that I could) then I suspect that both event and attribute objects could inherit a lot of "stuff" from a common object. So we don't, I suspect, need to worry about the difference immediately.

AdrianB38 2011-02-25T15:02:16-08:00

Some important clips from previous discussions:

louiskessler Tuesday, 12:40 am
...
There is an EVEN (Event tag) which describes a change that happens at some time,
and a FACT tag which describes something that is true over a time period.
Most other tags are simply descriptions of one of these, and will be the data for the TYPE tag under the EVEN or FACT, e.g.
1 EVEN
2 TYPE Graduation

AdrianB38 Tuesday, 10:59 pm
Louis - re your statement
"There is an EVEN (Event tag) which describes a change that happens at some time,
"and a FACT tag which describe something that is true over a time period."
We've probably had this discussion before (grin!) but while your definition _tends_ very much to be true, we can concoct a definition of the difference between event and attribute (to use the GEDCOM terms) that leads to events happening over a long time.
This is even more true if we go for the concept of an event affecting multiple people while an attribute only applies to one person.
Specifically:
- an attribute must have a value (not one of the existing place, date, etc)
- if something has a value then it's an attribute
- an event must not have a value
- if something doesn't have a value then it's an event
Using this definition, "World War One" qualifies as an event and it clearly lasts for several years. It also affects a number of people, so that's another good reason to take it as an event.
Also, Residence qualifies as an event since the so-called value it usually has is PLACE, which is already present, so it doesn't actually need this "value" item. Some GEDCOM type programs get themselves in knots over Residence because they say "Residence is an Attribute - but unlike every other Attribute it doesn't have a value"
I much prefer my definition of the difference between Event and Attribute because it can be precisely described with no exceptions.
But it's also important to realise that many facts can easily be represented in either fashion depending on whether you bring things like Cause-of-event and Responsible-Agency into play.... So it probably shouldn't cause us too much grief too soon.
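Adrian's value-based rule is precise enough to be written as a test. The Python sketch below is one reading of it, assuming a hypothetical Structure type in which "value" is kept separate from date, place and participants. It is an illustration of the rule, not an agreed BetterGEDCOM definition.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Structure:
    tag: str
    value: Optional[str] = None   # the "value" item, distinct from date/place
    date: Optional[str] = None
    place: Optional[str] = None
    participants: list[str] = field(default_factory=list)


def classify(s: Structure) -> str:
    """Adrian's rule: it has a value -> attribute; no value -> event.

    Date, place and participants are deliberately ignored, so a period
    such as a war, or a Residence that only has a place, comes out as an event.
    """
    return "attribute" if s.value else "event"
```

An occupation with the value "Farmer" classifies as an attribute, while Residence (place only) and World War One (a date range and many participants) both classify as events. Louis's objection that follows, an EVEN line carrying a description, is exactly the case this rule would reclassify.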

louiskessler Today 6:43 am
...
You left out some things that make the analysis even tougher, such as Events also having descriptions, e.g.:
1 EVEN Appointed Zoning Committee Chairperson
2 TYPE Civic Appointments
and that the TYPE event descriptor can be applied to defined events, e.g.:
1 MARR
2 TYPE Common Law
Basically, the TYPE can be any text the user chooses, and GEDCOM states it should be displayed as given.
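Louis's examples rely on GEDCOM's level-numbered lines, where the text after EVEN (the description) and the TYPE line under it (the descriptor) are separate pieces of data. The Python sketch below parses such lines into a tree to show that separation. It assumes a simplified line grammar (level, optional @XREF@, tag, optional value) and ignores CONC/CONT, escapes and character encoding, so it is not a conforming GEDCOM reader.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# level, optional @XREF@, tag, optional value (simplified GEDCOM line grammar)
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")


@dataclass
class Node:
    level: int
    tag: str
    value: Optional[str] = None
    xref: Optional[str] = None
    children: list["Node"] = field(default_factory=list)


def parse(lines: list[str]) -> list[Node]:
    """Build a tree of nodes from GEDCOM lines using their level numbers."""
    roots: list[Node] = []
    stack: list[Node] = []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            raise ValueError(f"not a GEDCOM line: {raw!r}")
        level, xref, tag, value = int(m.group(1)), m.group(2), m.group(3), m.group(4)
        node = Node(level, tag, value, xref)
        # Pop back up to this line's parent (the nearest lower level).
        while stack and stack[-1].level >= level:
            stack.pop()
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots


def event_type(node: Node) -> Optional[str]:
    """Return the TYPE descriptor under an event structure, if any."""
    for child in node.children:
        if child.tag == "TYPE":
            return child.value
    return None
```

Fed Louis's four lines, this yields an EVEN node whose value is "Appointed Zoning Committee Chairperson" with TYPE "Civic Appointments", and a MARR node with no value and TYPE "Common Law".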

louiskessler Today 6:53 am
... and since events can have descriptions (i.e. attributes), the presence or absence of an attribute cannot be used to define the difference between events and facts.
The true difference is that an event denotes a change of something and when that occurs. A fact indicates a truth that exists and the period of time during which it is true. From GEDCOM:
"As a general rule, events are things that happen on a specific date. Use the date form ‘BET date AND date’ to indicate that an event took place at some time between two dates. Resist the temptation to use a ‘FROM date TO date’ form in an event structure. If the subject of your recording occurred over a period of time, then it is probably not an event, but rather an attribute or fact."

ttwetmore Today 10:44 am
The quote Louis provided about dates in events is a good guideline. However, I believe it is still reasonable to allow events that occur over a range of dates, so I don't believe the quote should have been so strongly phrased.
Examples of events that take more than one day would be a trip, e.g. an ocean voyage when immigrating. Yes, of course, you can add two events, a departure event followed by an arrival event, which might be the recommended course, but why disallow an event for the voyage as a whole?
...
How about a multi-day ceremony? How about a vacation? GEDCOM should be an important guide for BetterGEDCOM, but all of its assumptions are fair game for re-examination.

ttwetmore 20 minutes ago
Just ole opinionated me again. Adrian's last about the difference between an event and a characteristic brings up an important (IMHO) point that I have tried to cover in the DeadEnds model.

What is an event in GEDCOM? It is a substructure of lines inside a person record (or family record) that describes a date and place and maybe some other information about an event that occurred in the life of the person (let's forget families for a while). These are SINGLE ROLE events that conveniently forget that a birth event really involves at least three persons! Thinking about events in this trivial way has gotten so commonplace that it has nearly completely hidden from view what events really are. A role player is NOT MENTIONED in these events because the substructure is inside the record of the event's PRIMARY ROLE PLAYER already. These "events" serve PRIMITIVE genealogy well, the simple quest for birth and death dates for direct ancestors, but they are inadequate for serious genealogy. GEDCOM came out of the LDS's goal to perform temple rites on church members' ancestors, not from anyone's goal to have GEDCOM support serious genealogy. We should not be surprised that GEDCOM is only suited for the fairly simple requirements of the church faithful.

In the DeadEnds model I call these events "vital events" and keep them as substructures inside person and other records. Thus converting a file from GEDCOM format to DeadEnds format does not cause an explosion in the number of now independent event records with single role players. In some sense these "vital events" are really like structured PFACTs and I think can be treated as such. You just have to think of a birth date and place as structured but still pretty simple PFACTs about a person.

But then there are "real events." GEDCOM doesn't have them as first-class citizens, but some systems now do. These events are full-fledged, top-level, first-class citizen records, with their own record identifiers, indexed as any record would be. These are true multi-role entities that refer to the person records of the persons who play the roles in the event. The DeadEnds model brings this record type front and center right along with person records, and I certainly believe strongly that the Better GEDCOM model should do the same.

So, the bottom line. In thinking about events, one must be pretty careful to know what one is talking about. I hope I have explained the differences between the two ways that the term event is most commonly used in the genealogical context today. When deciding how to include events in the Better GEDCOM model, it is important to know which of these two concepts one is discussing.

Tom W.

AdrianB38 2011-02-25T15:16:29-08:00

Can I start the discussion with one important point?

While a lot of genealogical events do happen on a single day, and a lot of genealogical events do mark a change in something, I think we must allow that a lot of events that affected our relatives took place over a number of days - years even. For instance (assuming we allow only the 2 concepts of event on the one hand and property / characteristic / whatever on the other) then World War 1 is clearly an event. The American War of Independence is clearly an event. Etc. There is no way that these can be described as properties - and I don't think any of us would pretend that they can.

If we are going to go for the concept of multi-person events in BetterGEDCOM, then I think we are going to see an increase in the number of multi-day events, which makes definitions involving single days somewhat deficient.

louiskessler 2011-02-25T20:46:58-08:00

Again, the proper definition of an "Event" is something that is the change between one state and another state. It does not have to be a point in time, and can be a period of time, even years.

A "Fact" is a "truth" that is correct for a period of time.

An event will be at the beginning and the end of a fact. Before the event will be another fact. After the event will be another fact.

Event - Fact - Event - Fact ...

You don't always list them all or care about them all, so only the events or facts of interest are the ones you denote.

e.g. 1:

Fact - John hasn't been born. Before Jan 1, 1950.
Event - John is born. Jan 1, 1950
Fact - John is bald. Jan 1, 1950 to June 30, 1950.
Event - John's hair is growing. July 1, 1950 to Dec 31, 1951.
Fact - John has a full head of black hair. Jan 1, 1952 to Dec 31, 1979.
Event - John's hair is falling out. Jan 1, 1980 to Dec 31, 1999.
Fact - John is bald - From Jan 1, 2000 on.

Now if you want, you could turn each Event into a start event and an end event with a fact in between, e.g.:

Event - John's hair is falling out. Jan 1, 1980 to Dec 31, 1999.

can be:

Event - John's hair starts falling out. Jan 1 1980
Fact - John's hair is falling out. Jan 1, 1980 to Dec 31, 1999.
Event - The last lonely little hair on John's head falls out. Dec 31, 1999

So I don't think it's rigid, and there is even ambiguity as to whether something is an event or a fact. In the past I have thought the two were so similar that there is no reason necessarily to have two different objects, but perhaps just an event/fact. But I don't really care about that.
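Louis's rewrite of a long-running event into a start event, a fact and an end event is a mechanical transformation. The Python sketch below shows it, assuming a hypothetical Item type with exact datetime.date values; real genealogical dates are often approximate or ranges, which this sketch does not handle.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Item:
    kind: str                   # "event" or "fact"
    description: str
    start: date
    end: Optional[date] = None  # None for a point-in-time event


def split_interval_event(e: Item, start_desc: str, end_desc: str) -> list[Item]:
    """Replace an event spanning a period with a start event, a fact
    covering the same period and an end event."""
    if e.kind != "event" or e.end is None:
        return [e]  # point events and facts are left alone
    return [
        Item("event", start_desc, e.start),
        Item("fact", e.description, e.start, e.end),
        Item("event", end_desc, e.end),
    ]
```

Applied to "John's hair is falling out" (1 Jan 1980 to 31 Dec 1999), this produces the same three lines as Louis's example; John's birth, a single-day event, comes back unchanged.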

e.g. 2 (Adrian's other example):

Fact: Period prior to World War I
Event: World War I
Fact: Period after World War I

I see no problem with that. Or even with this:

Fact: Period prior to World War I
Event: Start of World War I
Fact: World War I
Event: End of World War I
Fact: Period after World War I

Why are these both okay? Because an event is a transition. World War I was a transition from the time before the war to the time after the war, so it was an event.

But it was also true that World War I was happening between the start of WWI and the end of WWI. So WWI was also a fact.

I hope no one is getting too hung up on this. Not everything has rigid rules.

gthorud 2011-03-07T11:40:15-08:00

Administration01 - Research Administration Information

Overall requirement: BetterGEDCOM must allow recording of administrative information needed to organise the research work.

Please add your ideas, requirements, pointers to examples from programs etc. here. Just short overall descriptions, so we can get a better understanding of the area. Detailed requirements will be added to the Requirements Catalog after that.

The Gentech model covers this type of information.

gthorud 2011-03-08T09:59:45-08:00

I have had a quick look at the Gentech model which contains some entities related to research administration. If you look at the diagram accompanying the main document, you should get a rough understanding of the entities.

See info about the model here http://bettergedcom.wikispaces.com/GenTech+Data+Model

I will try to give a very rough summary of the main entities, but it is very likely that I have not understood everything correctly since I cannot spend a lot of time on it. The whole thing refers to a model of the research process that is also documented in Gentech.

The entities are:

- Objective: A Research Objective can be for example “Find the father of John Smith”. It can have a name, description, priority, sequence and status.

- Project: Research Objectives can be grouped and linked to a Project, for example “Find the ancestors of Peter Smith”. It can have a name, description and client data.

- Activities: Each Research Objective can have several Research Activities, each of which can be a Search or an Administrative Task. An activity can relate to several Objectives. An Activity record can have scheduling info, status, type, description, priority, comment and a link to a Researcher.

- Source: A Search can be linked to a Source (and a source can have several Searches linked to it). (A search can link to a Repository Source and a Repository and have a status “finished”.)

- Repository: A Repository can store several Repository Sources, e.g. a Book. (There is a hierarchical model for Sources which I will not describe here.)

- Source Group: A Source Group could be used to Administrate (manage) Sources, e.g. sources related to a particular topic e.g. “Sources about Boston”.

- Researcher: A Researcher can be linked to several Projects and several Activities. Name and “contact info” is stored for the Researcher.

- (A project can have several Surety-Schemes used for Assertions.)

In summary, the important things to note are the structuring of a To-do List into Activities (Searches), Objectives and Projects, which may be linked to Researchers, and the administration of sources.
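The Project / Objective / Activity / Researcher structure summarised above can be sketched as nested Python records, with a to-do list derived from it. This is a simplification, not the Gentech model itself: the names and fields are hypothetical, sources are plain titles rather than Source records, and each activity belongs to one objective, whereas Gentech lets an activity relate to several objectives.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Researcher:
    name: str
    contact: str = ""


@dataclass
class Activity:
    """A Search or an Administrative Task."""
    description: str
    kind: str = "search"                  # "search" or "admin"
    status: str = "open"
    priority: int = 0
    researcher: Optional[Researcher] = None
    source: Optional[str] = None          # simplified: a source title, not a Source record


@dataclass
class Objective:
    name: str
    priority: int = 0
    activities: list[Activity] = field(default_factory=list)


@dataclass
class Project:
    name: str
    objectives: list[Objective] = field(default_factory=list)
    researchers: list[Researcher] = field(default_factory=list)


def todo_list(project: Project) -> list[tuple[str, str]]:
    """Open activities as (objective, activity) pairs, highest priority first.

    Objective priority is compared first, then activity priority.
    """
    rows = [
        (o.priority, a.priority, o.name, a.description)
        for o in project.objectives
        for a in o.activities
        if a.status == "open"
    ]
    rows.sort(key=lambda r: (-r[0], -r[1]))
    return [(obj, act) for _, _, obj, act in rows]
```

Flattening this structure into a simple task list, as some programs would need, is then mostly a matter of walking the nested lists.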

I note that there are bits and pieces of information in programs that are not mentioned in this model.

It seems to me that the full model tries to cover the needs of a professional researcher. Do we need to cater for the needs of professional researchers, and is this useful to them?

A question is whether the terminology of this model can be used as a common reference terminology. But I would not be surprised if there are other terminology standards in this area.

The model could be implemented in several ways, so I will not go into that.

Assuming that the main use of Administrative info in BG is to exchange info between one user’s programs, a thing to consider is how information in this model can be converted into programs implementing simpler models, and what the consequences are if some of the info is discarded. If a conversion is possible, it should be possible to use programs with little or much functionality/data for administration.

gthorud 2011-03-09T08:12:20-08:00

I have looked at RootsMagic.

There can be a ToDo list of Tasks for every Person or Couple (family), and General tasks not attached to any of these. All Tasks can be listed in the ToDo list.

Each Task has a name/description, a Personal file number, priority, status, dates, link to an address or repository and a large field for Description and Results which can be formatted.

Could not find any additional filtering, sorting or any links to events, and I don’t see any way to list only the Tasks for a Repository.

Again, I am not sure I have found all the functionality.

Geir

gthorud 2011-03-09T10:26:28-08:00

I have looked at Legacy (DeLux). I have used a translated help file which could lead to errors.

There is a To-Do list with To-Dos (Tasks, I’ll use that term) for Persons and General tasks.

Each task can have a name/short description, category (select from configurable list), locality (where to perform the task, select from configurable list), dates, status, type (research/correspondence/other), priority, citations (which is in itself a general structure that can e.g. be linked to a repository), file ID (for filing cabinet), description (full, formatted text) and result (formatted). Each Task can be linked to a repository/address with contact info.

The list can be filtered on many of the task fields and can be sorted on multiple fields.

I could not find a way to list tasks for a repository, but then there is “locality”.

Again, I am not sure I have found all the functionality.

At least one more prog to go ….

gthorud 2011-03-09T15:35:31-08:00

I have looked at Genbox. It seems to implement most of the Gentech model, and more, related to administration.

Searches (Gentech term) is the term used for Tasks (I will use Tasks). Tasks are scheduled after first defining a Target (Gentech term = Objective). Targets can be linked to (sub) Projects. Projects can be split into a hierarchy of sub projects.

There is a correspondence log (phones, letters, email etc) where each item can be linked to (sub) projects. Researchers (e.g. you or cooperating researchers) can be registered with contact information and are identified in e.g. assertions.

Targets can be defined for a lot of information types, e.g. persons, events, families, person names, parents, sources and places. There can also be general targets.

Content of the various records:

- Tasks (Searches) can have description, dates, priority, location, findings and can be linked to a source and repository.

- Targets have a description and are linked to searches and the information types mentioned above.

- Projects and sub projects can have name, dates, status, priority, number of hours used, completion grade (%) and description. They may link to higher level projects, targets and correspondence log items.

- Items in the correspondence log can have type (call, email etc), in/out, researcher, correspondent, subject, date, ref to filing system and details about the correspondence. There is also contact info (addr, phone etc).

- Researchers can have name, languages, registration number (?), notes, media (photo) and contact info. A researcher can be linked to a person in the database.

Lists and reports can be created for all main records, including Projects, Researchers, Targets and Tasks, and they can all be filtered and sorted on various criteria.

GeneJ 2011-03-09T17:39:12-08:00

This date I reviewed the FamilySearch Wiki for "Keeping a Research Log."
https://wiki.familysearch.org/en/Keeping_a_Research_Log

Note: This is a log that keeps track both of what you plan to search AND what you have searched.

(1) Objectives.
*Keep track of and be able to share what you have searched (helps avoid duplicate effort)
*Log provides a record of what you have done if you have to return to a source for further work.
*Provides a more complete record of your work.

(2) What is recorded in the log BEFORE you search:
*Name of Ancestor
*Research Objective
*Source Title, call number, microfilm number, book number, etc.
*Place where the source is available

(3) What is recorded after [or during] the search:
*Date(s) you searched
*Notes about what you found/learned
*Notes about what you didn't find
*Whether you made a copy

GeneJ 2011-03-09T17:55:24-08:00

Err... there is a related Wiki at FamilySearch, so I'll summarize it, too.

https://wiki.familysearch.org/en/Research_Logs

Part I
Value:
Cite your sources; sort out what has and has not been found; organize and correlate copies of documents; weigh evidence/better conclusions; show strategies and record questions; reduce duplication of effort

Contents (says, "following elements work well for most researchers")
*Ancestors Name/life span
*Researcher's name
*Date of search
*Place of Search
*Purpose (objective) of search (event and person)
*Source Description [they show the call number and Document number separately; I just lumped that in with source]
*Results - a summary of what was found

Part II
What to complete in anticipation of a search:
*Date
*Place of Research
*Purpose
*Source Description

What to complete after a search:
[This wiki has comments, but the comments seemed a little jumbled. Near the bottom of the wiki it says, "Write lots of notes to yourself explaining your strategies, analysis, conclusions, questions suggestions, and discrepancies."]

GJNote: This form of research log, in software or electronic form, can be used to record snippets or full transcriptions from sources, together with your own comments and notes.

gthorud 2011-03-10T06:58:43-08:00

It is easy to see that, as Adrian has pointed out, there is a big span in the functionality implemented in various programs. It will be difficult to convert all information from the most advanced programs into the programs that provide minimum functionality. However, when a user changes from a complex program to a simpler or similar program, it seems better to get some of the information across, perhaps in a non-optimal structure, than to stay in the current situation where no administrative information can be transferred. Although one should try to prevent it, it would be acceptable to lose some types of information, and the user may even have to do some tidying up after import (?). Since there is usually only one user involved, preserving the exact meaning and structure of the information is less important than if the information were transferred between different users.

There are differences in the overall structure of the information in programs, from a simple task list to a task-objective/target-project model with many entities. Thus types of information may be attached to different entities in the structure, but in many cases meaning the same thing. Also there are some programs that have a one to one relation between entities, where others have a one to many or many to many.

One observation that might be helpful in a conversion process is that all? programs have one or more large text fields, and all? programs have Tasks. Information from highly specified fields or higher level structures could be converted into these text fields. The structural incompatibilities could be solved by “flattening” the structure (a term used in data conversion). For example consider a program that has the Objective “Find my grandfather’s birth date” and two tasks “Check census x” and “Check the parish records for y”. This could be converted into the text fields of two tasks:

Task 1:
Objective: “Find my grandfather’s birth date”
Task: “Check census x”
Task description: Find xxxx

Task 2:
Objective: “Find my grandfather’s birth date”
Task: “Check the parish records for y”
Task description: Find yyyy

The example can be extended. Assume that the Objective is linked to the record of my grandfather Ole Olsen, and to repository X.

Task 1:
Person: Ole Olsen (ref # 1234)
Objective: “Find my grandfather’s birth date”
Task: “Check census x”
Repository: X

Important info could be placed at the top and less important info at the bottom of the text field. The exact positioning must be determined by the implementer of the program (but could in THEORY be user configurable). The implementer will also have to balance the complexity of the note against discarding information.

The “flattening” may also collapse, e.g., three levels into one, or projects and activities into activities only.
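The flattening in the example above can be sketched in Python: the objective's information (and its links to a person and a repository) is copied into the text field of each task, most important lines first. The names (Task, Objective, flatten) and the exact line order are assumptions; the thread leaves the ordering to the implementer of the importing program.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    title: str
    description: str = ""


@dataclass
class Objective:
    text: str
    person: Optional[str] = None       # e.g. "Ole Olsen (ref # 1234)"
    repository: Optional[str] = None
    tasks: list[Task] = field(default_factory=list)


def flatten(objective: Objective) -> list[str]:
    """Copy the objective's information into each task's text field.

    Returns one note per task, with the more important lines at the top.
    """
    notes = []
    for task in objective.tasks:
        lines = []
        if objective.person:
            lines.append(f"Person: {objective.person}")
        lines.append(f"Objective: {objective.text}")
        lines.append(f"Task: {task.title}")
        if objective.repository:
            lines.append(f"Repository: {objective.repository}")
        if task.description:
            lines.append(f"Task description: {task.description}")
        notes.append("\n".join(lines))
    return notes
```

For the grandfather example this yields two notes, one per task, each repeating the person, objective and repository; the price of flattening is that shared information is duplicated rather than linked.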

Such a conversion must be done by the importing program, since it is the one that knows both the data file structure and its own internal structure. The complex structure can therefore be represented in the BG file.

I will try to look into grouping of the program functionalities into sub-requirements, but it will not be easy, and may take some time.

GeneJ 2011-03-10T09:44:57-08:00

How can I help?

GeneJ 2011-03-10T10:04:58-08:00

Putting a pitch in for Research Administrative support at the source level.

Task Name: FHL Film 634021, "Anywheresville Birth Records, 1650-1910."
Task Description: Access filmed records
Description: Subject film cited as source of the source for Ancestry's database "Index to Anywheresville Births."
Notes/Comments: XXXXXXXXXXX

gthorud 2011-03-13T12:43:00-07:00

I have gone through the functionality in the programs etc. mentioned above and tried to group it into Requirements. The important thing in this step is to capture all the possibilities, so one question is: have I forgotten anything important? Although I have tried not to prioritize, there are some differences in the wording indicating a priority in some cases. I have followed the Gentech model. The plan is to copy these requirements to the req. cat.

Comments are welcome.

Research Task
BetterGEDCOM shall be able to record a Task (search or other task) that needs to be done or has been done. Information recorded about the task itself could be a Title/Short description and a full (formattable) description. Research tasks can be organized in simple lists or grouped into Objectives, see below.

Task information
BetterGEDCOM shall be able to record information about a Task, for example information used for Categorisation (keyword, category, type (research/correspondence/other)), Progress management (priority, status, dates, comments about dates) and Resource use (expenses, number of hours used).

Identification of persons, events, places that the task is about
BetterGEDCOM shall be able to link a task to records representing the person(s), event(s), place(s), source(s) etc. that the task is about, existing when the task is defined (started). A possibility is also to record links to persons, events etc. that are created as a result of the task.

What to search
BetterGEDCOM shall be able to record information about, or link to records representing, WHAT to search – e.g. a source. Possibly a URL pointing to the source.

Where to do the task
BetterGEDCOM shall be able to record information about, or link to records representing, WHERE to do the task – Location name (if not linked to), Repository, Place (eg. cemetery), Address

Task results
BetterGEDCOM shall be able to record information about, or link to records representing, the findings and results produced by the task (an overall description of the results, Excerpts, Multimedia, Citations, Filing Cabinet Reference)

Objectives for grouping of tasks
BetterGEDCOM should be able to group several tasks into Objectives (Targets), each Objective representing a question to be answered or a problem to be solved. An objective is usually defined before the tasks needed to achieve it. Objectives should have a description and will be the record pointing to persons, events, places etc. rather than each task. Some elements of the information recorded for tasks (see above) can be defined for the objective rather than for each task.

Projects for grouping of objectives
BetterGEDCOM could be able to group several objectives into projects. Projects could be split into sub-projects. Each (sub-)project should have a name, elements of task progress listed above, completion grade (%) and description.

Correspondence log
BetterGEDCOM could be able to record information about letters, emails, phone calls or other correspondence related to the research. Items in the log can have a type (call, email etc.), direction (in/out), researcher, correspondent, subject, date, reference to filing system and details about the correspondence. Contact information (address, phone etc.) could also be recorded.

Researchers
BetterGEDCOM could be able to record information about the researchers using the program or other cooperating/corresponding researchers. Researchers can have a name, languages, registration number (?), notes, media (photo) and contact info. A researcher can be linked to a person in the database. The Gentech model also links researchers to assertions, i.e. who made the assertion.
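As a rough illustration of how the Task, Objective and Project requirements above might nest, here is a minimal Python sketch. All class and field names (Task, Objective, Project, about_ids, completion_pct, all_tasks) are hypothetical and chosen for this example only; the requirements do not prescribe a data model or a serialisation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    """One unit of research work (all field names are hypothetical)."""
    name: str
    task_type: str = "research"        # research / correspondence / other
    keywords: List[str] = field(default_factory=list)
    priority: Optional[int] = None
    status: str = "open"
    expenses: float = 0.0              # resource use
    hours: float = 0.0
    about_ids: List[str] = field(default_factory=list)  # ids of persons, events, places, sources


@dataclass
class Objective:
    """A question to answer; it carries the links instead of each task."""
    description: str
    about_ids: List[str] = field(default_factory=list)
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Project:
    """A named group of objectives, optionally split into sub-projects."""
    name: str
    description: str = ""
    completion_pct: int = 0            # completion grade (%), entered by the user
    objectives: List[Objective] = field(default_factory=list)
    subprojects: List["Project"] = field(default_factory=list)

    def all_tasks(self) -> List[Task]:
        """Collect the tasks of this project and, recursively, its sub-projects."""
        tasks = [t for obj in self.objectives for t in obj.tasks]
        for sub in self.subprojects:
            tasks.extend(sub.all_tasks())
        return tasks


# Usage: total hours spent across a project tree.
parents = Objective("Find William Moffat's parents", about_ids=["I12"],
                    tasks=[Task("Check parish register", hours=2.5),
                           Task("Write to archive", task_type="correspondence", hours=0.5)])
moffat = Project("Moffat line", subprojects=[Project("Scotland", objectives=[parents])])
print(sum(t.hours for t in moffat.all_tasks()))  # 3.0
```

Putting the links on the Objective rather than on each Task mirrors the Objectives requirement; a Task keeps its own about_ids only for the cases where that requirement allows task-level detail.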

GeneJ 2011-03-13T12:58:28-07:00

It looks great. Thank you for this. --GJ

AdrianB38 2011-04-11T08:52:49-07:00

Please note my new page on
http://bettergedcom.wikispaces.com/Research+Process%2C+Evidence+%26+GPS

This endeavours to concoct a research process and, in the process, defines the data that could be stored in a BG compatible database.

theKiwi 2011-03-07T12:24:45-08:00

The screen shot I mentioned during this afternoon's meeting can be found here

http://lisaandroger.com/MiscImages/ReunionLogs.png

It is from Reunion for Macintosh, in which "Logs" have been a feature for many versions up to the current Reunion 9.0c.

The Logs are stored in the Reunion database, so each datafile can contain its own set of Logs.

I can create as many different Logs as I want to using the "Add Log" button.

The other buttons provide access to other features as named, including for example to export a Log to a word processor file, or a text file, and perform a search to find text in any Log.

The text of the Logs can be formatted using the buttons at top right - Plain Text, Bold, Italic, Underlined, and coloured.

Nothing stored here in this part of the Reunion database can be exported to a GEDCOM file.

A very minimum requirement of BetterGEDCOM would be to support these simple text data chunks, presumably as type NOTE.

An extra would be to allow the passage of text formatting as has been mentioned in other places in the Wiki, using for example HTML tags to style the text, although this is a function of the exporting application in reality.

Further enhancements might be to allow for a set of structured fields for these Logs, so that you might record more pertinent information in separate fields, such as

URL to use for doing the listed research
Name and Address of Research facility
etc.

so that perhaps a structure similar to Sources might be developed/supported?

If the genealogy application supported it, this would allow one to get a list of all Logs that have work to be done at the "Library of Michigan", for example, if you had created more than one Log for that. As my screen shot shows, though, in my (admittedly not very rigorous) use of this feature I've created a different Log for each site and listed in it all the items I was thinking of at that time.

louiskessler 2011-03-07T13:44:13-08:00

I believe that ToDo lists should be Repository-based. Every item should be assigned to one or more repositories. That's how I'll eventually implement them in Behold.

Every trip, you'll know which repositories you're going to, the people you'll be visiting, the cemeteries you'll be exploring, etc. So you will want a list of what to do at each of these places organized by place.
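Louis's repository-based organisation amounts to indexing the to-do list by repository. A minimal Python sketch, assuming each task is a (description, repositories) pair; the function name and the "(unassigned)" bucket for tasks with no repository are hypothetical additions, not part of Behold or any proposal on this page.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tasks_by_repository(tasks: Iterable[Tuple[str, List[str]]]) -> Dict[str, List[str]]:
    """Index (task, repositories) pairs by repository.

    A task assigned to several repositories appears under each of them;
    a task with no repository is listed under "(unassigned)".
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for task, repos in tasks:
        for repo in repos or ["(unassigned)"]:
            index[repo].append(task)
    return dict(index)


todo = [
    ("Look up 1871 census", ["Library of Michigan", "County Archive"]),
    ("Photograph headstone", ["St Mary's churchyard"]),
    ("Find William Moffat's parents", []),
]
print(tasks_by_repository(todo)["Library of Michigan"])  # ['Look up 1871 census']
```

Because a task can sit under several repositories, the same item turns up on each trip's list without being copied.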

theKiwi 2011-03-07T13:57:37-08:00

"Every trip, you'll know which repositories you're going to, the people you'll be visiting, the cemeteries you'll be exploring, etc. So you will want a list of what to do at each of these places organized by place."

I don't think this is necessarily true - a To Do list item could be as simple as

"Find William Moffat's parents", the solution to which might be found at the first repository you think to check (if you're lucky), but which might not be found until the 5th place you think to look, and a lot of the time it can be solved without going anywhere outside of "the internet".

Or as Adrian notes, these lists can be used for other things, like a Log of all of the people with the same surname in the same village that you'd like to further research to find out if they're related, or at the least a note to yourself to remind yourself to investigate this at some stage.

louiskessler 2011-03-07T15:10:40-08:00

Roger,

So I've got 200 things to do. I've got them all on my written ToDo list that is 8 pages long. I'll go to the store and take the 8 pages. I'll search through the 200 things for the 20 things I need to buy at the store. I'll do that every aisle I go down. I'll read those 200 things 40 times. When I'm done, I'm still not sure I got everything.

Now I'll go to the Post Office. Find the two things I need to do on the list. But I forget to go to the drug store that is right next door, because I didn't think of it because I was thinking of the other 170 items I needed to do.

The bottom line is, you ABSOLUTELY POSITIVELY NEED to organize your tasks by where you need to do them.

Why not just add your 300 genealogy things to do to the other 8 pages of 200 physical things to do? Then you'll have 500 things to look through every time you do any errands.

I shouldn't give away my trade secrets. I can't believe nobody's ever seriously thought about this before. :-)

Louis

gthorud 2011-03-07T15:55:37-08:00

Although I don't mind discussions, it might save you some work if you hold off on discussions of specific functionality until we have created some more detailed requirements. I try to see the larger picture of possibilities before separating them into more detailed requirements (and possibilities). Keep posting suggested functionality; VERY detailed aspects may not be needed at this stage, but should be presented later.

GeneJ 2011-03-07T21:37:43-08:00

Hiya -

I uploaded screen shots of the _The Master Genealogist_ Research Log input screens.

http://bettergedcom.wikispaces.com/TMG+Research+Log+Images

GeneJ 2011-03-07T21:51:52-08:00

Also uploaded screen shots of plan/tasks input screens for Family Tree Maker for Mac.

http://bettergedcom.wikispaces.com/FTM+%28Mac%29+Task+Input+images

Hope this helps. --GJ

AdrianB38 2011-03-08T02:55:06-08:00

Let me for once make a suggestion that we need to think what we _don't_ require. It's quite clear that people prefer to organise their to-do lists on a very personal basis. The question is - how much does _another_ researcher need to know about my to-do lists?

They don't, I suggest, need to know about my planned visit to Bristol that will include visits to X, Y, Z, and the tasks at each of those places. Hence, why should it be in a BG file?

What might very well be useful to them (and therefore in the BG file) is the precise objective of those tasks - e.g. investigate if X is my 4G grandfather of the same name.

For this reason, I'm kind of underwhelmed by the prospect of to-do lists in any sophisticated form. We can come to an agreement (I hope) about the data model for genealogy - I'm far from clear we can agree more than some very basics for to-do lists (and a lot more besides).

Our criteria should be - I suggest - what might _other_ people find of use?

theKiwi 2011-03-08T04:09:23-08:00

@Adrian - the ability to move EVERYTHING is very important to individuals wanting to transport their data to themselves.

Over the years I have looked at other (non Reunion) Macintosh software as it comes along, but have immediately lost interest when I find that the error log of items not imported runs to (many) thousands of lines.

So at least sometimes, we are our own "other" people.

theKiwi 2011-03-08T04:13:18-08:00

I think I've figured out how to include an image here now...

gthorud 2011-03-08T06:46:20-08:00

I have looked at TMG. I have never used these features so what follows could be inaccurate.

There is a Research Log, which is an advanced ToDo list. Each task in the list has a task name; various dates related to the progress status of the task, each of which can have a remark; keywords; a value for expenses; and a larger description field that supports various types of formatting.

Each task can be linked to zero or one of the following: Person, Event, Source and Repository. General tasks are not linked to any of these. Links to the log are available on each of the mentioned entities, so you can enter a task for e.g. the current event, and see if there are any uncompleted tasks registered for the event. You can find the tasks that need a lookup in a source or repository.

You can filter the log on a specific entity, so you can see the tasks related to a person, a repository, source, event, surname, keyword, task name or location (I am not sure how location works), and a list of tasks can be sorted on various criteria. Depending on how you organize your use of keywords, they could be used to group tasks, for example those related to a specific problem or area of work, a fellow researcher or whatever.

There are also reports for tasks and the other related entities; I have not checked them out.

Hope I have not missed too much …

Geir
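Geir's description of the TMG Research Log (at most one link per task, filtering by linked entity or keyword, sorting) can be sketched as follows. This is a guess at the behaviour from his description, not TMG's actual data model; LogTask, filter_log and the (type, id) link tuple are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LogTask:
    name: str
    # Zero or one link, e.g. ("person", "P17") or ("source", "S3"); None = general task.
    link: Optional[Tuple[str, str]] = None
    keywords: List[str] = field(default_factory=list)
    done: bool = False


def filter_log(log: List[LogTask], link: Optional[Tuple[str, str]] = None,
               keyword: Optional[str] = None, open_only: bool = False) -> List[LogTask]:
    """Return tasks matching every given criterion, sorted by task name."""
    result = [t for t in log
              if (link is None or t.link == link)
              and (keyword is None or keyword in t.keywords)
              and not (open_only and t.done)]
    return sorted(result, key=lambda t: t.name)


log = [LogTask("Order certificate", link=("person", "P17"), keywords=["Moffat"]),
       LogTask("Check index", link=("source", "S3"), keywords=["Moffat"], done=True),
       LogTask("Visit archive", link=("person", "P17"))]
print([t.name for t in filter_log(log, link=("person", "P17"))])
# ['Order certificate', 'Visit archive']
```

Keywords do the grouping work here, which is why, as Geir notes, their usefulness depends on how consistently the user applies them.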

AdrianB38 2011-03-08T08:42:12-08:00

Kiwi - moving your own data between different apps is clearly important and a good answer to my question about how much does "someone else" need to know.

However, I fear (rightly or perhaps wrongly) that the likelihood of 2 apps sharing the same data model for non-trivial stuff outside the pure documentation of genealogy (i.e. scope of GEDCOM now) is slim.

Your Log doesn't look particularly sophisticated (sorry!) - a name and some formatted text, so that might very well export and import OK. But Geir's TMG Research task is an altogether different beast. It's perfectly possible to envisage TMG exporting all that into a BG file, either into tags that BG define or into custom tags. In fact, I'd say that exporting it into custom tags would be easy in XML or JSON. (Yes, I am making a wild statement but if I can do a generalised export from a 1970s technology mainframe, surely you PC guys... <grin>) But when it came to importing the TMG Research log into Reunion... Is Reunion going to implement TMG data structures? I think not. The non-genealogy stuff is how they differentiate their products, so I'm not optimistic we'll even find out what's behind the scenes.

Let's not get too pessimistic here - we can collect info about what people's software does, as Geir requests, and see if there's a simple basis that's common across a number of programs. But I think we need to remain a bit grounded in our expectations in this area.

AdrianB38 2011-03-07T12:20:12-08:00

FamilyHistorian enables me to create annotated lists where the lists can consist of any entities of any entity type. These can be used as hit-lists for to-dos; lists of canal boatmen (i.e. people with some common theme); lists of people who I know aren't relatives but who are in my Database because they were associated with my relatives (just saves me trying to find a family link when I've forgotten why they're there); lists of people to construct reports and diagrams on.

gthorud 2011-03-24T19:07:38-07:00

Data-Event02 - Multiple places per event

Description: BetterGEDCOM should support the recording of multiple places for a single event.

Why: Current GEDCOM allows the recording of one place for events. There are application extensions to record more than one - e.g. FamilyHistorian records two places for emigration - a "from" and a "to" place. Users may also define "Journey" events, where a "from" and a "to" location would seem natural.

Way forward:
•Analyse whether there is a need for more than two places per event - e.g. "from", "to", "via";
•Analyse whether location-roles are mandatory, optional or forbidden. (Location-roles refers to the role that a location plays in an event. Examples of roles are "from" and "to". Locations without roles would be just listed, e.g. "The 1906 earthquake happened at X and Y")
•If roles are needed - what are the roles?

gthorud 2011-03-24T19:19:32-07:00

There are some questions that must be answered regarding multiple places for events.

What are the event types where several places could be used?

So far emigration, immigration and earthquakes have been mentioned, but it also applies to all sorts of migration (i.e. “moved”), journeys or military expeditions. All of these may have one or more “via” places, in addition to starting point and destination.

Then there may be events used for places only. One such event would be when a place is created by splitting one place and creating a new one, or when (part of) one place is merged with another. The most frequent variant would involve, for example, farms/properties at the same level in a place hierarchy, but it would also apply when a place is “moved” from e.g. one country to another (a higher level involved). There may be many such events for a place.

Some documents could contain a place name whose location you are not sure of; it could be this place or that place, so the event would reference both.

It might also be that multiple places could be used to reduce the amount of robot language, e.g. when a person owns three properties, instead of having three sentences for this one could say “Peter owned X, Y and Z.” But this is just a vague idea that I guess a lot of people might argue against.

Are there other event types where multiple places could be useful? Are there documents that mention several places?

Each place should have a role, e.g. starting point, destination, via-place, separated from etc., or the role could be implied by the event type if only one role type is possible. The available roles must be defined for each event type.

There is also a need to have dates for each place, e.g. arrival and departure day. And also sources? The order in which the places occur is obviously important.

It is also important to consider the implications for programs. All programs have only one place field per event, but they could have some sort of indicator that tells the user that there are more places - popups to see/edit all. Also the scripting language used to specify sentence templates must be enhanced. Relational databases might need an extra record type. How difficult will it be?

Since all? “person events” (but not all place events) with multiple places can be represented by single-place events, although with more robot language in some cases, the question is whether there is enough support for multiple places. I am sure this is an old GEDCOM issue, so there may be a lot of people with strong opinions.

Geir
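Geir's points (a role per place, roles defined per event type, optional dates per place, significant order) could be captured by a structure like this minimal Python sketch. The role vocabulary in ALLOWED_ROLES is an invented placeholder, since the actual roles are still an open question in the Way forward list; PlaceRef and Event are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical role vocabulary per event type; the real list is still to be agreed.
ALLOWED_ROLES: Dict[str, Set[str]] = {
    "emigration": {"from", "via", "to"},
    "place-split": {"separated-from", "new"},
}


@dataclass
class PlaceRef:
    place_id: str
    role: str
    date: Optional[str] = None   # e.g. departure or arrival date at this place


@dataclass
class Event:
    event_type: str
    places: List[PlaceRef] = field(default_factory=list)  # list order = order of occurrence

    def add_place(self, ref: PlaceRef) -> None:
        """Append a place, rejecting roles not defined for this event type."""
        allowed = ALLOWED_ROLES.get(self.event_type)
        if allowed is not None and ref.role not in allowed:
            raise ValueError(f"role {ref.role!r} not defined for {self.event_type!r}")
        self.places.append(ref)


e = Event("emigration")
e.add_place(PlaceRef("Oslo", "from", "12 May 1900"))
e.add_place(PlaceRef("England", "via"))
e.add_place(PlaceRef("America", "to"))
print([p.role for p in e.places])  # ['from', 'via', 'to']
```

Event types missing from ALLOWED_ROLES accept any role in this sketch; a stricter design could reject them instead.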

theKiwi 2011-03-24T20:01:27-07:00

I'm not convinced that events need more than one place - to me an event is something that can be represented by a date and a place.

"Emigration" that takes 6 months can be broken down to be

Emigration from place1 on date 1
Immigration to place 2 on date2

what happened between date1 and date2 happened on a path, not a place (unless you want to somehow describe the 12,000 mile ocean voyage from England to New Zealand as a "place").

And the Emigration and Immigration will likely have 2 quite different sources - almost certainly from 2 different countries if it's a migration from one country to another.

Roger

AdrianB38 2011-03-25T05:19:51-07:00

Roger - I guess you're one of those to whom an event would only have a single date, whereas I'm one of those where an event could have a FROM-TO date range, e.g. "The First World War happened From 1914 To 1918" - OK not phrased in a genealogically relevant fashion, but it illustrates what most people think of as an Event. (Or at least - most English speakers would - I'm unsure whether the concept would translate).

(NB - this is NOT the discussion thread to debate this Event-From-To topic. Not sure if there is one in the Reqts Catalogue, but we've had such discussions elsewhere).

Anyway - at root, those of us who subscribe to the Event-lasting-more-than-one-day view are I suggest more likely to want a From and To location at the very least, as we would find it more natural. Certainly, you could describe your 6m emigration journey as you suggest. (By the way, I would describe "at sea between England and Australia" as a place - I've got someone allegedly born on the journey).

For me though, I'd like to describe "emigration" as an event from date1 to date2, from-location place1 to-location place2, and I'd quite happily have 2 (or more) sources for the journey, simply with a note against the "citation" that says this source is for the origin and this for the destination.

To me, this construct is far more natural, and comes out in reports far better - I did fight against using the 2nd (custom) location for emigration in my software but it just read better all in one. Hence my proposal for multiple locations for an event.

(I also have a prejudice against the "immigration" event as defined - what the heck does it mean? No-one in English-English uses the verb "to immigrate" and when I pass through "Immigration" at an air-port it's not because I'm going to live there permanently, so there's a danger the event will get misunderstood in future. However, that's not strictly relevant.)

AdrianB38 2011-03-25T05:35:37-07:00

Geir,
My personal view is that I don't see the need for "via" locations. Or more accurately, the benefit that they bring seems to be outweighed by the extra complexity of a possibly endless array of via-locations. (I can live with one via-location). But I've written it into the Requirements to have this discussion.

Location1 "or" Location2. Interesting. However, this could start a list - if or-places, why not or-dates? I think I'd prefer if we stuck to the convention that alternatives are done by putting in a 2nd event. (Should this be explicitly recorded in the Requirements Catalogue?)

Dates for each place? Again, interesting but I think if we confine ourselves to just 2 locations per event then the necessity for dates for each place goes away as it's just the from-date and to-date that apply respectively to the from-place and to-place.

As for the programs, well, the software that I use does have space for 2 locations for emigration as it has a custom defined extra location. This is just a minor tweak to the form to show the extra location and would be an extra column in the database. An array of many locations (e.g. "from", "to" and multiple "via") would indeed be a different ball-game.

AdrianB38 2011-03-26T09:18:38-07:00

Having thought slightly more deeply about the data model for the Event and how it could be implemented in GEDCOM, an RDBMS, XML, whatever, I now think my distaste above for an "array" of many locations (e.g. "from", "to" and multiple "via") was mistaken and that the extra complexity falls out of (a) the requirement to keep details about location history (specifically splits and joins) and (b) also out of an Event's need to record multiple persons. Once we have (a) and (b) designed in, then it becomes trivial to allow many more than 2 locations per event in data storage terms.

To make this clearer (I hope) - we would envisage event XML that looks something like this:
<Event id="EV1234" Type="some-type">
  <Date>...</Date>
  <People>
    <Person id="IND4472">...</Person>
    <Person id="IND4498">...</Person>
    <Person id="IND41212">...</Person>
  </People>
</Event>

Apologies for any mistakes and naiveties in XML. Note I haven't put anything in for location. Value of "id" is meant to be a cross-reference. The initial thought (1 location per event) gives me this:

<Event id="EV1234" Type="some-type">
  <Date>...</Date>
  <Location id="L4084"> ... </Location>
  <People>
    <Person id="IND4472">...</Person>
    <Person id="IND4498">...</Person>
    <Person id="IND41212">...</Person>
  </People>
</Event>

We need multiple locations to cover the history of a location. It is possible to imagine a location being split into 3 or more at 1 event (e.g. dissolution of the USSR?), giving us:
<Event id="EV1234" Type="location-split">
  <Date>...</Date>
  <LocationList>
    <Location id="L4030"> ... </Location>
    <Location id="L4080"> ... </Location>
    <Location id="L4084"> ... </Location>
  </LocationList>
</Event>

Seems easy enough to me... And the <LocationList> element(?) can easily be used for the ordinary event to give multiple locations:
<Event id="EV1234" Type="journey">
  <Date>...</Date>
  <LocationList>
    <Location id="L4030" Role="from"> ... </Location>
    <Location id="L4080" Role="to"> ... </Location>
    <Location id="L4084" Role="via"> ... </Location>
  </LocationList>
  <People>
    <Person id="IND4472">...</Person>
    <Person id="IND4498">...</Person>
    <Person id="IND41212">...</Person>
  </People>
</Event>

So - in this XML illustration, a list of locations is (a) necessary to record a location-split event, (b) easy because it's just like a list of people and (c) can therefore be used for any event.
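To check the claim that a location list is read "just like a list of people", here is a minimal Python sketch that parses a trimmed copy of the journey illustration with the standard xml.etree.ElementTree module. The fallback to a bare Location child for the one-location form is an assumption about how the two shapes might coexist; element and attribute names follow Adrian's illustration, which is not a settled schema, and read_event is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the journey illustration ("..." contents dropped).
JOURNEY = """
<Event id="EV1234" Type="journey">
  <Date>...</Date>
  <LocationList>
    <Location id="L4030" Role="from"/>
    <Location id="L4080" Role="to"/>
    <Location id="L4084" Role="via"/>
  </LocationList>
  <People>
    <Person id="IND4472"/>
    <Person id="IND4498"/>
    <Person id="IND41212"/>
  </People>
</Event>
"""


def read_event(xml_text: str):
    """Return (event type, [(location id, role)], [person id]) for one Event element."""
    event = ET.fromstring(xml_text.strip())
    # A LocationList is read exactly like People: iterate its children.
    # Fall back to a single bare Location child (the one-location form).
    locations = event.findall("LocationList/Location") or event.findall("Location")
    people = event.findall("People/Person")
    return (event.get("Type"),
            [(loc.get("id"), loc.get("Role")) for loc in locations],
            [p.get("id") for p in people])


event_type, locations, people = read_event(JOURNEY)
print(event_type, locations)
# journey [('L4030', 'from'), ('L4080', 'to'), ('L4084', 'via')]
```

A location-split event parses the same way; its locations simply come back with a role of None.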

gthorud 2011-03-27T16:21:29-07:00

I have always found the language produced by genealogy programs boring and unnatural. There are sentence after sentence starting with “Peter was …”, “He was….”, “He emigrated …” and so on. If I find it boring, I expect other readers to find it even more boring since they don’t know why. One reason for this situation is the simplicity of the current event structure. It seems that data structures are more important than the end product of our work.

It might be useful to look at some real world examples.

Considering that a lot of people emigrated from a port outside their country, and even transited through England (there was a small transit industry there), I think multiple places would make the language more natural. I would rather see “She emigrated to America from Oslo 12 May 1900 via England.” than “She emigrated from Oslo 12 May 1900. She went via England. She immigrated in America.” The single sentence could often be followed by “She immigrated at Ellis Island 29 May 1900, where it was recorded that she was destined for Coon Valley.” rather than two more sentences.

Emigration/Immigration is one thing, but there are also a lot of records of migration within a country. Currently there is really no way to say that someone moved from a to b.

Also, there are things related to property, for example “Hans bought a part of the Olson farm in 1875 and called it My Farm.” Sentences that rename a farm would be impossible without references to two farm names. And what about inheritance (probates) back in the 1600s, when people owned bits and pieces of several farms and someone often inherited pieces of two or three farms? Or the simple fact that someone owned several properties: you don’t want a sentence for each of those.

So multiple places is an opportunity to get rid of some robot language. The limiting factor in this game is not the user interface or databases, but the construction of sentence templates since it must be approximately the same in both the exporting and importing application. The key will be the selection of appropriate roles for places, and the fact that information in sources is often “standardized”.

I have already mentioned events for places above; some of them cannot be described without reference to more than one place, and I am sure there are cases where you can get rid of some robot language in such events also.

Regarding the maximum number of places, I think that 3 (maybe 4) places should cover most cases. But I think the limit will have to be investigated later.

I am not sure I understand why the case with the ambiguous place name must result in two different events. That would just mean more work and more robot language.

Also, there are currently many date and place pairs in user interfaces, so I don’t see why dates could not be grouped with places in a structure, you could just define a structure with repeatable pairs where both the date and place are optional.
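Geir's point that place roles make a single natural sentence possible can be illustrated with a toy English template. This is a hypothetical sketch, assuming roles named "from", "to" and "via" and a fixed word order copied from his Oslo example; real sentence templates would have to be agreed between exporting and importing applications, which is exactly the limiting factor he identifies.

```python
from typing import Dict, List, Optional, Tuple


def emigration_sentence(pronoun: str, places: List[Tuple[str, str]],
                        date: Optional[str] = None) -> str:
    """Render one emigration event with role-tagged places as one sentence.

    `places` is a list of (role, name) pairs; roles assumed: "from", "to", "via".
    Missing roles are simply left out of the sentence.
    """
    by_role: Dict[str, List[str]] = {}
    for role, name in places:
        by_role.setdefault(role, []).append(name)
    parts = [pronoun, "emigrated"]
    if "to" in by_role:
        parts.append("to " + by_role["to"][0])
    if "from" in by_role:
        parts.append("from " + by_role["from"][0])
    if date:
        parts.append(date)
    if "via" in by_role:
        parts.append("via " + " and ".join(by_role["via"]))
    return " ".join(parts) + "."


print(emigration_sentence("She", [("from", "Oslo"), ("via", "England"), ("to", "America")],
                          "12 May 1900"))
# She emigrated to America from Oslo 12 May 1900 via England.
```

Because a missing destination is simply omitted, the same template also covers the emigrant who died at sea, without a separate "Immigration" sentence.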

AdrianB38 2011-03-28T07:49:51-07:00

The robotic nature of computer generated reports will probably never be defeated without transforming the method of input and data storage totally. Such a change is probably beyond the scope of creating BG as a successor to GEDCOM in today's programs.

However, we can surely help the applications along by making available a data structure closer to a sensible sentence. And Geir shows this well with the examples on Emigration etc and Moving. Nobody (at least in English) ever separates the start of the event from the end when creating a sentence. The events amenable to this sort of treatment can be recognised, I suggest, by the fact that they always come in pairs. Nobody ever emigrates without also arriving in the target country (what GEDCOM refers to as the Immigration event). So why separate the events in the database? You may not know the details at the "other" end, but any intelligent application can code round that when creating the narrative sentence.

So there's the challenge - what pairs (or trios?) exist?

theKiwi 2011-03-28T07:53:28-07:00

I'm still thinking about this in general, but one point in response to

"Nobody ever emigrates without also arriving in the target country (what GEDCOM refers to as the Immigration event)."

1 - People who emigrate and die at sea
2 - People born at sea during the voyage have never really emigrated from anywhere?

In both cases these are clearly 2 quite separate events

1- an emigration and a death
2 - a birth and then an immigration

AdrianB38 2011-03-28T11:54:58-07:00

"these are clearly 2 quite separate events"

That's true and a good reason to continue to allow separate events - but in that case one could simply omit the "missing" bit. For 99.9% of people then, the 2 events are described verbally as one, so would be more useful entered as one. If you wanted to.

(Of course, we could argue pointlessly for days about whether (2) really is an immigration or an arrival!)

AdrianB38 2011-03-25T08:20:31-07:00

Data-Event04 Events over a time-period

This has been raised to act as a home for any further discussions about whether or not events should be able to last over more than 1 day.

Note earlier discussions on this topic in discussion "Syntax09 Define Event vs. Attribute"

AdrianB38 2011-11-26T12:01:29-08:00

"a good example yet that required more than one location for an event that doesn't have a duration"

What about some natural disaster? I know from letters loosely connected to my family that the San Francisco Earthquake of 1906 affected both SF and Oakland (and no doubt many other places as well...). As the family were in both places, I'd like to at least consider the possibility of associating them with the 'quake and describing it as occurring in at least those 2 spots seems desirable. (Plus that well known place "et cetera"?)

AdrianB38 2011-11-26T12:14:51-08:00

"I represent the two end-points as independent events ... but link one to the other. That effectively defines a "protracted event" that has a duration"

We had an earlier discussion about "What IS an event?" and my favourite definition at the end was something like "It's a change of state compared to what went before." It's then quite possible to imagine something like WW1 as one event lasting from 1914 to 1918 (a temporary change from the state of peace to one of war) OR as one event in 1914 to cover the outbreak (a change from the state of peace to one of war), a different state after that that is NOT explicitly referenced (well, I wouldn't reference it) and then a second event to cover the end in 1918 (a change from the state of war to one of peace). Again, I'd not thought of linking the two but it does make sense.

As for whether the best plan is one long event or two end events, I think I'd like to reserve my decision on a case-by-case basis and see how the other data tends to play out in a particular application. Mathematically, the two approaches seem equivalent at first glance.

"It also allows the two end-points to have independent locations as in the case of a long journey." That's true but I'd be slightly concerned you might be putting the cart before the horse there - if you have a need for multiple locations, then why not satisfy it, rather than succumb to a restriction from the start?
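Adrian's remark that one long event and two linked end-point events seem mathematically equivalent can be tested with a small round-trip sketch. The class and function names are hypothetical; dates are kept as plain strings, and the from-place and to-place of the long event map onto the start and end events respectively.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SpanEvent:
    """One event lasting from `start` to `end`."""
    kind: str
    start: str
    end: str
    start_place: Optional[str] = None
    end_place: Optional[str] = None


@dataclass
class PointEvent:
    """One end-point of a protracted event, linked to its partner by id."""
    id: str
    kind: str
    date: str
    place: Optional[str] = None
    linked_to: Optional[str] = None


def split_span(span: SpanEvent, id_prefix: str) -> Tuple[PointEvent, PointEvent]:
    """Turn one long event into two end-point events that reference each other."""
    start = PointEvent(f"{id_prefix}-start", span.kind + "-start", span.start, span.start_place)
    end = PointEvent(f"{id_prefix}-end", span.kind + "-end", span.end, span.end_place)
    start.linked_to, end.linked_to = end.id, start.id
    return start, end


def join_points(start: PointEvent, end: PointEvent, kind: str) -> SpanEvent:
    """Fold two linked end-point events back into one long event."""
    if start.linked_to != end.id:
        raise ValueError("events are not linked to each other")
    return SpanEvent(kind, start.date, end.date, start.place, end.place)


ww1 = SpanEvent("war", "1914", "1918")
begin, finish = split_span(ww1, "EV1")
print(join_points(begin, finish, "war") == ww1)  # True
```

The round trip only works because both shapes here carry the same fields; an end-point event holding extra information of its own would not fold back into a single span without loss.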

ACProctor 2011-11-26T12:36:37-08:00

Re: San Francisco earth-quake having multiple locations...

My format solves this through inheritance. A generic event represents the overall quake and derivations of it can specify a more precise location. Other information, such as the actual date, is inherited from the generic event.

I use the same mechanism for census events. A generic one represents the complete census on a particular night in the UK while derivations specify a particular household etc.
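The inheritance ACProctor describes can be sketched as a chain of events in which a derived event looks up any field it does not set in its generic parent. This is a minimal Python illustration of the idea, not ACProctor's format; EventNode and its get method are hypothetical.

```python
from typing import Any, Dict, Optional


class EventNode:
    """An event whose unset fields are inherited from a more generic parent event."""

    def __init__(self, parent: Optional["EventNode"] = None, **fields: Any) -> None:
        self.parent = parent
        self.fields: Dict[str, Any] = fields

    def get(self, name: str) -> Any:
        """Walk up the parent chain until some event sets `name`; else None."""
        node: Optional[EventNode] = self
        while node is not None:
            if name in node.fields:
                return node.fields[name]
            node = node.parent
        return None


quake = EventNode(kind="earthquake", date="18 Apr 1906", place="San Francisco")
oakland = EventNode(quake, place="Oakland")
print(oakland.get("date"), oakland.get("place"))  # 18 Apr 1906 Oakland
```

A census would follow the same pattern: one generic census event, with each household as a derived event that overrides only its own details.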

ttwetmore 2011-11-26T23:45:49-08:00

"That sort of flexibility sounds like it would make subsequent processing more difficult. Having a simpler 'event' concept that is well-defined and atomic allows you build concepts such as a 'protracted event' or a 'structured event' without losing the simplicity of using it as a marker in time. I haven't come across a good example yet that required more than one location for an event that doesn't have a duration."

Well, the flexibility allows the simple approach you suggest, for which processing is trivial, so the DeadEnds model is fine. Plus, as a developer, I don't see any difficulty in processing more complex events. The only deficiency in the DeadEnds approach vis-à-vis what we are talking about here is the lack of a way to link events. That's got me thinking. But since this has never come up before in any discussion of events I've had for the past twenty years, it deserves a little thinking. I don't see any real need to connect a divorce to the marriage it ends, but I can see why one might think it's a good idea. If an event has a start and an end, then instead of two linked events, I don't see any real problem with a single event with a from ... to kind of date.

ttwetmore 2011-11-27T00:06:40-08:00

"My format solves this through inheritance. A generic event represents the overall quake and derivations of it can specify a more precise location. Other information, such as the actual date, is inherited from the generic event."

This sounds useful for a historical discipline driven by the need to analyze complex trends, large events, and so forth. One wonders how useful this is for genealogical applications, where the most complex events are usually those described on certificates.

"I use the same mechanism for census events. A generic one represents the complete census on a particular night in the UK while derivations specify a particular household etc."

In my approach each "household" is an event, and anything higher up in the evidentiary chain is a source. In the genealogical sense I don't see any value in an event which is "all census data enumerated on such and such a day," but I do see "all census data collected in such and such a parish" as a possible source, that would be a sub-source of, say, the 1871 Isle of Man census, though, for genealogical purposes, I simply use the entire 1871 Isle of Man census as a single source, with each household I extract from the census as a new event. Of course there are a lot of differences in how people approach describing the source of census data. I guess one could consider the entire 1871 Isle of Man census as one very large event, from a historical perspective, but I prefer thinking of it as a source of evidence, not as an event.

At the genealogical level, what is important about an event is that it names persons, gives some important attribute/s of the persons (e.g., a name, a vital date), and, if more than one person is mentioned in the event's evidence, any information possible about either the roles the persons play with respect to the event or the relationships the persons have with respect to each other.

But this does lead to some interesting questions. Say you had an ancestor who was living in San Francisco at the time of the great earthquake. How would you like that information to be handled by your genealogical data model? Well, first you do need the evidence that your ancestor was living there at the right date, so that will be handled by some normal "event" like a city directory entry, or a census, or a letter. But then you want to say, by the way, this was the date and the place of the San Francisco earthquake. This has nothing to do with your ancestor, really, he was just living there then. Some genealogical programs come with historical databases that can just tie these facts to your ancestors automatically after examining their details.

I guess the question is whether a genealogical data model needs to include large historical events as a new data class, and, if not, how such information as "my grandfather Charles fought in World War II" should be encoded in the model.

ACProctor 2011-11-27T02:31:34-08:00

I've strongly distinguished genealogy from family-history in my model (which I really must finish off and offer up for people to comment on). Hence, the historical applicability was part of the design to accommodate a more general class of data.

Narrative plays a large part in my model and the narrative content can embed arbitrary references to top-level entities like Person, Place, or Event.

I wanted to get a good balance between a strong, flexible, and normalised approach to those top-level entities while allowing for completely ad hoc and free-form connections that you may want to record.

My big hold-up at the moment is whether Events group Persons, or Persons link to Events. It sounds trivial on the face of it and I can see arguments that work well in both directions. That usually means some middle-ground is where the best answer is :-)

(thanks for all this feedback folks. I'm getting more constructive comments here than I ever got on sgc)

ttwetmore 2011-11-27T02:40:53-08:00

"My big hold-up at the moment is whether Events group Persons, or Persons link to Events. It sounds trivial on the face of it and I can see arguments that work well in both directions. That usually means some middle-ground is where the best answer is :-)"

This is very interesting because I am also in a quandary about this. In the DeadEnds model I have the events refer to the persons via role references, and don't have the persons refer to their events. Of course this is at the data model level and therefore possibly the database level. Once the subsets of persons and events that the user is currently interested in are loaded into the computer's memory, most implementations would just add the redundant link for efficiency in processing. But this raises the question of what happens when a user wants to load up a bunch of persons with their events, based on knowing the persons only. If the database were a relational one, there would be an event-person table that could be queried. That is, the relational database table has "normalized" the problem away. But in a network database, where nothing is normalized, there is a problem. Even though it adds a little redundancy, I think it's good to have the persons refer to their events also. They can do this simply by storing the events' ids in a list.
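A minimal sketch of the two-way linking described above, in Python. The `Person` and `Event` classes and the `rebuild_back_links` and `events_for` helpers are hypothetical, not DeadEnds code. It assumes the role references held by events are authoritative, while each person's list of event ids is redundant data that can always be recomputed from the events.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Person:
    id: str
    name: str
    # Redundant back-links: ids of the events this person takes part in.
    event_ids: list[str] = field(default_factory=list)


@dataclass
class Event:
    id: str
    kind: str
    # Authoritative links: (person id, role) pairs.
    roles: list[tuple[str, str]] = field(default_factory=list)


def rebuild_back_links(persons: dict[str, Person], events: dict[str, Event]) -> None:
    """Recompute every person's event_ids from the role references held by events."""
    for person in persons.values():
        person.event_ids.clear()
    for event in events.values():
        for person_id, _role in event.roles:
            ids = persons[person_id].event_ids
            if event.id not in ids:
                ids.append(event.id)


def events_for(person: Person, events: dict[str, Event]) -> list[Event]:
    """Load a person's events knowing only the person record (the network-database case)."""
    return [events[eid] for eid in person.event_ids]


# Illustrative data only.
persons = {"P1": Person("P1", "Charles"), "P2": Person("P2", "Mary")}
events = {"E1": Event("E1", "marriage", [("P1", "groom"), ("P2", "bride")])}
rebuild_back_links(persons, events)
```

Because the back-links can be regenerated, an exchange format could omit them and leave them to the importing program.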

Note that in the GEDCOM model the pointers go both ways, so that persons can get to their "events" directly (which in GEDCOM simply means their families) and vice versa. All other, non-family, events are simply bound into the bodies of the persons under the vastly simplifying assumption that each such event has only one role.

ACProctor 2011-11-27T03:18:46-08:00

I don't like the GEDCOM way of doing it since it is confusing a storage model with a run-time model. As you say yourself, when data is loaded-up then extra links can be added for efficiency. In fact, I believe a source-format (which includes usage for backups, data exchange, etc), a run-time data model, and a database model are all different and have different requirements.

I'm currently focused on a generalised source-format which I believe should be normalised (i.e. minimal redundancy and duplication).

A run-time data model would be a natural successor project but the requirements of it would be different, e.g. efficiency of lookups or correlation.

As for indexed database storage, I believe that's a decision for the designer of any commercial software product. Whether it uses a relational database, object-orientated one, OLAP one, key-value pairs one, or some proprietary one is their choice. [I'm pretty sure I've seen the same sentiments in another of your threads in these pages so that's very reassuring]

ACProctor 2011-11-27T03:25:32-08:00

In the interests of keeping this thread focused (I apologise for diverting it already), is there a separate one on the relationship between Persons and Events?

ttwetmore 2011-11-27T10:09:12-08:00

"I believe a source-format (which includes usage for backups, data exchange, etc), a run-time data model, and a database model are all different and have different requirements."

I agree. First there is the model. The source format (which I call the external format) is a text-based archival format that must be deterministic so it can be parsed, and I think it is desirable that it also be human-readable. Then the database format is wholly up to the development team. And of course leaving it up to the development team often introduces problems (artificial limitations, non-standard extensions, misinterpretations of the model), but with published test data, and a requirement that software pass a round-trip import-to-export test that leaves the data unchanged before it can be certified compliant, these issues are controllable.

"A run-time data model would be a natural successor project but the requirements of it would be different, e.g. efficiency of lookups or correlation."

Shouldn't the run-time data model be left to the development organizations? I may not understand your point here. By the run-time data model I think of the actual Java or C++ (in my case Objective-C) classes used to implement the software.

"As for indexed database storage, I believe that's a decision for the designer of any commercial software product. Whether it uses a relational database, object-orientated one, OLAP one, key-value pairs one, or some proprietary one is their choice. [I'm pretty sure I've seen the same sentiments in another of your threads in these pages so that's very reassuring]"

I agree. I like to play devil's advocate against relational implementations, since I think the relational approach is too ingrained in our default way of thinking, and I think it contributes to many of the artificial limitations found in commercial software; but frankly those limitations are not inherent in the relational model, but in the implementations. I prefer a network database approach because it feels so nice and object-oriented to me, but I am in the minority here, and am not trying to change minds, just trying to suggest the value of thinking before doing.

ACProctor 2011-11-27T10:19:04-08:00

By run-time data model, I wasn't so much thinking of the internals of a particular product as the public object/method interface that would be offered up for run-time interoperability.

This is something I feel would be a huge step forwards but fear it may be still far away - the possibility of products interoperating in real-time.

It would allow a specialising of the market (ideally) so that database products are separate from analysis products and separate from reporting products. It would also be necessary for any type of cloud computing where online trees could be published as opposed to simple pedigrees on a Web page (the latter doesn't support any type of analysis or correlation with, say, your own tree).

ttwetmore 2011-11-27T12:15:26-08:00

"By run-time data model, I wasn't so much thinking of the internals of a particular product as the public object/method interface that would be offered up for run-time interoperability."

I understand. I would call that a service API. New Family Search has one that they publicize, Ancestry.com has one that they keep secret. It may be a pipe dream to expect more than one service provider to agree on the same API, but we could hope for it.

eleanordew 2011-11-22T13:10:10-08:00

Slavery would certainly qualify as an event over a time-period, in fact, it could be seen as a state of being.

ACProctor 2011-11-26T06:56:38-08:00

I only came upon these pages when I saw a thread about "multiple places per event". This also connects with "events over a time period" here since emigration/immigration was cited as a possible example - the place of origin and the place of destination being at opposite ends of a protracted event. Someone pointed out in the other thread that this case could also be handled as two independent events but a less contentious case might be WWI which still has both a start date and an end date but which would mostly be kept together.

I happened to be working independently on designing a "source format" for family history which I intended to publish on the Web soon but I'd like to share my thoughts and ideas.

In that source format, I represent the two end-points as independent events (e.g. emigration versus immigration, or outbreak of WWI versus Armistice Day) but link one to the other. That effectively defines a "protracted event" that has a duration. It also allows the two end-points to have independent locations as in the case of a long journey. Note that it still allows the individual end-points to be referenced separately if necessary. For instance, if something in a person's family history was relevant specifically to the outbreak of WWI or the ceasing of hostilities then they can still be referenced explicitly.

I've generalised this approach so that multiple mid-point events can be associated with the overall group, thus defining a "structured event".
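A minimal sketch of the "structured event" idea in Python, assuming a hypothetical `StructuredEvent` class that holds explicit start and end events plus optional mid-points. Each member remains an ordinary atomic `Event` with one date and one place, so it can still be referenced on its own. The voyage data is illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Event:
    """An atomic point event: one date, one place."""
    id: str
    kind: str
    when: date
    place: str


@dataclass
class StructuredEvent:
    """Links independent end-point events, plus optional mid-points, into one group."""
    start: Event
    end: Event
    midpoints: list[Event] = field(default_factory=list)

    def duration_days(self) -> int:
        return (self.end.when - self.start.when).days

    def places(self) -> list[str]:
        """Places in date order; the two end-points may have different places."""
        ordered = sorted([self.start, *self.midpoints, self.end], key=lambda e: e.when)
        return [e.place for e in ordered]


# Hypothetical emigration voyage, for illustration.
voyage = StructuredEvent(
    start=Event("E1", "emigration", date(1871, 4, 1), "Liverpool"),
    end=Event("E2", "immigration", date(1871, 4, 15), "New York"),
    midpoints=[Event("E3", "port call", date(1871, 4, 3), "Queenstown")],
)
```

Here `voyage.duration_days()` is 14, and the emigration event `E1` can still be cited directly, independent of the group.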

Do you think there might be some useful ideas in this approach?

NeilJohnParker 2011-11-26T07:30:59-08:00

I believe there is a similar situation with other events, e.g. marriage and divorce. The divorce needs a separate event with its own date and possibly place, if it's relevant, but most importantly it needs to be linked to a specific marriage, although usually that marriage can be inferred. Also both the marriage and divorce events may need to contain an attribute for the authority that granted the marriage or divorce, e.g. Smith Fall's Presbyterian Church or Family Court, Wichita, Kansas respectively.

ACProctor 2011-11-26T07:46:11-08:00

The idea of extending Event-groups that far is a big step. It's true that divorce could be treated as another end-point to the overall marriage but it feels more hazy, or maybe I'm just more scared of going that far ;-) Courtship usually pre-dates the marriage but is often left out of family history in our culture. The signing of different forms of marriage agreement could be before or after the marriage. My own marriage was in two parts (civil and religious) which happened over 5000 miles apart. What about separations, both formal and informal, and reunions for that matter? That approach could be taken to include someone's whole life from birth to death.

I think I didn't go that route because I'd defined an event as simply something connecting one-or-more persons with a place at a given time. I'm not trying to deconstruct a whole marriage or a whole life and represent it all as a 'structured event'.

I'd be interested to see how others feel about that.

NeilJohnParker 2011-11-26T09:04:50-08:00

Unfortunately a divorce is associated with one specific marriage. Although which one can be inferred implicitly (if and only if you have the dates of each marriage and each divorce), would it not be better to state the link between the two events explicitly (if one knows what it is), especially when the dates are unknown or uncertain?
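A minimal Python sketch of the trade-off described above. The `Marriage` and `Divorce` classes and the `marriage_for` helper are hypothetical. An explicit link is used when recorded; otherwise the marriage is inferred only when every relevant date is known, taking the latest marriage of the same couple on or before the divorce date. That inference rule is an assumption chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Marriage:
    id: str
    spouses: frozenset[str]
    when: Optional[date] = None


@dataclass
class Divorce:
    id: str
    spouses: frozenset[str]
    when: Optional[date] = None
    marriage_id: Optional[str] = None  # explicit link, if the researcher knows it


def marriage_for(divorce: Divorce, marriages: list[Marriage]) -> Optional[Marriage]:
    # 1. Prefer the explicit link.
    if divorce.marriage_id is not None:
        return next((m for m in marriages if m.id == divorce.marriage_id), None)
    # 2. Otherwise infer, but only when all relevant dates are known.
    candidates = [m for m in marriages if m.spouses == divorce.spouses]
    if divorce.when is None or any(m.when is None for m in candidates):
        return None
    earlier = [m for m in candidates if m.when <= divorce.when]
    return max(earlier, key=lambda m: m.when, default=None)


# Illustrative data: the same couple married twice.
couple = frozenset({"P1", "P2"})
marriages = [
    Marriage("M1", couple, date(1900, 6, 1)),
    Marriage("M2", couple, date(1910, 6, 1)),
]
```

With an undated divorce the inference gives up and returns `None`, which is exactly the case where an explicit `marriage_id` is needed.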

ttwetmore 2011-11-26T09:17:46-08:00

In the DeadEnds data model the event is allowed to have any number of dates, and those individual dates can be date ranges themselves, so theoretically you can have an event for something that occurred just on a series of weekends and you could handle that. Not, of course, that such a feature would ever be widely (or ever?) used. The point is, that it is so easy to allow the flexibility that one just does. Likewise, the DeadEnds event can occur over any number of places, though there is no notion of the starting place and the finishing place. DeadEnds does not have an obvious way of linking events to one another, however. I've never imagined the need for such a thing, though the marriage-followed-by-divorce example does cause one to think about it.
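A minimal Python sketch of an event that carries any number of dates, each of which may be a range, and any number of places. The classes are hypothetical and not taken from DeadEnds; the places list is deliberately unordered, reflecting the absence of a start or finish place.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class DateRange:
    start: date
    end: date  # inclusive; a single day has start == end

    def contains(self, day: date) -> bool:
        return self.start <= day <= self.end


@dataclass
class Event:
    kind: str
    dates: list[DateRange] = field(default_factory=list)
    places: list[str] = field(default_factory=list)  # no start/finish ordering implied

    def occurred_on(self, day: date) -> bool:
        return any(r.contains(day) for r in self.dates)


# Illustrative event held over two separate date ranges.
fair = Event(
    "county fair",
    dates=[
        DateRange(date(1900, 6, 2), date(1900, 6, 3)),
        DateRange(date(1900, 6, 9), date(1900, 6, 10)),
    ],
    places=["Douglas"],
)
```

Queries such as `occurred_on` have to consider every range, which illustrates the extra processing cost that more flexible events can bring.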

ACProctor 2011-11-26T10:18:15-08:00

That sort of flexibility sounds like it would make subsequent processing more difficult. Having a simpler 'event' concept that is well-defined and atomic allows you to build concepts such as a 'protracted event' or a 'structured event' without losing the simplicity of using it as a marker in time. I haven't come across a good example yet that required more than one location for an event that doesn't have a duration.

AdrianB38 2011-03-28T08:31:24-07:00

Data-Fam02 - Cohabitants

"BetterGEDCOM must support the recording of information about cohabitants, with or without common children. Cohabitants should be treated in the same way as married couples, and there should be events for the establishment and dissolution of "cohabitants". Some couples may start out as cohabitants and then marry."

Why: "The percentage of couples that are cohabitants is increasing in the western world, in some countries it is as high as 25-30%. BetterGEDCOM should not discriminate against people in such relations."

GeneJ 2011-04-05T09:21:06-07:00

@Adrian,

Cool.

PS. "it's up to the application coders to come up with the desired reports."

BetterGEDCOM needs to distinguish between new or custom tags that create essential "genealogical" associations (whether traditional, scientific or other) and all other tags, right?

AdrianB38 2011-04-05T13:14:07-07:00

"BetterGEDCOM needs to distinguish..."

Yes. I think. I also think that USER defined tags will not be able to create any "genealogical" associations for the simple reason that the software will not understand them. UNLESS they can inherit properties of "higher" tags - which I think is a requirement.

Custom tags defined by an application's developers will be able to create "genealogical" associations in that application but will not be understood outside that app.

GeneJ 2011-04-05T14:56:15-07:00

Cool.

P.S. My earlier comment about "reports" was just an attempt at clarifying that "genealogical association" concept.

GeneJ 2011-04-05T17:04:30-07:00

Wait ...

You wrote, "'genealogical' associations .. will not be understood outside that app."

Where do I enter a requirement about "genealogical associations?"

Or otherwise, what am I missing?
Don't we break content if generic application data for BMDB, and information about the role of "children" (in its various forms), cannot be understood from program to program?

gthorud 2011-04-06T05:48:59-07:00

I have to cover several issues since a lot has happened since my last post.

First, one issue from my first posting above. I think there is a need for a DEFAULT “unknown” status. I could call it marriage status, but it could as well be defined as cohabitation status – i.e. you know they had children, but don’t know if they ever lived together or were married. The point is that when there is no marriage event (of any type) and no cohabitation event – no nothing – the status should not be assumed to be “not married” or “not cohabitating”. This is just something that needs to be stated somewhere, and there is no need for data reflecting this. But there could in addition to this be a need to explicitly record that e.g. “It is not known if X and Y were married.” Is there such a need?
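A minimal Python sketch of the default "unknown" status. The `UnionStatus` values, the event-kind strings and the `union_status` function are hypothetical. The point it illustrates is that a couple with no marriage or cohabitation events, even one known to have had children, comes out as `UNKNOWN` rather than "not married" or "not cohabiting". Events are assumed to be supplied in chronological order.

```python
from __future__ import annotations

from enum import Enum


class UnionStatus(Enum):
    UNKNOWN = "unknown"
    COHABITING = "cohabiting"
    MARRIED = "married"
    DISSOLVED = "dissolved"


def union_status(event_kinds: list[str]) -> UnionStatus:
    """Derive a couple's status from the kinds of their shared events, in date order.

    Absence of evidence never produces 'not married'; it stays UNKNOWN."""
    status = UnionStatus.UNKNOWN
    for kind in event_kinds:
        if kind == "marriage":
            status = UnionStatus.MARRIED  # a marriage supersedes cohabitation
        elif kind == "cohabitation_start":
            status = UnionStatus.COHABITING
        elif kind == "divorce" and status is UnionStatus.MARRIED:
            status = UnionStatus.DISSOLVED
        elif kind == "cohabitation_end" and status is UnionStatus.COHABITING:
            status = UnionStatus.DISSOLVED
        # Other kinds (e.g. the birth of a common child) leave the status unchanged.
    return status
```

An explicit statement such as "it is not known if X and Y were married" would then be a separate assertion, not something derived from missing data.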

Although we have probably sorted out the official/legal status issue, it really should not matter. It should not be a requirement that all events in BG must have a legal status, or be recorded in a document.

I agree that the term “family” could be a very wide concept as Adrian has described. (Do we also include mafia families?) According to Webster’s, one definition of a family is a household. One definition also includes servants and their family.

Marriage and “moving in as cohabitants” can be seen as two events (of many) that “initiate” (or change the status of) a family, but what they really define is the initiation of a relation between 2 persons. Since there are several types of marriages etc., I think it would be useful to have a super event type “initiating a family type of relation between two people”. Similarly there will be a “super divorce”. (Since the term event type already has an established meaning, it might be better to use the term super rather than sub – and there is also “class”?) I am in no position to choose the term to be used in English for the “Domestic union” or super family, but its definition should not be mixed with the definition of a super term for the relation between the two persons – there is a need for two super terms.

An event such as marriage would have the effect of dissolving the cohabitant relation, but there is no need to state that anywhere in the data or in a report. Adoption is, at least in my head, a family initiating event, but it says nothing about the relation between the parents (or does it? Varies from country to country?). There are cases where persons of different religions have two ceremonies performed, one for each religion, but that should not be a problem. (I don’t see a need for 3 levels in the super/sub hierarchy, i.e. subtypes of marriage subtypes.)
There are several types of “baptism” that would be sub types of a “super baptism” event type, but are there other types of events where we need super/sub?? The purpose is to define a common way that programs should treat these sub-types, and would also be useful for the understanding of a foreign unknown event type, but the implication for programs is most important. (A possibility is perhaps “life story”, but that would only? control the placement of info in reports. See below.)
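A minimal Python sketch of how programs might treat event sub-types through a shared class. Every type and class name here is hypothetical; BetterGEDCOM has not defined any of them. A type the program knows is mapped directly, while an unfamiliar foreign type falls back to a class assumed to be declared alongside it in the file. Program behaviour is then keyed on the class, never on the specific type.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical sub-type -> class mapping, for illustration only.
KNOWN_CLASSES = {
    "civil_marriage": "union_start",
    "religious_marriage": "union_start",
    "cohabitation_start": "union_start",
    "divorce": "union_end",
    "cohabitation_end": "union_end",
    "infant_baptism": "baptism",
    "adult_baptism": "baptism",
}


def event_class(event_type: str, declared_class: Optional[str] = None) -> Optional[str]:
    """Return the class a program should use when processing an event type."""
    return KNOWN_CLASSES.get(event_type, declared_class)


def starts_union(event_type: str, declared_class: Optional[str] = None) -> bool:
    """Class-driven behaviour: does this event initiate a family-type relation?"""
    return event_class(event_type, declared_class) == "union_start"
```

For example, `starts_union("handfasting", "union_start")` is true even though the program has never seen "handfasting", which is the interoperability benefit of the class.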

We may need a special event type for cohabitants living in a college dormitory, but I will leave it to others to decide if that would be useful – i.e. would it ever be used. It might be necessary to include some qualifying text in the event sentence for the family initiating/dissolving “cohabitation” event in the US and some other countries – or use a different term?

I think it is important to create an example of how an extended family would appear in reports; if you don’t, it will not be implemented. Vendors must understand what we are talking about. If we have invented a “life story” type of report, it will have to differ from the normal biological reports in other ways than just extended families – extended families are not important enough on their own to be a reason to have a separate type of report – but I would rather see this info in a “normal” report. Some narrative biological reports have, by default, paragraphs for persons only (with lists of children), others have paragraphs for the “parents” followed by person paragraphs and then children. Would the “aunt” fit in a paragraph after the children? It may not fit in the paragraphs for the heads of family, but if there are events for the family group saying that they lived in X and Y, it could fit there. A variant, for a biography of the aunt, would be to mention that she lived with the family of X and Y, rather than listing all members of the family. (Am I correct in assuming that in the US there are style guides for reports, where this type of info does not fit?) I don’t think you would duplicate information about biological families and social families in a report by listing all members of both families. I don’t want to be required to print different report types, I want it all in one – otherwise the social/extended family info will not be used. The aunt might fit in both a person and a family life story paragraph, as a life story class of events (does any program have such a class?) – but I thought we were talking about representing families as Groups; how does that fit with using events for the same purpose (other than birth and marriage/cohab)? Further work on this is needed!!

GeneJ 2011-04-06T10:50:52-07:00

Hi Geir:

You wrote, "If we have invented a “life story” type of report"

We need terms that distinguish between tags (or tags and roles) that have key genealogical significance and those that do not.

I borrowed the term "life story" from the BCG Genealogical Standards Manual. In the simplest terms, it's the "those that do not" group--tags that don't have that key genealogical significance.

Applications that support Register- or Quarterly-styled reports segregate key genealogical data into a paragraph called the "genealogical summary paragraph." The "life story" paragraph or paragraphs follow. These biographies close with a "list of children."

Here's an example of a generically named but stylized narrative:

http://www.genbox.com/reports/webs/Descnarr103.htm

It's really the "key genealogical data" tags/roles that need to be identifiable.

If programs don't understand each other's "key genealogical tags" then content such as descendant charts and trees, as well as stylized narratives, will be broken.

GeneJ 2011-04-06T10:54:38-07:00

P.S.
See BCG Genealogical Standards Manual, p. 66 for definitions and descriptions of "Genealogical Summary," "Life Story" and "Child List." Numerous examples follow.

AdrianB38 2011-04-06T14:07:31-07:00

Am I missing something here? I just feel we seem to be making heavy weather of things with our calls for examples in reports, etc, and defining these tags that have...

Gene - re "distinguish between tags (or tags and roles) that have key genealogical significance". Yes - but can you define "key genealogical significance" first? If you do, then I suspect it will be obvious which tags (whatever) contribute...

I wouldn't try to generate any type / sub-type arrangements until you've got the detail - then it should be easier to do that.

And Geir, re "an example of how an extended family would appear in reports" - I really do not see it as BG's role to define the format of reports - that's totally the role of the application. And if the BCG GSM has a defined format that needs to be altered because we are altering the definition of family, then I suspect the BCG should do it.

As I said - am I missing something about the data and how it is to be stored?

GeneJ 2011-04-06T14:41:09-07:00

Hi Adrian:

Yes, I can!!

From a data standpoint, "how" you accomplish some of this depends on that family wrapper.

Independent of that wrapper, however, for any one person:

PARENTS (or "parents," as the case may be)[1]
BIRTH (or primary birth/best evidence of birth, as the case may be)[2]
DEATH (or primary death/best evidence of death, as the case may be)[3]
MARRIAGE (or "Union"; each, and each spouse has their own set of "key genealogical tags"; marriage/union dissolution)[4]
CHILDREN (or "children," as the case may be)[5]

[1] Parents might be adoptive, etc. BetterGEDCOM has discussions about this.
[2] Best evidence of birth, for example, might be baptism.
[3] Best evidence of death, for example, might be a burial record, an obituary publication date, or a will date and probate date.
[4] Where marriage might be union, etc. Sometimes the best evidence of marriage or union is the birth of a child. Best evidence of a marriage dissolution is remarriage of one spouse. I've observed a dissolution remarked parenthetically within a marriage tag, such as XXX married XXX... (divorce) ....
[5] Where a child might be a natural child and, optionally(?), adopted children, the children of a spouse (and, I presume, the extended definitions we have been providing). There are other discussions in BetterGEDCOM for these.

gthorud 2011-04-06T15:30:12-07:00

I have entered a new requirement into the catalog - Event Classes, Data-Event05. The first issue is if the term "class" is ok. My thinking is that events already has a defined meaning, so using sub-type does not fit with existing events. Also, Class is a computer term that fits exactly with what has been discussed. If there are no comments, I will use that term - and create a discussion topic.

AdrianB38 2011-04-07T08:39:32-07:00

Gene
My first thought is that all those events can be - indeed, should be - recorded as multi-person events and none should be recorded as "Family Events".

PARENTS - birth parents are identified by the birth event of the person concerned. The 2 (or 1 if the other is unknown) parents will be recorded as persons linked to the birth event - the 2 parents have roles of birth-mother and birth-father and the individual has the role of child. (Birth-child??)

To navigate from the child to their parents, go to the child's birth event and find the 2 people with the correct roles.

If you want to navigate to adoptive parents, go to the child's adoption event and find the 2 people with the correct roles of adoption-parent.
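A minimal Python sketch of this navigation. It assumes a hypothetical `Event` class whose `roles` map role names to person ids, and it uses the role names from the post ("birth-mother", "birth-father", "adoption-parent"), with "child" assumed as the role of the individual in both birth and adoption events. The same `parents_of` function finds birth or adoptive parents depending on which event kind and roles it is asked for.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str
    # Role name -> ids of the persons playing that role in this event.
    roles: dict[str, list[str]] = field(default_factory=dict)


def parents_of(
    child_id: str,
    events: list[Event],
    kind: str = "birth",
    parent_roles: tuple[str, ...] = ("birth-mother", "birth-father"),
) -> list[str]:
    """Go to the child's event of the given kind and collect the persons in parent roles."""
    for event in events:
        if event.kind == kind and child_id in event.roles.get("child", []):
            return [pid for role in parent_roles for pid in event.roles.get(role, [])]
    return []  # no such event recorded for this child


# Illustrative data.
events = [
    Event("birth", {"child": ["C1"], "birth-mother": ["M1"], "birth-father": ["F1"]}),
    Event("adoption", {"child": ["C1"], "adoption-parent": ["A1", "A2"]}),
]
```

`parents_of("C1", events)` gives the birth parents `["M1", "F1"]`, while `parents_of("C1", events, "adoption", ("adoption-parent",))` gives `["A1", "A2"]`. An unknown parent is simply an empty role.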

Possible issue 1 - how would we present a child adopted by their step-father (say)? The adoption event would obviously contain the step-father with a role of adoption-parent. Would the mother appear for a 2nd time? And if so, with what role? I suspect my answer would be - what does it say on the document?

Possible issue 2 - if you want to trace family history via adoptive parents, you need to create an adoption event, you can't just put the child into a family with its adoptive parents. Seems fair enough to me - I'm just asking people to distinguish between an adoption and a case of just living with someone for a while.

BIRTH - see above.

DEATH - an event with, I presume, just one participant, having a role of "deceased".

MARRIAGE - in my view of the world this (or any of its variants) is NOT to be recorded against the family. Instead, there is a multi-person event of "Marriage ceremony / Civil Partnership ceremony / Cohabitation start / whatever ", which describes the EVENT. The 2 spouses would be persons linked to the event, with a role each of "spouse" (alongside, say, 2 others linked with a role of "witness").

Dissolution should be obvious - another multi-person event linked to the 2 spouses, each with a role of spouse.

Children - someone's children are identified by looking for multi-person birth-events (or adoption-events) where that someone is a participant in the event with a role of birth-father, birth-mother, adoption parent, whatever. Then the children are the persons in the event with a role of birth-child / whatever.
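The reverse lookup can be sketched the same way, again assuming a hypothetical `Event` with a role-name-to-person-ids map and "child" as the individual's role. Passing only the birth roles gives a bloodline view, while the default also follows adoptions. A birth event with an unfilled father role still links the child to the known mother.

```python
from __future__ import annotations

from dataclasses import dataclass, field

BIRTH_PARENT_ROLES = frozenset({"birth-mother", "birth-father"})
ALL_PARENT_ROLES = BIRTH_PARENT_ROLES | {"adoption-parent"}


@dataclass
class Event:
    kind: str
    roles: dict[str, list[str]] = field(default_factory=dict)


def children_of(
    person_id: str,
    events: list[Event],
    parent_roles: frozenset[str] = ALL_PARENT_ROLES,
) -> list[str]:
    """Children are the 'child' participants of events where person_id has a parent role."""
    children: list[str] = []
    for event in events:
        if not any(person_id in event.roles.get(role, []) for role in parent_roles):
            continue
        for child_id in event.roles.get("child", []):
            if child_id not in children:
                children.append(child_id)
    return children


# Illustrative data: C2's father is unknown; C3 was adopted by M1 and step-father S1.
events = [
    Event("birth", {"child": ["C1"], "birth-mother": ["M1"], "birth-father": ["F1"]}),
    Event("birth", {"child": ["C2"], "birth-mother": ["M1"]}),
    Event("adoption", {"child": ["C3"], "adoption-parent": ["M1", "S1"]}),
]
```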

Possible issue: Because you can't navigate biology for this sort of thing from the family, you must have a birth event for this person - you can't just add them into a family. This is not unreasonable - everyone is born, I suggest. If someone objects that they can't tell who the parents were, fine - leave those roles unfilled in the event. Clearly they won't appear as anyone's descendants, which is fine.

Children of a spouse (i.e. step-children) - unless these are adopted, they should appear only as descendants of their birth parents. They can appear in a family-history report about the social nature of the parents' lives, but they should not be appearing in a blood-line descent report. Time to get rigorous about the purpose of these reports...

So, I think for all these, the family group has NO influence on the bloodline / adoptive-line reports (up or down). But it DOES influence the social history reports of individuals.

gthorud 2011-06-19T17:32:11-07:00

Just a link to a related GRAMPS discussion

http://gramps-project.org/wiki/index.php?title=GEPS_001:_Relationship_type_event_link

(which has previously been pointed to by the Shortcomings of GEDCOM page)

GeneJ 2011-04-04T13:12:20-07:00

If you aren't there ... you are verrrrry close.

GeneJ 2011-04-04T13:15:02-07:00

P.S. Thinking out loud again.

If a "union" exists that creates "heads of families," then from the standpoint of standardized biographies or narratives, the children of either/both would be relevant, right?

AdrianB38 2011-04-04T13:42:33-07:00

"the children of either/both would be relevant, right?" Yes - but I think that (a) it's down to the application writers and (b) there are a couple of ways they could go.

In my view - which may not be shared by everyone and may not match the final BG model - a family a.k.a. domestic union is a social construction. The (believed) bloodlines are separately derived from the birth events of the children because these document the (believed) parents of the child.

A standard biography of X might include details of the (social) families that X was a "parent" in, or a "child" in, or a "dependent adult" in (guessing at roles here). Any or all of those concepts could be supported by just looking at which family a.k.a. domestic union group the person is in in those roles.
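A minimal Python sketch of the family as a social group, kept separate from the birth events that carry the (believed) bloodlines. The `FamilyGroup` class, the `groups_of` helper and the role names "parent", "child" and "dependent adult" are hypothetical, following the guessed roles in the post. Each person has at most one role per group.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FamilyGroup:
    id: str
    # Person id -> that person's role in this social group.
    members: dict[str, str] = field(default_factory=dict)


def groups_of(person_id: str, groups: list[FamilyGroup], role: Optional[str] = None) -> list[str]:
    """Ids of the groups containing person_id, optionally only those where they play `role`."""
    found = []
    for group in groups:
        member_role = group.members.get(person_id)
        if member_role is not None and (role is None or member_role == role):
            found.append(group.id)
    return found


# Illustrative data: an aunt (N1) living with a household as a dependent adult,
# and her own family of origin.
groups = [
    FamilyGroup("G1", {"F1": "parent", "M1": "parent", "C1": "child", "N1": "dependent adult"}),
    FamilyGroup("G0", {"GP1": "parent", "N1": "child"}),
]
```

A biography of N1 could list `groups_of("N1", groups)` for her social history, while her (believed) parents would still come from her own birth event.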

One could - additionally and separately - describe said person's (believed) biological (or indeed adoptive) parents by navigating to the birth (or adoption) event for said person.

And once you'd done that, then you could repeat ad infinitum.

(Or your report could go backwards from the heads of the social family...)

AdrianB38 2011-04-04T13:49:38-07:00

Re my Domestic Union / whatever definition - is this not, on reflection, simply a family??

Or have we got too close to the conventional married ma and pa with that term?

E.g. "A FAMILY is an arrangement whereby two or more people decide to live together on a long-term or permanent basis in an emotionally and/or sexually intimate relationship which may be formally recognised or not. THE FAMILY MAY SUPPORT other ADULT OR CHILD dependants."

Does this cover the full range of possibilities from
co-habiting couples
to same-sex partnerships with adoptive children
to Egyptian Pharaohs with harems
to married ma and pa and the kids - with their aged maiden aunt???? Etc??

ttwetmore 2011-04-04T15:14:17-07:00

Adrian,

I think the answer is yes, yes, yes, yes, yes, and yes.

The "family hating" camp stresses that all kinds of weird relationships can show up in a household, and since genealogy is supposed to be biological, it's a useless concept. That is a very strict and short-sighted definition of genealogy.

The "family loving" camp stresses that genealogy can cover many aspects of family history, and how people lived together matters.

Personally I'm in the "family loving" camp as long as we can extend it to cover all the cases you're worrying about.

Tom W.

gthorud 2011-04-04T18:07:15-07:00

It seems like Cohabitants is a publicly recognised status in the UK.

See
http://www.statistics.gov.uk/hub/population/families/marriages--cohabitations--civil-partnerships-and-divorces/index.html

http://www.statistics.gov.uk/StatBase/Product.asp?vlnk=14491

Or have I misunderstood something?

Civil partnership seems to be a gay thing.

From one of the reports (for England and Wales):
The number of cohabiting couples is projected to rise from 2.3 million in 2008 to 3.8 million in 2033 (Table 2). The proportion of those cohabiting who have never previously married is projected to rise from 74 per cent to 87 per cent.
Ref: http://www.statistics.gov.uk/pdfdir/marr0610.pdf

The percentage of cohabitants today seems to be above 20% of the number of married people, about 1 of 6 couples. I am surprised that this has left no traces in laws or public regulations.

gthorud 2011-04-04T18:48:23-07:00

I think it would be useful to create a concrete example of how an "extended family" with the old aunt could be recorded in a data structure, for example using the group concept, and showing how this could be shown as an extension to how families (parents with children) are presented in reports now. Any supporting events could also be shown in the data.

Maybe that should go in the discussion of Data-Family01.

gthorud 2011-04-04T19:25:41-07:00

Continuing from my last posting.

Considering that the relation between the father, mother and children is recorded by e.g. marriage and birth events, in what structure do we place the old aunt without repeating the relation information carried by the events? How would a program know what to put in a report (based on which data structures, what triggers the output of the aunt), and where in the report, and what would the whole extended family look like?

GeneJ 2011-04-05T08:00:21-07:00

Genealogically speaking, I see a difference between "heads of families" and the ways you might define the children (à la the "genealogy" linking) on one hand, and individuals or "family associations"
(like Aunt Nellie), those who influence the lives of the family, on the other. The latter has a life story feel to it.
In that context, Auntie is a "life story" assertion (with a separately defined "genealogical" relationship). For example, an Aunt Nellie who lived with the family for 15 years may well be associated with, and assigned roles in, many events.

I can see not just Aunt Nellie, but perhaps close family friends similarly associated with assigned roles.

GeneJ 2011-04-05T08:24:54-07:00

P.S. I favor reserving "Cohabitant" for the "heads of families" concept. I'd prefer to see a different word or means by which otherwise unassociated children are linked to the family unit.

That separate "Aunt Nellie" role has a "kinship theory" feel to it--I'd prefer to see those concepts in life story tags.

Separate from references to her in the other family biographies, wouldn't we want Aunt Nellie to have her own biography and story?

My thoughts only. Possibly my perspective falls short of Adrian's concept.

AdrianB38 2011-04-05T08:43:04-07:00

"Cohabitant is a publicly recognised status in the UK" - oh, it's a status alright, in the sense that it exists at a statistical level, but it's not something that can be formally entered into. Most legislation recognises its existence - for instance, paternity leave is not dependent on marriage, but there are all sorts of questions that could be raised about when cohabitation starts... But, like I said - I'm a mathematician who likes stuff to be ordered!!

AdrianB38 2011-04-05T09:02:16-07:00

I wouldn't want to get too illustrative about reports because I think different people want different things. There's a "social history" angle to them and there's a "(presumed) biology" angle to them, and old-fashioned genealogists (none here surely!) might deride the social history ones.

The point is that if we can show the concepts cover everything somewhere, then it's up to the application coders to come up with the desired reports.

I think Geir is right - it's Fam01 ("Families independent of biological relations") that should have the illustrations in... I shall try to concoct some over the next couple of days.

I shall try also to illustrate there how I think stuff need not be repeated. I hope...

gthorud 2011-03-28T17:07:01-07:00

I have to consider this in the context of how things are done here. 30-40 or more years ago the only way to describe cohabitants was that they were not married, and being unmarried was considered out of order, so stating that someone was not married was a negative statement. Since then the term cohabitant has become accepted (half of the children born in Sweden have unmarried parents, half of all Norwegians have been cohabitants, 30% of new families in Norway are cohabitants who do not marry, and the percentage in the rest of Europe is above 10%), but in some circles “being unmarried” is still considered a negative thing, and the term is used to discriminate against people. So, although I could use the “status” unmarried/never married for someone before, say, 1950, I would not use that term about people living today.

Cohabitation has in the last 10 years gained more and more acceptance as a legal “institute” here and is in many (most?) situations considered to be the same as marriage. About 20% of cohabitants sign a contract that regulates what should happen in various situations, e.g. if the relationship breaks up or one of them dies.

So to me the solution is simple, consider cohabitation equivalent to marriage, with events for “moving in together” or establishment of a cohabitant relation and “moving out” or dissolution of the relation – similar to marriage and divorce. You can have dates and refer to sources (e.g. a contract). And, these events are events of real life, formal or not formal does not matter.

There are many types of marriage, e.g. Common law marriage, Gay marriage, and whatever in various religions. You could define these and Cohabitation as user defined events, but the problem is that programs have to treat these events specially, so they should be defined in a standard in the same way as marriage.

TMG has groups of events, one being Marriage. If you create a Cohabitation event type, and assign it to that group, TMG will treat it in the same way as marriage in reports etc.
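A minimal sketch of the grouping idea, assuming a hypothetical table from event types to event groups (the names and functions are illustrative, not TMG's actual internals): report logic checks the group rather than the specific type, so a user-defined Cohabitation type assigned to the Marriage group is handled like a marriage.

```python
# Hypothetical event-type table: each event type is assigned to a group.
# Report logic keys off the group, so new types get the group's handling.
EVENT_GROUPS = {
    "Marriage": "Marriage",
    "Divorce": "Divorce",
    "Birth": "Birth",
}


def define_event_type(name: str, group: str) -> None:
    """Register a user-defined event type in an existing group."""
    if group not in set(EVENT_GROUPS.values()):
        raise ValueError(f"unknown event group: {group}")
    EVENT_GROUPS[name] = group


def treated_as_marriage(event_type: str) -> bool:
    """True if reports should handle this event type the way they handle marriage."""
    return EVENT_GROUPS.get(event_type) == "Marriage"


# A Cohabitation type placed in the Marriage group, and its dissolution
# counterpart placed in the Divorce group.
define_event_type("Cohabitation", "Marriage")
define_event_type("Separation", "Divorce")
```

The design choice is that the standard would only need to fix the groups; individual types (common-law marriage, cohabitation, religious variants) could then be added without each program needing special code for them.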

There is one situation, which I assume is common in many countries, and that is when cohabitants marry. That must be handled without creating a new family – but this may not be a problem in some programs.

(A special problem arises when cohabitation is as common as 30%: you often do not know if people living today are married or are cohabitants, because marriage records are not public. But I guess most programs are able to handle unmarried parents, so there may be no need to define a “married (or cohabitant)” event.)

AdrianB38 2011-04-03T13:29:25-07:00

Geir,
I do prefer the tactic of defining something by what it is, rather than what it's not, i.e. "cohabiting" rather than "unmarried". We might want to review the English words - in the UK we'd simply say "living together" and the actual word "cohabiting" would seem a bit of a mouthful. That's a detail, however.

But I also think we want to firm up on what the "cohabiting / whatever" term means. If there's a legal basis to the partnership then I'd exclude that from the "cohabiting / whatever" term as the suggestion there is, to me at least, that there is no legal basis. So we've probably got several varieties of thing to consider.

In order to cope with unmarried / uncontracted / cohabiting / whatever couples changing their status, it would probably make sense to have events for the creation and dissolution of such relationships.

Except I'm still wholly dubious about a couple where there is no formal legal basis for the partnership being recorded with anything other than a common "residence" event if there are no children to prompt a family's creation. If we start creating a family to record 2 co-habiting, unmarried, un-contracted adults living together, then we just recreate all the anomalies of the GEDCOM family with no justification outside the recording of an event that could be recorded more simply elsewhere (e.g. "residence" with 2 people).

gthorud 2011-04-03T15:34:07-07:00

Adrian,

It is difficult for me to discuss the English term, so I'll stay out of that discussion. (But I seem to smell a cultural difference since you are not using one word.) Rather than switching to the Norwegian term, I will continue to use cohabitant in the same way we have used PFACT.

Also, I am not too concerned with the exact term since that will have to be handled in translation. The important thing for me is that programs handle this similar to marriage, as I have described above.

Even if there is no official ceremony, and in many cases no contract, cohabitation is a legal status here. For example, authorities dealing with social security and other benefits keep track of this, because singles and cohabitants are entitled to different benefits.

I think you will run into trouble if you try to come up with a common term that has the same definition in all countries. And the legal status is different and is likely to change over time.

I don't see why cohabitants, with or without children, can not be considered a family. I don't see any reason to treat these differently from how BG will record families.

I just think we have to accept that there are cultural differences, and since I see no big problems in implementing what I want, I don't see why not.

If you want to record a residence event, that is ok for me.

AdrianB38 2011-04-04T02:13:45-07:00

"cohabitation is a legal status here" - OK, that's an important point, fully justifying its appearance as an event in a file describing events in Norway etc. In the UK it simply doesn't have that legal status, though there have been high profile court actions when cohabiting film stars have split and one has claimed the equivalent of alimony - giving rise to the term "palimony".

"I think you will run into trouble if you try to come up with a common term that has the same definition in all countries" - yes, this is becoming clear. And I wholly subscribe to your view that "programs handle this similar to marriage". In Object Oriented terms, there needs to be some over-arching concept of "domestic partnership" (for want of a better term) that is something more than co-residence. This "domestic partnership" should trigger all (most of?) the special reporting and handling that we see with marriage. Marriage would then inherit the special reporting and possibly add some of its own. Civil partnerships in a formal legal sense would also inherit the special reporting and possibly add some of its own. Informal partnerships in whatever senses would also inherit the special reporting and possibly add some of their own. This seems analogous to the TMG handling of marriage group events that you mention above.

And somehow we have to allow the creation of these variations in each country. If we are too specific in the BG standard, we will exclude some of the variations. If we are too loose, we will be accused of being just as ambiguous as GEDCOM. So we probably have to create some specific things plus the ability to add inheriting user-defined variations.
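The "over-arching concept plus inheriting variations" idea can be sketched with ordinary class inheritance. This is a minimal illustration, assuming hypothetical class names (DomesticPartnership and friends are placeholders for the still-unnamed concept) and a single shared behaviour standing in for "special reporting"; the example variation label is taken from the "other union - by habit and repute" suggestion further down this thread.

```python
class DomesticPartnership:
    """Over-arching group type: gets the special handling marriage gets today."""
    label = "Domestic partnership"

    def report_heading(self) -> str:
        # Shared handling that every partnership variety inherits.
        return f"{self.label} (partners reported together)"


class Marriage(DomesticPartnership):
    label = "Marriage"


class CivilPartnership(DomesticPartnership):
    label = "Civil partnership"


class InformalPartnership(DomesticPartnership):
    label = "Informal partnership"


def define_variation(name: str, base: type = InformalPartnership) -> type:
    """Create a user-defined (e.g. country-specific) variation that inherits
    all handling from an existing partnership type."""
    if not issubclass(base, DomesticPartnership):
        raise TypeError("variations must inherit from DomesticPartnership")
    return type(name, (base,), {"label": name})


HabitAndRepute = define_variation("Other union - by habit and repute")
```

A program that only knows about DomesticPartnership can still report a user-defined variation sensibly, which is the balance between "too specific" and "too loose" described above.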

GeneJ 2011-04-04T05:24:01-07:00

Are we able to identify Wikipedia, FS Wiki or other articles that explain what we mean by "Cohabitants"?

I located two entries in Wikipedia.

For example, there is a Wikipedia entry for "Common-law marriage," purporting, "Cohabitation alone does not create a common-law marriage; the couple must hold themselves out to the world as spouse."
http://en.wikipedia.org/wiki/Common-law_marriage

There is also a Wikipedia entry for "Cohabitation," opening with, "Cohabitation is an arrangement whereby two people decide to live together on a longterm or permanent basis in an emotionally and/or sexually intimate relationship. The term is most frequently applied to couples who are not married."

The section, "Cohabitation by region," is pretty interesting.

AdrianB38 2011-04-04T07:58:47-07:00

The Wikipedia entry for "Common-law marriage" shows some of the complexities - I knew England and Scotland differed in their treatment of what can be loosely termed "Common-law marriage", but I hadn't realised one could identify 4 varieties in Scotland (assuming the article to be accurate).

(NB for anyone outside the UK - England and Scotland actually have separate legal systems and a law created in one is not necessarily applicable in the other. Indeed, I'm not sure if it ever can be...)

As I said, somehow we need to define an over-arching concept, with some more specific variations but space for further user-defined ones.

AdrianB38 2011-04-04T08:24:06-07:00

The thought of defining further ones leads me to pick up on previous words where I said effectively that I was dubious about creating a family record where there were no children and no legal status to the partnership.

Geir responded "I don't see why cohabitants, with or without children, can not be considered a family".

Thinking more about this, I am moving towards Geir's view. If the family represents a social structure (and I have previously suggested that a maiden aunt living with a family could be recorded in the family unit) then there is no reason why the social structure shouldn't consist of just two adults.

My desire to equate family with a social structure comes from a distaste for GEDCOM requiring a family when there is a (presumed) biological relationship between 2 people but no social unit. Two adults in a social relationship surely don't create the same anomalies, on reflection.

Somewhere in here we also need to distinguish between the social unit and the multi-person event known as a marriage. I think my mind can hear Louis telling me that the marriage _ceremony_ event is what creates the change of state between no-family and family. It is therefore not, now I think about it, the same thing at all. But of course, we habitually (in English at least) refer to people being "in a marriage" when we actually mean "in a social unit that was created by a marriage-ceremony".

Thus we have
- the marriage-FAMILY describing a social group and inheriting the characteristics of a "domestic partnership" group
- the marriage-ceremony-EVENT probably inheriting the characteristics of a "domestic partnership" creation event and probably triggering the creation of a group though this is up to the application;
- the civil-marriage-ceremony-EVENT probably inheriting the characteristics of a marriage-ceremony-event;
- the church-marriage-ceremony-EVENT probably inheriting the characteristics of a marriage-ceremony-event;
- the civil-partnership-EVENT probably inheriting the characteristics of a "domestic partnership" creation event and probably triggering the creation of a group though this is up to the application;

And so on...

And perhaps unfortunately, this now requires 3 levels of event - bother. Not sure if that's an issue or not.
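To see where the three levels come from, here is a minimal sketch of the event types listed above as a parent table, assuming hypothetical type names. Whether an event actually creates a group is left to the application; the sketch only reports that a partnership-creation event suggests one.

```python
# Hypothetical event-type hierarchy: each type names its parent (None = root).
EVENT_PARENT = {
    "DomesticPartnershipCreation": None,
    "MarriageCeremony": "DomesticPartnershipCreation",
    "CivilMarriageCeremony": "MarriageCeremony",
    "ChurchMarriageCeremony": "MarriageCeremony",
    "CivilPartnership": "DomesticPartnershipCreation",
}


def ancestors(event_type):
    """Return the chain from the given type up to its root, inclusive."""
    chain = []
    while event_type is not None:
        chain.append(event_type)
        event_type = EVENT_PARENT[event_type]
    return chain


def is_a(event_type, candidate):
    """True if event_type is candidate or inherits from it."""
    return candidate in ancestors(event_type)


def depth(event_type):
    """Number of levels from the root down to this type."""
    return len(ancestors(event_type))


def suggests_group(event_type):
    """Partnership-creation events suggest, but do not force, a new group."""
    return is_a(event_type, "DomesticPartnershipCreation")
```

Civil and church ceremonies sit three levels down, which is the extra level noted above; a generic parent lookup like this does not care how deep the tree goes.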

GeneJ 2011-04-04T08:30:40-07:00

Thinking out loud

Assuming we are talking about identifying the heads of families in the context of genealogical biographies (regardless of whether either or both have children, and not limited to traditional genealogy), we need some way to identify "cohabitants" as something other than those living in my college dormitory.

While those roommates might be a part of my life story or even in group be associated with some role, they would not have the genealogical significance of those in "heads of families" roles.

GeneJ 2011-04-04T08:59:20-07:00

@Adrian,

"...dubious about creating a family record where there were no children and no legal status to the partnership."

Albeit a quite special one, Marriage seems a "class" of union, with roles "husband/wife" or "groom/bride" or "spouse/spouse," no doubt there are more roles.

I don't really see what children have to do with that. Either spouse or partner might bring children to a marriage, some children are the product of a marriage, and some children are adopted by one, the other or both spouses.

GeneJ 2011-04-04T09:25:01-07:00

From the two Wiki articles, how about the concept of "other union - by habit and repute"?

AdrianB38 2011-04-04T12:36:20-07:00

I think your Wikipedia entry for "Cohabitation" is getting there for the over-arching union.

Maybe the concept I'm grasping towards is "DOMESTIC UNION(???) is an arrangement whereby two OR MORE people decide to live together on a long-term or permanent basis in an emotionally and/or sexually intimate relationship WHICH MAY BE FORMALLY RECOGNISED OR NOT, AND MAY INCLUDE OTHER DEPENDANTS."

(Change in CAPS).

This could include children and others such as elderly parents, maiden aunts, etc, in the household.

AdrianB38 2011-04-04T12:38:04-07:00

NB - I do NOT include the dependants in the "emotionally and/or sexually intimate relationship" - that simply provides the basis for the support mechanism for the dependants. Feel free to help with that phrasing!!!!
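The distinction between partners (who are in the intimate relationship) and dependants (who are supported by it) can be sketched as two separate member lists on one group record. This is a minimal illustration, assuming a hypothetical DomesticUnion record; the names in the usage line are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DomesticUnion:
    """Sketch of DOMESTIC UNION: two or more partners in the relationship,
    plus dependants supported by it who are NOT part of the relationship."""
    partners: List[str]
    dependants: List[str] = field(default_factory=list)
    formally_recognised: bool = False

    def __post_init__(self):
        if len(self.partners) < 2:
            raise ValueError("a domestic union needs at least two partners")
        overlap = set(self.partners) & set(self.dependants)
        if overlap:
            raise ValueError(f"listed as both partner and dependant: {sorted(overlap)}")

    def household(self) -> List[str]:
        """Everyone in the household: partners first, then dependants."""
        return self.partners + self.dependants


union = DomesticUnion(["John", "Mary"], ["Ann", "Aunt Nellie"])
```

Keeping the two roles in separate lists means a report can show the maiden aunt as part of the household without implying she is part of the partnership.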

AdrianB38 2011-03-28T08:42:34-07:00

Can we establish how cohabiting couples would / could be recorded differently from a normal family in GEDCOM that simply omits the marriage event?

Is it simply that we need a "status" of unmarried? i.e. we need to confirm that the marriage has been omitted for a reason and not simply because it's not known?

Having said that.... While a simple status may suffice where the couple live together and _never_ marry, if the couple live together and _then_ marry after some years, the situation is more complex as I'm not sure how to describe the pre-marriage era.

The suggestion for events describing the establishment and dissolution of cohabitation, seems one way out of this but I'm not a fan of GEDCOM / BG events that don't match events in real-life - and the whole point of cohabitation is that it happens without a formal event as such.

A split of such a couple does seem to match a real event. Two people starting to live together may not have any detectable start...

Maybe a dated-status is needed?
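One reading of a "dated status" is a list of status periods, each with an optional start and end. This is a minimal sketch under that assumption; the StatusPeriod record and the dates in the usage example are hypothetical. Note that an unknown start is treated here as open-ended, which is a simplification a real model would need to qualify.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class StatusPeriod:
    status: str             # e.g. "cohabiting", "married"
    start: Optional[date]   # None = no detectable start (treated as open-ended)
    end: Optional[date]     # None = still in force, or end not known


def status_on(periods: List[StatusPeriod], when: date) -> Optional[str]:
    """Return the status in force on a given date, or None if no period covers it."""
    for p in periods:
        starts_ok = p.start is None or p.start <= when
        ends_ok = p.end is None or when < p.end
        if starts_ok and ends_ok:
            return p.status
    return None


# A couple who lived together with no detectable start, then married.
couple = [
    StatusPeriod("cohabiting", None, date(1995, 6, 10)),
    StatusPeriod("married", date(1995, 6, 10), None),
]
```

This records the pre-marriage era without needing a "start of cohabitation" event, while the marriage date still closes one period and opens the next.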

gthorud 2011-04-07T14:33:30-07:00

Evidence 01 - Evidence & Conclusion Model

Description: BetterGEDCOM could handle evidence and not just conclusions

Why:
Current GEDCOM is structured so that data about an individual or family is always the "latest working hypothesis". It is therefore difficult to identify the actual evidence, particularly when the "latest working hypothesis" is a composite of various bits of evidence.
Also, in the event of discovery of an error, it can be difficult to (a) identify subsequent issues and (b) revert to an acceptable set of "working hypotheses".
To overcome this, it appears to be necessary, as a minimum, to record evidence and conclusions separately.
See Evidence and Conclusion Process
Note that this requirement is effectively the same as adopting (possibly in part) the "Evidence and Conclusion Model", which is linked to, but not the same as, the "Evidence and Conclusion Process". See Glossary

Way forward:
It is far from clear to the author that a comprehensive set of genealogical processes exist to handle evidence and conclusions at a detailed, data element, level. In particular, it is far from clear how it is possible to "roll-back" to an acceptable state after discovery of an error.
Interesting processes do exist to derive genealogical conclusions from evidence, but these are quite different from analyses undertaken by most genealogists.

It is therefore suggested that handling of evidence data and not just conclusions, is postponed to a later release of BetterGEDCOM and the current work should simply not do anything that might make separate handling worse.

See also the previous discussion linked to in the Requirements Catalog.

gthorud 2011-04-07T14:37:35-07:00

On 7 March 2011 ttwetmore posted this:

This quote from the Evidence01 requirement has me VERY concerned:

"It is therefore suggested that handling of evidence data and not just conclusions, is postponed to a later release of BetterGEDCOM and the current work should simply not do anything that might make separate handling worse."

I read this as meaning that Better GEDCOM is chickening out on adding evidence, record-based support to its data model. It certainly means it's being postponed to the future, which to me means postponement to oblivion. My opinion has always been that adding support for record/evidence-based genealogy should be the most important goal of Better GEDCOM, a goal that cannot be postponed. If Better GEDCOM decides not to cross the chasm from person-based methodology to support for record/evidence-based methodology I believe it changes from a worthy enterprise to a near trivial tweak of GEDCOM.

Tom Wetmore

gthorud 2011-04-07T15:25:37-07:00

Well, I had written a reply, but the system decided that I was not logged on, so I lost it all. Be aware ...

I'll start again.

AdrianB38 2011-04-07T15:33:14-07:00

Tom - I'm guilty of writing that caveat. I had my reasons...

1) Creating the data model for the real life side of things is easy. I imagine ditto for the current ESM citation style though I'm not wholly convinced that the multi reference stuff has been analysed yet (e.g. digitisation of a microfilm of an original). Creating the data model for evidence handling is not easy since in my head it needs more than just the creation of personas / evidence people / whatever.

2. Since there was a feeling that BG needed to get something out fast, the idea of phasing the model to produce the easy stuff first and the hard stuff later, seemed attractive.

3. I am far from convinced, as I said, that we have understood what evidence handling needs - my own idea of rolling back in case of an error - how do I support that? No, how do I DO that? Then there's the objective / research / input / output / conclusion stuff - all that stuff that you convinced me should go into the log - that is an integral part of evidence handling in my view and I simply don't see how it should be modelled yet. I just know it needs more entities than we've mentioned. (And more processes...)

4. If BG is to mean anything, we need to get the software developers on board. Again, getting them on board in 2 stages seemed more attractive, particularly if the initial steps are obvious and simple - hell, they're NOT simple - the multi-person event, groups, places, all those are going to be non-trivial jobs. If this chasm exists (and I believe it does) then the developers won't even recognise any benefit to come from evidence handling and so will ignore BG if it comes as one indigestible lump.

5. One last thing though - this is a Wiki - it's trying to gain consensus - I put a starter proposal there but if the members think we should progress in a bigger leap, then let's agree it! (But I'm also convinced that I was NOT the first person to make this suggestion).

gthorud 2011-04-07T15:56:04-07:00

I am not sure if E&C is the most important issue in BG, and I am sure the rest is not trivial.

However, I am not aware of any decision to postpone this requirement so I suggest that the paragraph should be removed from the requirement.

I apologize that I have not followed this topic lately, but someone has to try to do some organizing. Also, I want to spend some time on sources and citations since I have started on that.

I am writing the following on thin ice, but one concern is whether there are other solutions to the problem than what Tom is proposing. If FS comes out with a standard (or whatever) proposal, we should be prepared for that situation. What is the difference between the DeadEnds model and the data model in NFS? What is wrong with it, and with GenTech? Could a two-level NFS model be a subset of a multilevel solution? What are the rules that would collapse a 2 or 2+ level model into a conclusion-only model on import?

Also, how does the DeadEnds model fit into a model for recording the research work, citations and excerpts, and the evaluation of the information found in a source, which may result in many events in the E&C model? Do we need a description of a complete process (and the data recorded by it) that leads up to the evidence and conclusions in the E&C model?

My apologies if this has been sorted out already.

ttwetmore 2011-04-07T19:44:01-07:00

I apologize for my testiness.

Geir, Trying to answer your (excellent!) questions:

The NFS model is two-tiered, with persona records and person records. Personas are great for holding "evidence or record-based" data, and persons are great for holding "conclusion or person-based" data. But these aren't rules that can be enforced in the NFS application. A persona record simply holds whatever a user of the NFS application chooses it to hold; scary thought. An NFS person is a grouping together of persona records. No justification has to be made when adding or removing personas from a person group. The person record itself has no attributes of its own other than its global identifier. The users who put personas into the persons have the option of specifying what the overall person should look like when it is displayed. That is, users can say what the preferred birth event is, what the preferred name is, and so on. This can be changed by any other user. In the NFS application millions of the personas are just plain junk, so many persons are cluttered up with large numbers of junk personas. Some of the junk is really worse than you can imagine.

So an NFS persona record has all the available attributes that you would normally think of as being found in a generic person record. But on the other hand an NFS person record is little more than a "bag" that holds persona records, with some added info that specifies what are the currently preferred facts.
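The two-tier structure Tom describes can be sketched as follows. This is a minimal illustration of the description above, not NFS's actual schema: the class names, fields and the "preferred" mapping from fact name to persona id are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Persona:
    id: str
    facts: Dict[str, str]   # holds whatever the user chose to enter


@dataclass
class NfsPerson:
    """Only an identifier, a bag of personas, and user-chosen preferences."""
    id: str
    personas: List[Persona] = field(default_factory=list)
    preferred: Dict[str, str] = field(default_factory=dict)  # fact name -> persona id

    def display(self, fact: str) -> Optional[str]:
        # The person has no attributes of its own: the displayed value always
        # comes from the preferred persona, or None if no preference is set.
        pid = self.preferred.get(fact)
        for p in self.personas:
            if p.id == pid and fact in p.facts:
                return p.facts[fact]
        return None


p1 = Persona("P1", {"birth": "1849"})
p2 = Persona("P2", {"birth": "abt 1850", "name": "John Smith"})
person = NfsPerson("X1", [p1, p2], {"birth": "P2"})
```

Because the preference is just another editable field, any user can flip what the person "looks like" without adding any justification, which is the weakness described above.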

In the DeadEnds model there is a single person record that does duty for both "evidence/record-based" persons and "conclusion/person-based" persons. The DeadEnds person records can be arranged into a tree of any number of tiers. So a person record can be BOTH a "bag of closer-to-evidence persons" AND have its own attributes at the same time. Having its own attributes solves the problems for which NFS has to rely on that user-choosing approach. So in the DeadEnds higher level persons, you can choose to add attributes if you need to resolve issues between the attributes in the lower level persons. If you don't need to resolve any attribute issues, the higher level person records will simply inherit their attributes from the lower level ones.
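A minimal sketch of the single, recursive person record described above, assuming a hypothetical Person class. The rule that a record's own attribute wins is from the description; the "first lower-level person that has the attribute" inheritance order is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Person:
    """One record type for evidence and conclusion persons (DeadEnds-style sketch)."""
    attributes: Dict[str, str] = field(default_factory=dict)  # the record's own attributes
    lower: List["Person"] = field(default_factory=list)       # closer-to-evidence persons

    def get(self, name: str) -> Optional[str]:
        # An own attribute wins: this is how a higher-level person resolves conflicts.
        if name in self.attributes:
            return self.attributes[name]
        # Otherwise inherit from the lower-level persons, in order (assumed convention).
        for child in self.lower:
            value = child.get(name)
            if value is not None:
                return value
        return None


baptism = Person({"birth": "1849-03-02", "name": "John Smith"})
census = Person({"birth": "abt 1850", "occupation": "weaver"})
combined = Person(lower=[baptism, census])
top = Person(lower=[combined])   # a third tier
```

No new record type is needed: the same class serves as evidence person, intermediate grouping, and final conclusion.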

To collapse data from a multi-level model into a conclusion-only model on import is actually quite simple. You create a single person record for each "tree" by bringing together into that person record all the birth events, all the names, and all the other attributes of all the original personas "all the way down" recursively. As you do this you keep all the source references to the original source records so that every attribute inside the collapsed person record still refers to the proper source. The only problem is deciding which birth event or name, among the many possible birth events or names that might be in the collapsed person record, should be given priority in the final flat person record, that is, the one displayed on screens or printed in reports. This would be handled by conventions. Certainly if the higher level person has its own attributes, those would take priority, but if it doesn't I would suggest simply using the order inherent in the lower level persons. Of course, if there are quality flags they could be used as well.
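The collapse described above can be sketched as a recursive gather that keeps each fact's source reference. This is a minimal illustration, assuming hypothetical Fact and Person records and the priority convention suggested above (own facts first, then lower persons in order, depth-first); the example facts and source names are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fact:
    kind: str     # e.g. "birth", "name"
    value: str
    source: str   # reference to the source record the fact came from


@dataclass
class Person:
    facts: List[Fact] = field(default_factory=list)       # own facts (may be empty)
    lower: List["Person"] = field(default_factory=list)   # lower-level persons, in order


def collapse(person: Person) -> List[Fact]:
    """Gather every fact 'all the way down', keeping each fact's source.
    Own facts come first, then lower persons depth-first in order."""
    gathered = list(person.facts)
    for child in person.lower:
        gathered.extend(collapse(child))
    return gathered


def preferred(facts: List[Fact], kind: str) -> Optional[Fact]:
    """By convention, the first fact of a kind is the one to display."""
    for f in facts:
        if f.kind == kind:
            return f
    return None


p1 = Person([Fact("birth", "1849", "Baptism register")])
p2 = Person([Fact("birth", "abt 1850", "1851 census"),
             Fact("name", "John Smith", "1851 census")])
top = Person([], [p1, p2])
```

Because every Fact carries its source, nothing is lost by flattening; only the choice of which fact to show first depends on the convention.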

(In order to experiment with my software, I am actually doing the REVERSE PROCESS -- I am taking large, rich GEDCOM records from my LifeLines database, and by using the source references found within those records, I am BREAKING THEM APART into the persona records that I should have started with!!)

The DeadEnds model is conventional as regards sources and repositories. It has those two record types. All "evidence/person-based" records should refer to a source. For me a citation is nothing more than a formatted string that is generated by templates that use field values found in two places: 1) the references between the evidence-person records and the source records, which will state, for example, from which page of that source the evidence was taken (or the URL of the page, or the on-line database); and 2) the source record itself where info like title, author, publication year, and so on are found.
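Generating a citation from those two places can be sketched with Python's standard string.Template. The field names, the template text and the example source are hypothetical and do not follow any particular citation style; the point is only that the citation is computed, not stored.

```python
from string import Template

# Hypothetical template; a real application would hold one per source type.
BOOK_TEMPLATE = Template("$author, $title ($year), p. $page.")


def citation(template: Template, source: dict, reference: dict) -> str:
    """Build a citation from source-record fields plus reference fields.
    Reference-level fields (e.g. page) win if a name appears in both."""
    fields = {**source, **reference}
    return template.substitute(fields)   # raises KeyError if a field is missing


source = {"author": "A. Author", "title": "Parish History", "year": "1901"}
reference = {"page": "42"}
```

Here `citation(BOOK_TEMPLATE, source, reference)` gives "A. Author, Parish History (1901), p. 42."; changing the template changes every citation without touching the stored data.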

I haven't added any records to the DeadEnds model for research logs and todo lists, etc. I was hoping to piggyback off some other model that has worried about that. I know that those things are very important, but I've never been very interested in them, so I hope I can simply take the ideas from some model (Better GEDCOM?) that does worry about them.

I don't like the pure GenTech model because of the extreme use of the assertion entity, and the fact that the GenTech model is a fully normalized model which makes it almost impossible to visualize. Being fully normalized is something that used to be needed in ancient times when databases were automatically assumed to be relational. This is no longer the case. Normalized models completely obfuscate the simplicity of data models by adding table after table of difficult to grasp relationships.

Tom Wetmore

ttwetmore 2011-04-07T20:12:07-07:00

Responding to Adrian:

"Tom - I'm guilty of writing that caveat. I had my reasons..."

I apologize for any awkwardness I have caused you to feel by my intemperate comments.

"1) Creating the data model for the real life side of things is easy. I imagine ditto for the current ESM citation style though I'm not wholly convinced that the multi reference stuff has been analysed yet (e.g. digitisation of a microfilm of an original). Creating the data model for evidence handling is not easy since in my head it needs more than just the creation of personas / evidence people / whatever."

Ah, a very interesting statement. I believe creating the data model for evidence handling really is that easy! So easy in fact it doesn't even require a new record type! All it needs is extending the current person and event record types to be able to recursively refer to "lower level" persons and events. The fact that you think it's harder than this is something to explore. I think the way to do that is to imagine "use cases" that you would go through with a genealogical application in following some text book research processes, and analyze the data needs from a model to support them. When I go through those use cases in my head I always come up with my very simple extension. It would be great if others tried out that experiment.

"2. Since there was a feeling that BG needed to get something out fast, the idea of phasing the model to produce the easy stuff first and the hard stuff later, seemed attractive."

Understandable if the evidence extension is hard.

"3. I am far from convinced, as I said, that we have understood what evidence handling needs - my own idea of rolling back in case of an error - how do I support that? No, how do I DO that? Then there's the objective / research / input / output / conclusion stuff - all that stuff that you convinced me should go into the log - that is an integral part of evidence handling in my view and I simply don't see how it should be modelled yet. I just know it needs more entities than we've mentioned. (And more processes...)"

I agree with your concerns that there must be a proper intersection between the ideas I always talk about and the world of research logs and objectives, but I don't see fundamental problems. Say a research objective is an entity in our model. Our evidence records can simply refer to them, with the reference meaning "I am an evidence record that was researched, discovered and extracted because of that objective." Then our applications can provide us with lists of all the evidence that we discover while carrying out different objectives. Ditto for todo list items. If a todo list item is an entity in our database, it will refer to the objective record that the todo list is designed to help, and any evidence record discovered while carrying out that todo list would refer to the todo list record. Then our application could show us our todo lists and what we have done so far in carrying them out. I think anything that is simple conceptually, as objectives and todo lists seem to be, should always be represented by things that are just as simple in a data model. I do see undoing conclusions as a problem with my model if we want to remember that we have made the conclusion so we can warn the "future us" not to do it again. I guess that frankly I am not worried about the problem of formally remembering my mistakes. I don't see much utility in it. Probably sounds like a cop out!
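The references described above can be sketched with three small records and one query. This is a minimal illustration, assuming hypothetical Objective, TodoItem and Evidence records with plain id references; it is not taken from any existing model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Objective:
    id: str
    text: str


@dataclass
class TodoItem:
    id: str
    objective_id: str    # the objective this task is meant to help
    text: str


@dataclass
class Evidence:
    id: str
    summary: str
    objective_id: Optional[str] = None   # found while pursuing this objective
    todo_id: Optional[str] = None        # found while carrying out this task


def evidence_for_objective(evidence: List[Evidence], objective_id: str,
                           todos: List[TodoItem]) -> List[Evidence]:
    """Evidence linked to the objective directly, or through one of its todo items."""
    todo_ids = {t.id for t in todos if t.objective_id == objective_id}
    return [e for e in evidence
            if e.objective_id == objective_id or e.todo_id in todo_ids]


todos = [TodoItem("T1", "O1", "Search 1851 census"),
         TodoItem("T2", "O2", "Order death certificate")]
evidence = [Evidence("E1", "Baptism entry", objective_id="O1"),
            Evidence("E2", "Census household", todo_id="T1"),
            Evidence("E3", "Death entry", todo_id="T2"),
            Evidence("E4", "Unrelated find")]
```

Each link is a single reference field, which matches the argument that conceptually simple things should stay simple in the data model.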

"4. If BG is to mean anything, we need to get the software developers on board. Again, getting them on board in 2 stages seemed more attractive, particularly if the initial steps are obvious and simple - hell, they're NOT simple - the multi-person event, groups, places, all those are going to be non-trivial jobs. If this chasm exists (and I believe it does) then the developers won't even recognise any benefit to come from evidence handling and so will ignore BG if it comes as one indigestible lump."

Good practical issues. No disagreement from me. I just want it all, now, fast.

"5. One last thing though - this is a Wiki - it's trying to gain consensus - I put a starter proposal there but if the members think we should progress in a bigger leap, then let's agree it! (But I'm also convinced that I was NOT the first person to make this suggestion)."

Point taken.

Tom W.

AdrianB38 2011-04-08T03:47:38-07:00

Tom said "Ah, a very interesting statement." Good! I'm glad about that - reminds me rather of the time when I was a junior programmer and my programs failed with a dump. If I couldn't read the dump I'd take it to the technical guru of last resort. If he said, "Leave it there, I'll look at it later,", you came out knowing he'd never look at it. If he said, "Hmm, interesting..." you knew you were in with a fighting chance of some help...

His name was Tom too...(And he was a great guy)

AdrianB38 2011-04-08T04:06:35-07:00

More seriously - I'm quite happy to move my doubts and suggestion out of there and replace it with a more neutral comment about establishing scope and priority, now that we've got a discussion going. And there's a good case for saying that the Requirements Catalogue should cover it _all_, but any release of the Standard might be in phases to allow for priority / digestibility / whatever....

I think the point is we need to establish scope of just what this Evidence 01 requirement is. In my mind, when I was getting pessimistic / pragmatic about it, I saw the thing as a whole from setting of objectives through task definition / research log / input data / output conclusions / combined persons.

Clearly, modelling _all_ that is more than just recording that the person entity can be either an evidence or a conclusion person or both.

Perhaps it would be useful if I went through a sample case in my head - then wrote it out to show what is needed - and then we can decide scope etc.

AdrianB38 2011-04-11T08:48:48-07:00

OK - I have gone through a scenario, scribbled things down and from that, concocted the basis of a (tactical) research process. Why? Because I wanted to see what really needed to go into the BG Data Model if we went for fully rigorous methods where the logic arguments are documented. This is the stuff I alluded to in my post of Apr 7, 2011 11:33 pm. (At least - that's the time I'm reading - it may be translated to CET)

And in fact, the Evidence & Conclusion Model only comes in at the very end.

See http://bettergedcom.wikispaces.com/Research+Process%2C+Evidence+%26+GPS i.e. page "Research Process, Evidence & GPS"

So.... >>If<< we confine the scope of Evidence01 to the recording of the conclusions, then Tom's post of the previous Friday, 4:12 am where he says "I believe creating the data model for evidence handling really is that easy!" does make sense. We simply have to make sure that the other stuff is recorded elsewhere.

AdrianB38 2011-04-17T12:41:23-07:00

Data-Event01 - Events with multiple people, with roles

Description: BetterGEDCOM must support the recording of events that affect multiple people. In particular, it must be possible to record the role of each person in the event.

Importance: Mandatory

Why? Events do affect multiple people. Current GEDCOM has almost no ability to record multi-person events, excepting perhaps births and adoptions. However, the parents of a birth in GEDCOM are usually implied by the parents of the appropriate family, creating potential issues when that family is an adoptive one. It would be better to have a birth event involving three people (e.g. child and two biological parents typically), with this data separate from the family.
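As an illustration of what the requirement asks for, here is a minimal Python sketch of a birth event that carries its own participants and roles, independent of any family record. The class names, the role strings and the ids are hypothetical; roles are kept as free-text strings here purely for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    person_id: str
    role: str  # e.g. "child", "father", "mother"


@dataclass
class Event:
    id: str
    type: str
    date: str = ""
    participants: list[Participant] = field(default_factory=list)

    def people_in_role(self, role):
        """Ids of everyone who played the given role in this event."""
        return [p.person_id for p in self.participants if p.role == role]


# A birth recorded with the child and two biological parents,
# separate from whatever (possibly adoptive) family the child grew up in.
birth = Event("E1", "BIRTH", "1850", [
    Participant("I1", "child"),
    Participant("I2", "father"),
    Participant("I3", "mother"),
])
print(birth.people_in_role("father"))  # ['I2']
```

Since the biological parents live on the event, an adoptive family can be recorded elsewhere without either one overwriting the other.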

GeneJ 2011-11-28T17:37:08-08:00

P.S. Do these examples help?

marriage:
... Individual / role
... John Smith / p1 or groom
... Sarah Thomas / p2 or bride
... Samantha (Jones) Smith / MotherOfGroom
... Saul Smith / FatherOfGroom
... Sally (Franks) Thomas / MotherOfBride
... Joseph Thomas / FatherOfBride

For a death tag, I have only one principal, but various associates
death:
...
... Joe Peterson / principal (or deceased)
... John Peterson / LossOfFather
... Thomas Peterson / LossOfBrother
... Sally (Smith) Peterson / LossOfSpouse

GeneJ 2011-11-28T18:12:34-08:00

@Andy wrote (with his can-OOoOpener), "... are not genealogical relationships."

Check it out ....

See Curran, Crane and Wray, "Numbering Your Genealogy: Basic Systems, Complex Families and International Kin" (Arlington, Virginia: National Genealogical Society, 2008).

In part, from Madilyn Coen Crane's contribution, "Complex Families," beginning on p. 17, "Traditional numbering systems were designed to present a group of people, all blood kin, who descend from a single immigrant ancestor. When genealogists treat families of the past, their narratives acknowledge multiple marriages and stepchildren; but the numbering schemes, as originally planned, omit step children and adopted children; and they make no provisions for carrying down such lines. Surname changes that result from variations of the nuclear family also remained in limbo [*]. ... Because of the serious genetic issues at stake, as science continues to explore and treat inheritable medical conditions, this paper recommends that adoptions of past eras be treated as frankly as all other aspects of genealogical research."

Crane goes on to explain in some detail how the NGSQ system (aka the "Quarterly" standard) has been "expanded" to report about complex family circumstances. The material covers--adoptions, stepchildren, multiple marriages of direct descendants, etc. Crane writes, "In order to maintain a clear identification of biological ancestry, while including adoptions and stepchildren in the family structure, the phrases adopted by and stepchild of are added to the parenthetical summaries of descent."

:-) --GJ

*Crane references an endnote, "The legal status of adopted children during past centuries is rarely documentable. Not until the 1850s did America begin to see the emergence of adoption laws, generated primarily by society’s need to define legal heirs in the settlement of estates .... ," citing Lawrence M. Friedman, _A History of American Law_, 2d. ed. (New York: Simon and Schuster, 1985); and Carole Shammas, _Inheritance in America_ (New Brunswick, N.J.: Rutgers University Press, 1987).

WesleyJohnston 2011-11-29T02:18:50-08:00

Regarding GeneJ's post ... I am wondering: does "multiple marriages of direct descendants" include descendants who married each other?

That's something just about every family tree will have to deal with at some point, once you are back to small villages in the 1600's or 1700's.

ttwetmore 2011-11-29T04:02:43-08:00

I read somewhere that before colonial times, on average, marriages were between third cousins, many closer, many further, but this was the average. Thus the issue of direct descendants marrying each other not only occurs, but is the norm, not the exception. I have many direct ancestors who were married first cousins, second cousins, third cousins with various levels of removal as well. This leads to what is sometimes called "ancestor collapse." All genealogical programs that I am aware of handle it with no problems. The main problem for software is recognizing, when generating reports, people and families that have already been output, and inserting the appropriate "see over there" tags instead of the redundant information. When iterating to find lists of descendants or lists of ancestors, software has to use "set" structures rather than "list" structures to build those lists, but this is just basic programming. I even have cases where two sisters married two brothers, all direct ancestors of mine, and eventually some of their offspring intermarried, still my direct ancestors. Thus I have people that show up in EIGHT places in my ahnentafels. I use my own genealogical software, and I can fortunately leave it to the software to keep everything shipshape.
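The "set, not list" point can be sketched in a few lines of Python. This is not LifeLines code; the `parents` dictionary layout and the person ids are hypothetical. A dictionary of already-numbered people acts as the set: the first time an ancestor is reached they get an Ahnentafel number, and every later occurrence is recorded as a "see number N" cross-reference instead of re-expanding the whole line.

```python
from collections import deque


def ahnentafel(root, parents):
    """Number ancestors Ahnentafel-style, outputting each person only once.

    `parents` maps a person id to a (father, mother) tuple; either may be None.
    Returns (numbered, repeats): `numbered` maps each person to the number at
    which they were first output; `repeats` lists (number, person, first_number)
    for later positions, where a report would print "see number N" instead.
    """
    numbered = {}  # doubles as the "set" of people already output
    repeats = []
    queue = deque([(1, root)])  # breadth-first, so a person's first number is their lowest
    while queue:
        num, person = queue.popleft()
        if person in numbered:
            repeats.append((num, person, numbered[person]))
            continue  # do not expand the same ancestry a second time
        numbered[person] = num
        father, mother = parents.get(person, (None, None))
        if father:
            queue.append((2 * num, father))
        if mother:
            queue.append((2 * num + 1, mother))
    return numbered, repeats


# C's parents F and M are first cousins: their fathers X and Y are brothers.
parents = {
    "C": ("F", "M"),
    "F": ("X", None),
    "M": ("Y", None),
    "X": ("A", "B"),
    "Y": ("A", "B"),
}
numbered, repeats = ahnentafel("C", parents)
print(repeats)  # [(12, 'A', 8), (13, 'B', 9)]
```

With a plain list in place of the `numbered` lookup, A and B (and all of their ancestors) would be emitted twice; with deeper intermarriage the duplication multiplies.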

GeneJ 2011-11-29T04:50:52-08:00

Hi Wesley,

(1) In the context of Crane's "Complex Families" the multiple marriages to which I referred have more to do with recognizing all of the family members (all of the "children" from all of the marriages of the heads of family). Joe marries first Susan, and they have children. Joe marries second to Margery, who was previously married with children, and Joe and Margery adopt several children. Crane (NGS/Quarterly) recognizes all the children--whether biological, adopted or step children.

Crane's summary would, I hope, make our Adrian smile, "Modern genealogies are, increasingly, family histories rather than mere recitals of begats within a bloodline. It is important to include all the individuals who shape the nature and personality of a family ... equally important ... that a clear identification of the biological line be maintained ... the guidelines offered in this paper will enable modern Americans to compile genealogies that accurately portray family units in the context of their existence and to present authentic family structures through which the history of our nation can be correctly understood and chronicled."

(2) Intermarriages within the larger family are also addressed in _Numbering Your Genealogy..._, but this is not considered a "complex" situation. (As Wesley says, "just about every family tree will have to deal" with this.)

See _Numbering Your Genealogy_, 10-11, for work of Joan Ferris Curran, CG on "_Multiple lines of descent_ from a single forebear, a relatively frequent occurrence ...."

ACProctor 2011-11-29T06:09:19-08:00

Interesting replies. I think some clarification of my post might be needed though:-

Re: "New guardians, foster parents, adoptive parents, etc. are not genealogical relationships; thus no need for a genealogical program to deal with them. They are, however, social relationships and thus a Family History program *must* deal with them"

I'm not actually writing a program - at the moment. My goal though is to define a "source format" for generalised Family History data. Genealogical (aka biological) relationships are far easier to handle but often virtually irrelevant to the lives of the individuals. I believe this is an area where GEDCOM gets hung up badly. It implicitly extrapolates from pure genealogical relationships to infer "family units".

Re: "There are much larger events (wars, earthquakes) that affect masses of people but one wouldn't normally say they have genealogical significance".

Again, Family History is a much more generalised goal. Something like the outbreak of WWII could be a hugely significant time marker (aka Event) that people's lives are related to, irrespective of whether they enlisted or not.

Re: "Using the example of a traditional union, my marriage "pfact" has two principals--call them p1 and p2 (or "bride" and "groom")"

The idea of 'principals' is something I experimented with, although it started to feel like there should be more levels, e.g. the Person(s) being born, their parents, informants/etc. Similarly with a marriage. However, grouping all Persons in an Event (whether by PFACTs or otherwise) doesn't make it easy for the recording of other historical facts, e.g. "X met Y at so-and-so's wedding". Ideally, X and Y should have EventRefs to the associated wedding, even though they may not have had a direct role in it.

I'd like to generalise Marriage to a generic Event-category of 'union' - something that has been discussed elsewhere in these threads. This should include civil & religious marriages, same-sex partnerships, cohabitation, and even multi-party marriages in those cultures that still permit them. This obviously puts more of a strain on the role definitions and things like FatherOfBride may be too specific. I believe the same could be possible for a "change of responsible control" (...can't think of another term off-hand) to include guardianship, fostering, & adoption. This is why I was interested in the slavery form of ownership.

Q: Is BetterGEDCOM focused purely on genealogical relationships, on generalised Family History, or something in between? It felt a little like there was some difference of opinion in the replies to my post so I just wanted to check before going off on a tangent :-)

GeneJ 2011-11-29T08:18:35-08:00

Hi ACProctor,

I don't know what a generalized family history is, but users come in all flavors. BetterGEDCOM is focused on user requirements. I organize materials--sources and "tags"/pfacts--in my software that sometimes reports just the key genealogical facts (BDM), and other times supports a full range of genealogically significant data that would include a host of other life events. Others are more interested in recording record data--which might represent BDM, or it might represent a host of other life events.

(1) I'm saying genealogical relationships are not limited to "biological" relationships. There are certain key genealogical relationships that "link" together a family structure so that a genealogy can be created. These key genealogical relationships identify the family unit--the heads of the "family" (which includes the unions), who the children are (biological, adoptive, foster, step, etc.), and the unions of those children and their "children" (repeating the noted concepts).

Beyond these key genealogical relationships by which a genealogy is structured, there are other genealogically significant relationships (some of which are the basis of much inferential genealogy).

(2) I realize there are different limitations in your desire to have a "source format," but what you describe as "levels" are to me just different events or facts or different roles that principals or associates play. As below (see 3), believe I'd have less use for generalized roles.

(3) "...generalize marriage." Why? Not all the roles in marriage/union events will be the same.

You wrote, "...puts a strain on ... FatherofBride." I assign the roles in the event. If I don't have a bride, I don't have a "FatherOfBride." Since I make frequent use of the role "Bride" in my family file, I have pre-established a role, "FatherOfBride." I also have a role FOBride (which wouldn't translate well); believe it could just as easily be "FatherOfP1" or "FatherToPrincipal." The associate role will point back to the event.

In software I use, key genealogical relationship tags (the stuff required to structure the family tree, see 1 above) are categorized as birth group, death group, etc.

I don't have examples of marriage events (ie, union, civil union, etc.) involving more than two principals. While I think I know how I'd enter such an event, I'll leave that challenge to be discussed in better context by others.

Hope this helps.--GJ

ttwetmore 2011-11-29T09:12:50-08:00

“Re: "New guardians, foster parents, adoptive parents, etc. are not genealogical relationships; thus no need for a genealogical program to deal with them. They are, however, social relationships and thus a Family History program *must* deal with them"”

There are many relationships between people, genealogical and not. I want my “genealogical” program to deal with all of them. The only distinction to be made, in my opinion, is that biological relationships are the only ones that allow the construction of pedigrees.

“I'm not actually writing a program - at the moment. My goal though is to define a "source format" for generalised Family History data. Genealogical (aka biological) relationships are far easier to handle but often virtually irrelevant to the lives of the individuals. I believe this is an area where GEDCOM gets hung up badly. It implicitly extrapolates from pure genealogical relationships to infer "family units".”

When you have your source format in a written form I’d like to see it.

I don’t know what you mean when you say that it is easier to handle biological relationships.

Actually GEDCOM does not infer the family units. In GEDCOM families are represented by FAM records that the user must create somehow. You might mean that the parent/child relationships are only possible within GEDCOM within the context of a FAM record. If this is what you mean then I agree with you. It is wrong to require that all biological relationships be mediated by FAM records. On the other hand, though, it doesn't really hurt all that much. If you know that A is the father of B, and that's all, with GEDCOM you have to create two INDI records and one FAM record with the father pointing to the FAM with a FAMS, the child pointing to the FAM with a FAMC and the FAM pointing to the persons with a HUSB and CHIL. No need for a marriage event or for a mother. You might not think this is the optimal solution, but honestly it ain't all that bad.
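To make the record count concrete, the following Python helper emits the minimal lineage-linked GEDCOM described above for "A is the father of B": two INDI records and one FAM record, with FAMS/FAMC pointing into the family and HUSB/CHIL pointing back out. The function name and the hard-coded ids `@I1@`, `@I2@`, `@F1@` are assumptions for illustration only.

```python
def father_child_gedcom(father_name, child_name):
    """Minimal GEDCOM records stating only 'A is the father of B'.

    Two INDI records plus one FAM record; no mother and no marriage event.
    The xref ids are fixed, hypothetical values.
    """
    lines = [
        "0 @I1@ INDI",
        f"1 NAME {father_name}",
        "1 FAMS @F1@",   # father -> family, as spouse/parent
        "0 @I2@ INDI",
        f"1 NAME {child_name}",
        "1 FAMC @F1@",   # child -> family, as child
        "0 @F1@ FAM",
        "1 HUSB @I1@",   # family -> father
        "1 CHIL @I2@",   # family -> child
    ]
    return "\n".join(lines)


print(father_child_gedcom("John /Smith/", "William /Smith/"))
```

The FAM record here carries no WIFE and no MARR; it exists only to mediate the one parent/child link.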

“...Family History is a much more generalised goal. Something like the outbreak of WWII could be a hugely significant time marker (aka Event) that people's lives are related to, irrespective of whether they enlisted or not.”

I agree, but I don’t have a good feeling for how these large scale events would be handled by a genealogical program.

“Re: "Using the example of a traditional union, my marriage "pfact" has two principals--call them p1 and p2 (or "bride" and "groom")"

“The idea of 'principals' is something I experimented with, although it started to feel like there should be more levels, e.g. the Person(s) being born, their parents, informants/etc. Similarly with a marriage. However, grouping all Persons in an Event (whether by PFACTs or otherwise) doesn't make it easy for the recording of other historical facts, e.g. "X met Y at so-and-so's wedding". Ideally, X and Y should have EventRefs to the associated wedding, even though they may not have had a direct role in it.”

I don’t like the term “principal” as a role tag, though it is convenient in some cases. I think role tags should come from a relatively large enumerated set of tags, possibly with subtags (e.g., parent.biological, parent.step, parent.adoptive), with the capability of extending the set for unanticipated situations.
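One way to read "an enumerated set of tags, possibly with subtags, with the capability of extending the set" is sketched below in Python. The `RoleVocabulary` class is hypothetical; the dotted `parent.step` style follows the examples in the paragraph above, and the base tags are only a sample.

```python
class RoleVocabulary:
    """An enumerated set of role tags, with dotted subtags and explicit extensions."""

    def __init__(self, base):
        self.tags = set(base)

    def extend(self, tag):
        # Unanticipated roles are added deliberately, not typed in ad hoc.
        self.tags.add(tag)

    def is_valid(self, tag):
        return tag in self.tags

    def general(self, tag):
        # "parent.step" -> "parent", so software can treat all parents alike when needed.
        return tag.split(".", 1)[0]


roles = RoleVocabulary({
    "child", "parent", "parent.biological", "parent.step", "parent.adoptive",
    "witness", "officiant",
})
roles.extend("owner")  # e.g. a role missing from the base set
print(roles.is_valid("parent.step"), roles.general("parent.adoptive"))  # True parent
```

Keeping the general tag as a prefix means a report that only cares about "parent" never needs to know every subtag in advance.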

“I'd like to generalise Marriage to a generic Event-category of 'union' - something that has been discussed elsewhere in these threads. This should include civil & religious marriages, same-sex partnerships, cohabitation, and even multi-party marriages in those cultures that still permit them. This obviously puts more of a strain on the role definitions and things like FatherOfBride may be too specific. I believe the same could be possible for a "change of responsible control" (...can't think of another term off-hand) to include guardianship, fostering, & adoption. This is why I was interested in the slavery form of ownership.”

I agree with this.

“Q: Is BetterGEDCOM focused purely on genealogical relationships, on generalised Family History, or something in between? It felt a little like there was some difference of opinion in the replies to my post so I just wanted to check before going off on a tangent :-)”

A. I want to be able to create full timelines for the people I research by collating together everything I find out about them. All of this information ultimately comes from evidence, and much of that evidence takes the form of descriptions or reports of events that the persons participated in, and the relationships that were formed with other persons by those events. So all those things (sources, evidence, events, persons, ...) must be modeled very well. Most of us believe that the vital events are the most important of the events, and I do agree with that, but to get full timelines we must be able to accommodate everything we find out. I think we all agree that sources, evidence, events, persons, are among the key parts of a genealogical model. We disagree, significantly sometimes, on exactly how those ideas should be handled. But there isn’t much discussion here about the large “events” that you are concerned with. Here’s an example from my own work:

My wife’s grandfather was a Polish peasant living in West Prussia (now Poland) and was swept up by Germany’s need to mine coal in the Ruhr valley in the late 19th century. Many peasant men were essentially conscripted and taken west and forced to mine coal. This overall “event” has a name, the “Western Flight,” though more properly rendered in the German as something like “Oesterflugt.” This is one of those “global events” that was critical to this person’s life (he went “awol” and managed to get to the United States, with wife and kids to follow). So I want to be able to mention and describe this global event along with the more prosaic events that occurred in the man’s life. However, a genealogical program pretty much forces this mention to be placed in note structures as there doesn’t seem to be any better way to do it. I don’t really mind this, as my software knows how to take the notes that I write and insert them into any biographical output that I generate. For a real historian, however, I think this “event” should be better model-able.

ACProctor 2011-11-29T09:43:49-08:00

Thanks Tom. Some useful stuff in your reply.

I believe purely genealogical relationships are easier because they're more rigid. We only have one set of biological parents and that fact is independent of the date. All the other types of relationship are time dependent and potentially overlapping.
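The contrast between a fixed biological link and time-dependent, possibly overlapping relationships can be sketched in Python by giving the latter an optional date range. The `Relationship` class, the `overlaps` helper and the sample dates are hypothetical, and exact `date` values stand in for the imprecise dates genealogists usually have.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Relationship:
    kind: str                     # e.g. "guardian", "foster parent"
    start: Optional[date] = None  # None means unknown / open-ended
    end: Optional[date] = None

    def active_on(self, day):
        """True if the relationship held on the given day (open ends count as held)."""
        return ((self.start is None or self.start <= day)
                and (self.end is None or day <= self.end))


def overlaps(a, b):
    """True if the two relationships' date ranges intersect."""
    a_start, a_end = a.start or date.min, a.end or date.max
    b_start, b_end = b.start or date.min, b.end or date.max
    return a_start <= b_end and b_start <= a_end


guardian = Relationship("guardian", date(1860, 1, 1), date(1865, 6, 30))
foster = Relationship("foster parent", date(1864, 1, 1))
print(overlaps(guardian, foster))            # True
print(guardian.active_on(date(1866, 1, 1)))  # False
```

A biological parent link would simply carry no dates at all, which is exactly why it is the easier case.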

With Family History, I wanted a way of ensuring I could record first-hand testimony and tales passed down through the family generations. I've allowed for a comprehensive Narrative element (more than mere Notes) into which your example would fit nicely. However, there is also a feature for taking a description of some event out of the Narrative and making it a full Event entity. This would usually be done if the same Event appeared in more than one Person's history since the common reference point effectively pulls their lives together. The choice is optional and made very easy since a Narrative element can contain embedded PersonRefs, PlaceRefs, and EventRefs.
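ACProctor's format is not yet published, so the following is only a Python sketch of the general idea under stated assumptions: a narrative is a sequence of plain text and embedded references, and a stretch of plain text can later be "promoted" into a reference to a full Event entity. The `Ref` and `Narrative` classes, the `promote` method and all ids are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Ref:
    kind: str    # "person", "place" or "event"
    target: str  # id of the referenced entity
    text: str    # the words that appear in the narrative


@dataclass
class Narrative:
    segments: list = field(default_factory=list)  # plain strings and Ref objects

    def refs(self, kind):
        """Ids of all embedded references of one kind, in narrative order."""
        return [s.target for s in self.segments
                if isinstance(s, Ref) and s.kind == kind]

    def render(self):
        """Plain text of the narrative, with references shown as their words."""
        return "".join(s.text if isinstance(s, Ref) else s for s in self.segments)

    def promote(self, index, event_id):
        """Turn a plain-text segment into an EventRef to a (new) Event entity."""
        if isinstance(self.segments[index], Ref):
            raise ValueError("segment is already a reference")
        self.segments[index] = Ref("event", event_id, self.segments[index])


story = Narrative([
    Ref("person", "I7", "Grandfather"), " went to ",
    Ref("place", "P3", "the Ruhr"), " to mine coal during ",
    "the Western Flight", ".",
])
story.promote(4, "E9")  # the episode is now an Event other histories can share
print(story.refs("event"))  # ['E9']
print(story.render())
```

Promotion leaves the rendered text unchanged; it only adds a shared reference point that other people's narratives can point to.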

When I write the format up Tom, it will appear at www.parallaxview.co/familyhistorydata but that's just a placeholder at present.

eleanordew 2011-11-30T06:46:54-08:00

ACProctor: "I must admit that I hadn't thought about slavery and Person-ownership Eleanor. If OWNR was removed, was any replacement mechanism or convention put in its place?"

As far as I could tell, no mechanism was put in the place of "OWNR", but I am not very experienced in this format.

GeneJ: "Believe slavery would be a good concept about which we should document a series of case study materials (Wesley calls them 'benchmark cases'). These could be outlined on a new wiki page and linked back to testuser's page, 'BetterGEDCOM test suite.'"
see the linked items at the bottom of http://bettergedcom.wikispaces.com/BetterGEDCOM+test+suite

How would one go about collecting this test information? Do you just need some good examples? -- Eleanordew

GeneJ 2011-11-30T07:21:06-08:00

Hi Eleanordew,

Thank you for replying.

"Do you just need some good examples?" --Yes, exactly.

There may be several good examples within Mills' article, "Which Marie Louise is 'Mariotte'?: Sorting Slaves with common names." (http://www.bcgcertification.org/skillbuilders/MariotteNGSQv94-183-204.pdf ).

testuser42 2011-11-30T10:07:14-08:00

ACProctor ... a Narrative element can contain embedded PersonRefs, PlaceRefs, and EventRefs.
That is a very nice idea.

Maybe veering OT:
Tamura Jones has some very good articles about Family History vs Genealogy, and on the concept of "Family" in current GEDCOM and software (e.g.: http://www.tamurajones.net/FamilyInScientificGenealogy.xhtml )

But one article I'd like to point out is this:
http://www.tamurajones.net/AFrameworkForClassicalGenealogy.xhtml

I'd like BG to be able to handle all of the legal, official and biological evidence, as well as all of the stories connecting the people and making them more than just names. I'd also like to have the people connected to the places and times they lived in, and collect stories of some places, but that's really another thread.

ttwetmore 2011-07-13T10:27:34-07:00

Adrian,

Good points as always. Let me give a quick example how I have implemented some ideas you just expressed.

If one were to follow my ideas about events as records, vitals as structures within records, and relationships as references between records, then given a person record and these three options, how would one find the person's father record? Before answering let's go a little beyond my earlier example and consider the following person record fragment:

0 @I1@ INDI
1 NAME Thomas Trask /Wetmore/ IV
1 SEX M
1 BIRT
2 DATE 18 December 1949
2 PLAC New London, New London, Connecticut, United States
2 FATH @I2@

I'm using GEDCOM just so we can understand it easily. This is a person record with a single vital structure for the birth. See what I did? I added a father reference to the birth vital. I never said anything about this facility earlier, because I didn't want to weird anybody out, but there is nothing wrong with this in my view. It's a multi-role vital structure! It is inside the principal person's record and it points to other persons the principal is related to.

So how would you find the father of a person in a data model where there can be multi-role events, vital structures and relationship references?

Simple really. If your person points to a multi-role event, check the roles in that event, and if you can infer a child-father relationship between this person and another role-player in the event, there you have it. Obviously a multi-role birth event is perfect. If the person has a relationship already pointing to his/her father, you're home free. And if you allow vital structures in the form I have just given as an example, it's just as easy to follow a role reference from within a vital as it is to follow a direct relationship reference.
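The lookup just described can be sketched in Python over a hypothetical dictionary-based store (the keys `relations`, `vitals`, `events` and `roles` are assumptions, not a proposed format). It tries each of the three representations in turn; the order is arbitrary, since the thread assumes the same fact is never stored in two places.

```python
def find_father(person_id, persons, events):
    """Return the father's id via whichever representation is present, else None."""
    person = persons[person_id]
    # 1. Direct relationship reference, e.g. {"type": "father", "ref": "I6"}
    for rel in person.get("relations", []):
        if rel["type"] == "father":
            return rel["ref"]
    # 2. Role reference inside a birth vital structure (BIRT ... FATH @I2@)
    for vital in person.get("vitals", []):
        if vital["type"] == "BIRT" and "FATH" in vital:
            return vital["FATH"]
    # 3. Multi-role birth event: infer child -> father from the roles
    for event_id in person.get("events", []):
        event = events[event_id]
        roles = event["roles"]
        if event["type"] == "BIRT" and roles.get("child") == person_id and "father" in roles:
            return roles["father"]
    return None


persons = {
    "I1": {"vitals": [{"type": "BIRT", "DATE": "18 December 1949", "FATH": "I2"}]},
    "I3": {"events": ["E1"]},
    "I5": {"relations": [{"type": "father", "ref": "I6"}]},
}
events = {"E1": {"type": "BIRT", "roles": {"child": "I3", "father": "I4"}}}
print([find_father(p, persons, events) for p in ("I1", "I3", "I5")])  # ['I2', 'I4', 'I6']
```

A user interface calling `find_father` sees the same answer whichever representation holds the fact.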

The whole real point here echoes Adrian's point. It doesn't matter how the father to child relationship is represented (any of the three described is fine); in the user interface there is no distinction to be made -- a user looking at the screen just sees a person and his/her parents with no clue as to how the underlying data is represented.

It is fair to ask, though, how do these different implementations of father/child get established in the first place? Well, most genealogical applications these days are person-centric. You edit persons, so in this context it is only natural that all events be subsumed into vital structures in person records. However, some genealogical applications are both person-centric and event-centric. In those you can typically enter an event or you can enter a person. When you enter an event you eventually want to link the person role-players to their proper roles. So if you use such a program in an event-centric way the important relationships will end up being expressed through multi-role events. But when you use these programs in the person-centric way you fall back on the vital structures. And, of course, underneath the software could transform between representations and there would be no need for the user to ever know.
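As a sketch of "transform between representations", here is one direction in Python: a birth vital structure held inside a person record is re-expressed as a stand-alone multi-role event. The dictionary layout, the role names and the `MOTH` key (by analogy with Tom's non-standard `FATH`) are hypothetical assumptions, and only births are handled.

```python
def birth_vital_to_event(person_id, vital, event_id):
    """Re-express a BIRT vital structure from a person record as a stand-alone
    multi-role birth event; the record's owner becomes the "child" role."""
    if vital["type"] != "BIRT":
        raise ValueError("this sketch only handles birth vitals")
    roles = {"child": person_id}
    if "FATH" in vital:   # father reference inside the vital structure
        roles["father"] = vital["FATH"]
    if "MOTH" in vital:   # hypothetical mother reference, by analogy with FATH
        roles["mother"] = vital["MOTH"]
    event = {"id": event_id, "type": "BIRT", "roles": roles}
    for key in ("DATE", "PLAC"):  # date and place move with the event
        if key in vital:
            event[key] = vital[key]
    return event


vital = {"type": "BIRT", "DATE": "18 December 1949",
         "PLAC": "New London, New London, Connecticut, United States",
         "FATH": "I2"}
event = birth_vital_to_event("I1", vital, "E1")
print(event["roles"])  # {'child': 'I1', 'father': 'I2'}
```

The reverse direction (event back to vital) is equally mechanical, which is why the choice can stay invisible to the user.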

As Adrian points out, the user doesn't have to know how the event is being represented.

AdrianB38 2011-07-13T13:03:42-07:00

Tom - I _started_ to get a bit worried with your example - what if there was a multi-person event AND a relationship?

I think there are a couple of answers to that:
1. In your example, if you have a BIRT vital event within the individual, then you shouldn't have a multi-person BIRT event, so the issue doesn't arise.
2. If you have a relationship of FATH outside the BIRT vital event within the individual, then you shouldn't have the father in either the BIRT vital event within the individual or the multi-person BIRT event.

So, assuming that similar logic applies with other potential issues, you shouldn't have an issue. I can't see any NEED for having the same info in 2 places.

OK, OK, "should" - what if you have? Well, there has to be some rule but it's a rule that's in the application because only the developer knows what's the best way of making the app fail gracefully. It's no part of BG to define how to get out of a "Garbage In Garbage Out" scenario.

What's making me disturbed is - if we have 3 ways of doing X, are there justifications for the 3 ways?

I suggest that if we DON'T have personas, then there would be no need for anything other than the one method. If we do have personas, even if they are as limited in their application as those in nFS (if indeed, they are limited) then we need these extra methods in order to describe the information in a source in a codified manner without interpretation.

E.g. a persona from a census would use an AGE tag inside the CENSUS event; it wouldn't create a Birth event because that would need interpretation to create a date for the Birth event.

Similarly, a persona from a marriage (post-1837 UK) wouldn't create a Birth event to record their age or their father's name because that would need interpretation to create a date for the Birth event.

However I still can't get it out of my head that we've got one representation of relationships too many
- roles in multi-person events - sure, we need that.
- Single person events - sure, we need them for personas. Where do we put the relationships for personas though? Inside a single person event, or outside (but still inside the persona's data-record)? I can see how the input might be person centric or event centric as Tom suggests, but to turn his own point back - underneath the software could just use the one representation.

ttwetmore 2011-07-13T14:03:23-07:00

Adrian,

Good-oh.

First, yes, you never need the same info in different places -- no redundancy required.

Second, you ask whether there is a need for the three ways.

I think the 3 ways have subtle differences from one another, so have some legitimacy.

Multi-role event -- I believe this is the right way to encode direct evidence from most physical records that record those events -- birth certificates, marriage certificates, death certificates. These certificates are intended to document specific events, multiple persons are mentioned in them with roles with respect to the event. It is only natural, IMHO, to encode a physical representation of a multi-role event with a computerized, codified multi-role event record.

Vital structure -- I believe this is the right way to encode simple statements that mention the birth, marriage or death of someone, but are NOT statements intended to actually document the event. I hope you can see the difference there. There is some event out there somewhere in the background, but the statement is only indirectly about that event. A little subtle. "Almyra Jane Wetmore was born in Digby County, Nova Scotia." Would you call that the documentation of an event? Yeah, there's an event in there, but it's not mentioned explicitly, only implied. IMHO this is best handled by a simple vital structure in Myra's record. But would anyone really complain if it were handled by a one-role event record? I guess I wouldn't. I would appeal to parsimony arguments, however, to keep things as simple and as succinct as possible. One way you have a single record with a simple birth structure in it. In the other case you have two different records and you need an additional mechanism to link them together. More than twice the "computer data capital" to represent the same information. Inefficient. Unparsimonious. Bad.

Relationships -- I believe this is the best way to handle generic statements of relationships. Here's a good example for you. "Thomas Williams and Mary Doty were first cousins." Let's say you don't know anything about their parents yet, so obviously nothing about their grandparents. All you know is that one parent of each was a child of the same one, or maybe two, other persons. How many hidden events are there in this one? Hard to know. How many implied persons are there in this one? Well, you tell me. Do you really want to create all the anonymous person records and associated events to build up the pedigree you'd need if you had to encode relationships using simple lineage-linking? I don't think it is reasonable to handle this example by creating events, though it might be a good exercise for the reader to decide how they would do that. Most genealogical programs of today would just about choke if you tried to get this info into them in a usable fashion. But don't you think we ought to be able to do so? I think the best solution is something like:

0 @I1@ INDI
1 NAME Thomas /Williams/
1 SEX M
1 RELA @I2@
2 TYPE first cousin

0 @I2@ INDI
1 NAME Mary /Doty/
1 SEX F
1 RELA @I1@
2 TYPE first cousin

Another simpler example. Say you know two persons are siblings, but that's all you know. How do you handle that? In LifeLines I do it this way:

0 @I1@ INDI
1 NAME Thomas /Williams/
1 SEX M
1 FAMC @F1@

0 @I2@ INDI
1 NAME Mary /Williams/
1 SEX F
1 FAMC @F1@

0 @F1@ FAM
1 CHIL @I1@
1 CHIL @I2@
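With a family record, the sibling relationship never has to be stored; it can be derived. Here is a minimal Python sketch, assuming each family is reduced to the list of its CHIL references (a hypothetical simplification of a FAM record, not LifeLines code):

```python
from itertools import combinations

def siblings_from_families(families):
    """Return the set of sibling pairs implied by FAM records.

    families maps a family id to the list of its CHIL person ids; any two
    distinct children of the same family are treated as siblings.
    """
    pairs = set()
    for children in families.values():
        for a, b in combinations(sorted(set(children)), 2):
            pairs.add(frozenset((a, b)))  # unordered pair
    return pairs

families = {"@F1@": ["@I1@", "@I2@"]}
print(siblings_from_families(families))
```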

Pretty simple, but it requires a family record. Now I don't mind family records, but you might. How would you do this without a family record? Could you do it with event records? Well yes, you could. You'd create two birth events, each with the proper child role, but you'd have to make each birth event refer to the same ANONYMOUS father record and the same anonymous mother record. That would work fine, but do you really want anonymous person records in your database? I'd rather not, even though they kind of make sense. I think some programs don't even allow a person record without a name. I support them in LifeLines by allowing the name "//" (LifeLines might even support the empty string -- I'll experiment later to find out), but I don't like to use them. But if you can only infer father and mother from roles in multi-person events, you have to use this mechanism to encode that Thomas and Mary are siblings. Wouldn't it be better (if there are no family records) to let them point to each other with sibling references? You could also do this by just adding the two anonymous person records, one for the father and one for the mother, and then using simple child-parent relationships to link them.

Oh, so much fun. By accepting the three different ways I have proposed I believe that BG can always have the best possible way to codify roles to events and relations between persons, and the ways are simple, obvious, and make common sense.

eleanordew 2011-11-22T13:06:15-08:00

This is my first post, so if it's in the wrong place, would one of the moderators please move it? Thanks.

One of the weird things that happened in the development of GEDCOM 5.5.1 was that the role tag "OWNR" was removed. This is unfortunate because the owner of a slave is a key resource in finding out more information about that person.

In fact, the data event "slavery" would fit into Data-Event01, i.e., it's an event with multiple people and roles (multiple owners and multiple slaves). It could also fit as an Event longer than 1 day (I can't remember which Data-Event that is) because slavery is a long-term event.

My question is, I suppose, how does one work the idea of slavery and slaves into the BetterGEDCOM format?

ACProctor 2011-11-28T15:29:02-08:00

This is a tough subject but a crucially important one. I was struggling with it last week - before I found BetterGEDCOM - and I'm still struggling with it:-

Does an Event group multiple Persons, or are Persons attached to an Event? There are arguments that work in both directions so the answer is probably somewhere in between.

I originally wanted to define an Event as simply something happening at a particular Place at a particular time (& possibly lasting over a period of time). It's possible that some Events may not directly involve any Persons. However, in most cases, a number of Persons will either be directly involved (e.g. people present on census night) or associated with an event because it affected their life (e.g. outbreak of WWII). I was handling these as PersonRefs from the Event in the direct-involvement case, and EventRefs from each Person in the associate case.

However, some Event types indicate vital data about the Persons such as age, occupation, place-of-birth, etc., and I didn't want these in the Event element because they're properties of the relevant Person. But if all cases are done as associates then it places a huge burden on the interpretations of Event-type/class/category and Person-role/status.

A good example would be a 'union' such as a marriage. There would be no single place that had both a Bride and a Groom reference. You could only infer the couple by collecting all Persons having a reference to the same union-type Event and then filtering by role.

As another example, consider a change of family-unit parentage. If a group of children get new guardians, or foster parents, or adopted parents, then how would a genealogical program make that connection when it loads the data? Again, it would have to collect all the relevant Persons associated with a particular Event-type and filter by role.

This would all need some very careful choice of distinct Event-type/class/category and Person-role/status.
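The "collect and filter by role" approach described above can be sketched in Python as follows. The record layout (each person holding a list of (event id, role) references) and the role names are assumptions for illustration only.

```python
def persons_in_event(persons, event_id, roles):
    """Return ids of persons that reference event_id with one of the given roles.

    persons maps person id -> list of (event id, role) references.
    """
    return sorted(
        pid
        for pid, refs in persons.items()
        if any(eid == event_id and role in roles for eid, role in refs)
    )

persons = {
    "P1": [("E1", "groom")],
    "P2": [("E1", "bride")],
    "P3": [("E1", "witness")],
    "P4": [("E2", "adopted child")],
}
couple = persons_in_event(persons, "E1", {"bride", "groom"})
print(couple)  # ['P1', 'P2']
```

Every such query has to scan the person records, because no single record holds both the bride and the groom references.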

ACProctor 2011-11-28T15:37:23-08:00

re: "One of the weird things that happened in the development of GEDCOM 5.5.1, was that the role tag "OWNR" was removed"

I must admit that I hadn't thought about slavery and Person-ownership, Eleanor. If OWNR was removed, was any replacement mechanism or convention put in its place?

Andy_Hatchett 2011-11-28T15:45:27-08:00

"If a group of children get new guardians, or foster parents, or adopted parents, then how would a genealogical program make that connection when it loads the data."

I'm sure this will open a can of worms but...
New guardians, foster parents, adoptive parents, etc. are not genealogical relationships; thus no need for a genealogical program to deal with them.

They are, however, social relationships and thus a Family History program *must* deal with them.

Not that any of the above helps clarify the matter at all :)

GeneJ 2011-11-28T16:07:26-08:00

@eleanordew wrote, "My question is, I suppose, how does one work the idea of slavery and slaves into the BetterGEDCOM format?"

I believe slavery would be a good concept about which we should document a series of case study materials (Wesley calls them "benchmark cases"). These could be outlined on a new wiki page and linked back to testuser's page, "BetterGEDCOM test suite."
see the linked items at the bottom of http://bettergedcom.wikispaces.com/BetterGEDCOM+test+suite

(I've linked a blank wiki page there, hoping I'll be able to summarize information from some cases I worked on about other BetterGEDCOM topics.)

It would be nice to have a good set of case materials over a range of slavery related issues. --GJ

ttwetmore 2011-11-28T16:25:51-08:00

I will give my take on your points. I have been thinking about these ideas for more than twenty years (for whatever that is worth).

"Does an Event group multiple Persons, or are Persons attached to an Event? There are arguments that work in both directions so the answer is probably somewhere in between."

In my opinion events of genealogical importance almost always occur at a fine time scale and a fine place scale, and involve persons that have relationships with respect to the event, and therefore often with respect to each other. Just think about a birth certificate or a census family group for "classic" examples. I think of the event record and the person records (the term persona has now become very popular to distinguish them from the "conclusion" persons of most genealogical programs) as forming a cluster of records.

"I originally wanted to define an Event as simply something happening at a particular Place at a particular time (& possibly lasting over a period of time)."

There are the events of genealogical significance (genealogical significance means providing information about key points in a PERSON's life [birth, death, marriage, immigration, education, land transaction, military service, ...]). There are much larger events (wars, earthquakes) that affect masses of people, but one wouldn't normally say they have genealogical significance. A war as a whole is significant, but what's important in an extended genealogical sense (the family history sense) is when a person enlisted, when they were promoted, the regiments they served in, the ships they sailed on. These are much finer-grained events or attributes than wars in general. It would be important to model these macro events for historical purposes, but it might be too much for genealogy and family history. I frankly do not have a good answer to the question of how best to place a war or a natural disaster into a genealogical database.

"However, some Event types indicate vital data about the Persons such as age, occupation, place-of-birth, etc., and I didn't want these in the Event element because they're properties of the relevant Person. But if all cases are done as associates then it places a huge burden on the interpretations of Event-type/class/category and Person-role/status."

EXACTLY. Events provide information about persons that is both inherent in the person (e.g., sex of person), BUT ALSO, non-inherent information that is only valid WITH RESPECT TO the event. Age is the prime example, but also things like occupation, residence place, and even name(!) also fit in this category. In the DeadEnds model each event record refers to the person records using event references. These event references not only "point" to the person records, they also carry the role information, BUT ALSO they carry the non-inherent properties of the person. In the DeadEnds model THIS IS WHERE AGE goes. So, with the event holding the date and place of the event, and the role-references holding the ages of the persons at the time of the event, software can easily generate a derived birth event for the persons. And so on. Or tie occupation or residence to a time line.
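A minimal Python sketch of this idea follows. The class and field names are hypothetical and are not the actual DeadEnds definitions, and the census data is invented for illustration. The event holds the date and place. Each role reference holds the person pointer, the role, and the event-relative age and occupation. Software derives a birth-year range from those: a person aged N in year Y was born in year Y - N - 1 or Y - N.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RoleRef:
    person_id: str
    role: str
    age: Optional[int] = None         # non-inherent: true only at this event
    occupation: Optional[str] = None  # also event-relative

@dataclass
class Event:
    kind: str
    year: int
    place: str
    roles: List[RoleRef] = field(default_factory=list)

def derived_birth_years(event):
    """Map person id -> (earliest, latest) possible birth year,
    for every role reference that states an age."""
    return {
        ref.person_id: (event.year - ref.age - 1, event.year - ref.age)
        for ref in event.roles
        if ref.age is not None
    }

# Invented example event for illustration.
census = Event("census", 1851, "Nantwich", [
    RoleRef("P1", "head", age=45, occupation="cordwainer"),
    RoleRef("P2", "daughter", age=20),
    RoleRef("P3", "lodger"),  # age not stated
])
print(derived_birth_years(census))
# {'P1': (1805, 1806), 'P2': (1830, 1831)}
```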

"A good example would be a 'union' such as a marriage. There would be no single place that had both a Bride and a Groom reference. You could only infer the couple by collecting all Persons having a reference to the same union-type Event and then filtering by role."

A marriage certificate is the evidence of a genealogically significant event. From that evidence we extract an event record (with type marriage) and two person records for the bride and groom (other persons are optional: witnesses, parents, officiator). The two role references in the event to the bride and groom can carry age, residence at time of marriage, occupation at time of marriage, birth place, etc. This info in the event references is available to all the conclusion-making processes that follow the collection of the evidence. If the software is smart enough, obviously.

This leaves open the old question of whether events point to persons, persons point to events, or both. The key issues are 1) recording the roles so we can infer the person-to-person relationships, and 2) recording the NON-INHERENT information. My preferred solution (one of many, I agree) is to have the role references from events to persons hold the non-inherent information, but also to have redundant person-to-event references that carry nothing but a "pointer" (no role or non-inherent attributes). Some argue that this is too redundant. It doesn't bother me a bit. My master database now fits into a GEDCOM file of many megabytes, and I don't fret at all about the fact that the FAMS and FAMC links are all redundant with respect to the HUSB, WIFE, and CHIL links.
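Because the person-to-event pointers carry nothing but the pointer, they can always be regenerated from the event-side role references, much as FAMS and FAMC can be rebuilt from HUSB, WIFE and CHIL. A Python sketch under that assumption, with a hypothetical data layout:

```python
from collections import defaultdict

def person_to_event_index(events):
    """Build the redundant person -> events pointers from event role references.

    events maps event id -> list of (person id, role) references.
    Only the pointer is kept; roles and non-inherent data stay on the event side.
    """
    index = defaultdict(set)
    for event_id, refs in events.items():
        for person_id, _role in refs:
            index[person_id].add(event_id)
    return {pid: sorted(eids) for pid, eids in index.items()}

events = {
    "E1": [("P1", "groom"), ("P2", "bride")],
    "E2": [("P1", "father"), ("P3", "child")],
}
print(person_to_event_index(events))
```

Rebuilding the index on load is one way to keep the redundancy without risking the two sides drifting apart.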

"As another example, consider a change of family-unit parentage. If a group of children get new guardians, or foster parents, or adopted parents, then how would a genealogical program make that connection when it loads the data. Again, it would have to collect all the relevant Persons associated with a particular Event-type and filter by role."

You've answered your own question, properly in my opinion.

"This would all need some very careful choice of distinct Event-type/class/category and Person-role/status"

I don't think it's that hard.

ttwetmore 2011-11-28T16:38:41-08:00

Andy said: "New guardians, foster parents, adoptive parents, etc. are not genealogical relationships; thus no need for a genealogical program to deal with them. They are, however, social relationships and thus a Family History program *must* deal with them."

I agree, but I think the majority of people would expect to be able to handle at least the more "important" non-genealogical relationships, meaning step, half and foster relationships, possibly others. To me the only thing that distinguishes the natural biological father and mother relationships from all the others is the fact that those are the only relationships we can use to build pedigrees. But of course, that agrees with exactly what you said!

As I have discussed a few times, I think there are two types of "relationships" to be modeled in genealogical/family-history programs, and they are very closely related to one another. The first is roles in events, which define the relationships that a person has with respect to an event (e.g., father, mother, child in a birth certificate). The second is direct person-to-person relationships that are documented in evidence without any reference to the event that established them. For example, an obituary will often include the names of the deceased's parents, siblings and descendants with no reference to the multitude of real events that established those relationships. So when we extract the person records from the obituary we simply link the various persons to the deceased by direct person-to-person links that also hold the relationship between the two.

Note that we can also generate person-to-person links by making inferences about roles in an event. For example (sorry this is so trivial, but I think it's important), if A has the father role in a birth event and B has the child role in the same birth event, then there is a direct father-to-child relationship between A and B. I've asked this question many times: should we always favor event-to-person roles or should we favor person-to-person relationships? My answer is that at the evidence level we should use the method best suited to the evidence, and at the conclusion level we should choose one and stick to it.
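The role-to-relationship inference in that example can be driven by a small rule table. Here is a Python sketch. The rule entries and the function name are illustrative assumptions, not a proposed BetterGEDCOM vocabulary.

```python
# Hypothetical rule table: (event type, role of A, role of B) -> how A relates to B.
RULES = {
    ("birth", "father", "child"): "father of",
    ("birth", "mother", "child"): "mother of",
    ("marriage", "groom", "bride"): "husband of",
    ("marriage", "bride", "groom"): "wife of",
}

def infer_relationships(event_type, roles):
    """Return (A, relationship, B) triples implied by the roles in one event.

    roles is a list of (person id, role) pairs.
    """
    found = []
    for a, role_a in roles:
        for b, role_b in roles:
            if a != b:
                rel = RULES.get((event_type, role_a, role_b))
                if rel is not None:
                    found.append((a, rel, b))
    return found

birth = [("A", "father"), ("M", "mother"), ("B", "child")]
print(infer_relationships("birth", birth))
# [('A', 'father of', 'B'), ('M', 'mother of', 'B')]
```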

ttwetmore 2011-11-28T16:56:08-08:00

Slavery fits into the event and persona model analogous to military service.

First, overall slavery as a large scale historical "event" is not an event of genealogical significance so we don't model slavery as a whole as an event. If we are compelled to describe the evils of slavery, we can write that up in a note record and have the persons who owned or were slaves point to that record.

However, there are both event-to-person roles and person-to-person relationships in the slavery situation. There are events of buying and selling and events of manumission that have roles. These define the condition of personal slavery and are all that software needs to know to determine if and when someone was a slave and someone was a slave owner. And then there is evidence that simply states the existence of a slavery relationship between persons, which establishes person-to-person relationships, e.g., a Wikipedia article that states "Sally Hemings was Thomas Jefferson's slave" -- create two personas and connect them with slave/owner person-to-person relationships.

The only thing giving me pause is the idea of being the child of a slave and therefore being born into slavery, as the state of being a slave begins at birth. I'm sure we could see our way to a simple solution.

Treating slavery like other events and relationships allows our software to infer if and when or where a person was a slave, just as our software can determine whether a person did military service, when and where.
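As a sketch of that inference, assume sale and manumission events that each name the enslaved person, the owner before the event, and (for a sale) the owner after it. The following Python function reconstructs ownership periods for one person. The names and record layout are hypothetical. An unknown start or end is left as None rather than guessed.

```python
def ownership_periods(events):
    """Reconstruct (owner, start, end) periods for one enslaved person.

    events: list of (year, kind, owner_before, owner_after) tuples, where
    kind is "sale" or "manumission"; owner_after is None for manumission.
    None for start or end means the date is unknown or the period is open.
    """
    periods = []
    for year, kind, before, after in sorted(events, key=lambda e: e[0]):
        last = periods[-1] if periods else None
        if last and last[0] == before and last[2] is None:
            periods[-1] = (before, last[1], year)  # close the open period
        else:
            periods.append((before, None, year))   # start known only as "before this event"
        if kind == "sale":
            periods.append((after, year, None))    # the buyer's period opens
    return periods

events = [(1800, "sale", "X", "Y"), (1810, "manumission", "Y", None)]
print(ownership_periods(events))
# [('X', None, 1800), ('Y', 1800, 1810)]
```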

GeneJ 2011-11-28T17:08:02-08:00

@ACProctor,

You wrote, "A good example would be a 'union' such as a marriage. There would be no single place that had both a Bride and a Groom reference. You could only infer the couple by collecting all Persons having a reference to the same union-type Event and then filtering by role."

Non-tech here, but you lost me a little there. In the software I use today, I associate multiple people with a given pfact--say a marriage, a death, etc.

Using the example of a traditional union, my marriage "pfact" has two principals--call them p1 and p2 (or "bride" and "groom"). I can "associate" other persons with that pfact--parents of the bride/groom, members of the bridal party ... the photographer. If I wanted to, I could associate every person who was sent an invitation (call them "invitees"). In the example I used, there is still just one pfact/event/tag.

Hope this helps.--GJ

AdrianB38 2011-07-08T13:44:12-07:00

Yes - it is lengthy, isn't it. Sorry, but somehow I'm not surprised it's a TMG thread! Though maybe I should talk...

Anyway - adding an associated entry for "loss of father" - strictly, your software ought to enable that to be visible and I _ought_ to start muttering about putting in duplicated and unnecessary data. However, where the software's reports don't highlight it, then it seems as good a way as any to highlight these issues.

(This is why I am cynical about claims that software can produce excellent narrative reports - adding in highlights like those is useful, but I've never seen it done well. If you were manually writing the report, you might say "In YYYY, the children lost their mother" (i.e. picking up on that associated entry but grouping it), or you might say "In YYYY, XXX lost her mother" if it's just one child you're reporting on, or, if you've written about her death in the previous paragraph, you might not mention it at all. You simply can't concoct a rule that's applicable to all cases.)

GeneJ 2011-07-08T14:34:15-07:00

@Adrian ...

You wrote, "In YYYY, the children lost their mother" (i.e. picking up on that associated entry but grouping it), or you might say "In YYYY, XXX lost her mother"

It's a powerful feature and helps so much during the research process.

Take the example of a family that is migrating. Well, where a young child died may be one of the few notations you have to place the family on a certain date along that migration route.

On more than one occasion, an associate tag helped me locate an obituary published where one sibling lived -- and it called out the residence of all the other siblings.

In the software I'm most familiar with, users can either exclude all witnessed events from narratives or include them. I also have a special detailed family group sheet format set up, and the witnessed events are great on the FGS.

I've seen some great narratives, but that doesn't mean that all users have or even desire the skill it takes. --GJ

Christine_E 2011-07-11T02:10:47-07:00

Other examples of one person having multiple roles:
A person was born at home and delivered by the father who was a doctor and signed the birth certificate. Father and Delivery doctor.

A Graduate could also be the Vocalist or Valedictorian or Presenter of a gift to the school on behalf of the class.

A Graduate's parent(s) could also be a Teacher and/or Principal there. (I knew a person who was the Principal/Teacher/Parent for her child's graduation.)

Christine_E 2011-07-11T02:22:20-07:00

Retirement (parties) can be a multi-person event, especially if there is a retirement incentive (golden handshake) and several people take it. In teaching, many teachers can retire on the last day of school. Lay-offs quite often involve multiple people.

And because some people marry a co-worker or help their child get a job at the same company, the event can apply to more than one person.

I was at a funeral yesterday where the founder of a family business died and other relatives worked there and also spoke at the funeral. So even though only one person died, the company and funeral had multiple roles played by family members. Within the company, one person can be promoted to different roles while working there.

AdrianB38 2011-07-11T07:02:43-07:00

So with a little bit of thought, it looks like just about any event can be seen, in the right circumstances, as a multi-person event, potentially with multiple roles per person.

Two "howevers" spring to mind:

1. However, just because an event _could_ under other circumstances be a multi-person event, doesn't mean it always should be recorded in that form. My gut feeling says that in a BG file, there are sound reasons to _allow_ software writers to create single person events "inside" an individual's details, just like GEDCOM does for all things today.

I'm not saying anything about the database used internally.

2. However, I'll bet that for every combination of people, event and roles, someone will say - "That's not a multi-person event - that's one of these, one of those and one of something else again." That's OK by me. You do it your way, I'll do it mine. It's allowed to be like that!

ttwetmore 2011-07-11T07:17:24-07:00

Adrian,

Events provide three critical types of genealogical information. The model here is that we have evidence for an event and we are codifying that event into "evidence records."

First there is the event record itself -- date and place and other non-person particulars.

Then there are the role-players in the event -- the persons mentioned, the attributes mentioned, and their roles with respect to the event. It is important to separate intrinsic attributes, that is, long-term attributes of the person (e.g., name, sex), from attributes relevant only at the time and place of the event (e.g., age, place of residence).

Then there are the goodies one can glean about the relationships between people. This can be as obvious as knowing that the person in the child role is a child of the person in the father role. But there can be much more subtle clues as well, as relationships between people can be mentioned in evidence completely outside the realm of the event itself. One good example is a witness on a marriage certificate. The marriage certificate will define one event record and a person-role record for each person mentioned on the record, with their roles with respect to the event. Witness is one of those roles. But what if the witness is also described as the sister of the bride? This establishes a relationship between two people that IS NOT based on the event roles.

If we are to be general, we must have multi-role events, and if we wish to codify the events into evidence records in our databases, we must codify them into event and person records. We must link the records via their roles with respect to the event, and we must be able to codify the "extra-event" relationships that are mentioned in the event evidence.

TW
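The intrinsic versus event-relative split described in this post can be applied mechanically when extracting a person's details from an event. Here is a Python sketch. The set of intrinsic keys is an assumption for illustration: the post names name and sex as intrinsic, and age and residence as event-relative.

```python
# Assumed classification; a real model would define this per attribute type.
INTRINSIC = {"name", "sex"}

def split_attributes(extracted):
    """Split one person's extracted details into intrinsic attributes
    (for the person record) and event-relative ones (for the role reference)."""
    intrinsic = {k: v for k, v in extracted.items() if k in INTRINSIC}
    relative = {k: v for k, v in extracted.items() if k not in INTRINSIC}
    return intrinsic, relative

bride = {"name": "Mary Roe", "sex": "F", "age": 20, "residence": "Beam St"}
person_part, role_part = split_attributes(bride)
print(person_part)  # {'name': 'Mary Roe', 'sex': 'F'}
print(role_part)    # {'age': 20, 'residence': 'Beam St'}
```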

AdrianB38 2011-07-11T08:55:38-07:00

Agreed. In principle. I think.

To give a concrete example, an English marriage certificate gives the occupation of each party, their (alleged) residence, etc, etc., thus:

1856 Marriage solemnized at the Parish Church in the Parish of Nantwich in the County of Chester.

No. 218
When married: Sixteenth day of September 1856

Name: John Doe
Age: Full age
Condition: Bachelor
Profession: Cordwainer
Residence: Beam St
Father: James Doe
Profession of father: Cordwainer

Name: Mary Roe
Age: 20
Condition: Spinster
Profession: -
Residence: Beam St
Father: Michael Roe
Profession of father: Cordwainer

Married in the Parish Church according to the Rites and Ceremonies of the Established Church after Banns by me, [A. F. Chater] Rector

This marriage was solemnized between us
John Doe his X mark
Mary Roe her X mark
in the presence of us
<sig> Charles Coe
Esther Coe her X mark

From this one can tease out:
- one event with up to 7 people in it playing various roles (I probably wouldn't include the minister, nor the witnesses unless I felt they were relatives - though this is a bit chicken and egg. Um);
- up to 7 persons each with several attributes including name; age; marital condition; trade; residence (alleged); education level;
- plus relationships between the parties and their fathers, which, as you say, are outside the event.
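One possible codification of the certificate above is sketched below as plain Python data. The layout and key names are assumptions, not a BetterGEDCOM format. It has one event record and six persons (the minister is left out, as suggested). The event-relative details sit on the role references, and the two father relationships are kept outside the event.

```python
# One event record; role references carry the event-relative details.
marriage = {
    "type": "marriage",
    "subtype": "after banns",
    "date": "1856-09-16",
    "place": "Parish Church, Nantwich, County of Chester",
    "roles": [
        ("P1", "groom", {"age": "full age", "condition": "bachelor",
                         "profession": "cordwainer", "residence": "Beam St",
                         "signed": "mark"}),
        ("P2", "bride", {"age": "20", "condition": "spinster",
                         "residence": "Beam St", "signed": "mark"}),
        ("P3", "father of groom", {"profession": "cordwainer"}),
        ("P4", "father of bride", {"profession": "cordwainer"}),
        ("P5", "witness", {"signed": "signature"}),
        ("P6", "witness", {"signed": "mark"}),
    ],
}

# Person records hold only what is intrinsic (here, just the name).
persons = {"P1": "John Doe", "P2": "Mary Roe", "P3": "James Doe",
           "P4": "Michael Roe", "P5": "Charles Coe", "P6": "Esther Coe"}

# Relationships stated on the certificate but outside the event itself.
relationships = [("P3", "father of", "P1"), ("P4", "father of", "P2")]

for pid, role, attrs in marriage["roles"]:
    print(f"{persons[pid]:12} {role:16} {attrs}")
```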

And the event would need sub-types (banns), location (St. Mary's, Nantwich), etc. And perhaps some extra notes such as "Charles' signature is dreadful" or "Charles is a witness on most marriages on this page."

I think I am yet to be convinced how much one codifies this extracted information ("extracted evidence" if you prefer, though since we don't actually have a problem to solve - yet - that's not strictly true).

One could go to the extreme of writing it all as free-text, one statement per line. The disadvantage here is the inability to search free-text in a robust manner. (Yes, one can. But if everything's free text why don't we just use a word-processor?)

Then there is the opposite end of the scale where one codifies all the information to the same level of detail that one intends to end up with. Note it will NOT be coded in the same manner as one ends up with. AGE, for instance, will be codified as an attribute - we don't create a birth event in order to record the age. I suspect one might very well encode 2 birth events out of this - one for each of the bride and groom, with their fathers being linked into those 2 events. But we do this to record the relationships, not imply the ages.

I haven't necessarily got my head round exactly what it looks like, and which things - like AGE - differ from the ultimate target.

Those are the 2 extremes of codifying evidence / information. In between, there's some good old British compromise of using text for many of the bits of data but codifying the major items, those that you'd create search algorithms on. I'm not sure if this is a compromise or falling between two stools.

To summarise - I agree with you Tom, subject to my wondering if absolutely everything needs to be codified or whether one might get away with only codifying the search data.

And that's one issue for me - I can't envisage the detailed logic that will use this data, so I'm cautious.

ttwetmore 2011-07-11T09:06:09-07:00

Adrian,

I love your pragmatism. I agree with your points. I would "codify" what was genealogically significant. I would likely leave off the minister. By the way I do this kind of codification all the time in my own records, and it seems always to be a compromise between pedantry and synopsis. I generally leave off ministers, doctors (birth and death events), registrars (court events, land events, census events, ...), but I would normally keep marriage witnesses, since they are usually of genealogical significance to the primaries.

Following another of your points, you could imagine marking up the original text as your codification. This also has a long and noble history. In some sense the whole concept of marking up was invented for this very purpose, to give semantic meaning to text without altering the text itself.
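Marking up the transcription rather than re-keying it can be done as standoff annotation: labelled character spans that point into the untouched text. Here is a minimal Python sketch. The labels and helper names are hypothetical.

```python
TEXT = ("Name: Mary Roe\n"
        "Age: 20\n"
        "Residence: Beam St\n"
        "Father: Michael Roe\n")

def annotate(text, value, label, start=0):
    """Return a (start, end, label) span for the first occurrence of value."""
    pos = text.find(value, start)
    if pos < 0:
        raise ValueError(f"{value!r} not found")
    return (pos, pos + len(value), label)

spans = [
    annotate(TEXT, "Mary Roe", "bride.name"),
    annotate(TEXT, "20", "bride.age"),
    annotate(TEXT, "Beam St", "bride.residence"),
    annotate(TEXT, "Michael Roe", "bride.father.name"),
]

def extract(text, spans):
    """Read the annotated values back out; the text itself is never modified."""
    return {label: text[s:e] for s, e, label in spans}

print(extract(TEXT, spans))
```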

More later. I'd like to once more present my views on three contrasting ideas -- events, vitals, and relationships.

ttwetmore 2011-07-12T07:50:51-07:00

Adrian,

Here is what I wanted to mention once more about the importance of three different model components needed to fully record genealogical evidence and conclusions. This thread might not be the best spot to put this, but since one of the three components is the multi-role event, it doesn't seem too far astray.

First is the concept of an event record. This is the multi-person, multi-role record that has been discussed here before. The event record is a codification of evidence for some event found in a source document. The record holds the place and time of the event, the type of the event, and any other information pertinent to the event as a whole. Each person mentioned in the event (or at least each person the researcher is interested in) is codified into a person record. The event and person records refer to each other through event-person role references. Though these are event-person roles, they often imply important relationships between the persons. For example, the person playing the child role in a birth event and the person playing the father role in the same birth event have a parent-child relationship between them. Many marriage events mention the bride and groom and their parents. There are many implied relationships in those six event-person roles. To support the information about events found in evidence, a genealogical model must provide records for the events and the persons, and those records must be able to refer to each other through the role concept. Software must be capable of inferring the implied relationships between the persons.

The second concept is that of a vital attribute. We often learn about these attributes from a statement of fact, not from any evidence of an event. For example, a source might state that a person was born on a particular day and place with no mention of parents. One could theoretically infer a birth event from this statement, create a one-role birth event record and a person record, and link them with a person-event child role. A great deal of genealogical data is like this, however, so most genealogical data models are designed to handle this information as vital attributes rather than as events. For example, GEDCOM uses the 1 BIRT and 1 DEAT structures to hold these birth and death attributes. Better GEDCOM should support this idea of vital person attributes.

The third concept is that of a relationship between persons. We often learn of a relationship through a statement of fact, not from the evidence of an event. For example, a source might state that one person was the father of another. One could theoretically infer a two-role birth event from this statement, with one person in the child role and the other in the father role. Or one could more simply create two records for the two persons and link them with relationship references. Better GEDCOM should support this idea of relationship references.

These three concepts occur at the evidence and the conclusion level, though I concentrated on the evidence level above. We can have evidence about events and we can have conclusions about events. We can have evidence about vital attributes and we can have conclusions about them. Yada yada relationships yada yada.

Christine_E 2011-07-12T21:24:18-07:00

Adrian proposed a definition of Event as:

my current favourite is the concept that "an event involves a change of state (i.e. of status)" or, (if you're not into scientific terminology), just say "a change of something".

I looked up the definition of Event on http://dictionary.reference.com and two of the definitions fit our genealogy purposes:

1. something that happens or is regarded as happening; an occurrence, especially one of some importance.

3. something that occurs in a certain place during a particular interval of time.

You were right in that I was suggesting that single-person events be handled the same as multiple person events, especially when the event is the same in both cases. From the user's perspective, why should they enter data differently for one compared to another?

However, for some events, I might not use the Event attribute. For graduations or retirements, for example, I might tag the individuals or put an entry in their notes. If I felt like several people should be grouped because they graduated at the same time, I would create a Group for them. (Since I've never used Groups or Events, this is just my current thinking).

I could see GeneJ's point about it being useful to record in the person notes where someone's family member died, but I would only do it if it likely had a strong impact on them, such as if they were still a member of the same household at the time of death. (This would imply a change in the living unit.)

And just for interest's sake, I have a chain migration where everyone who left a certain village in Europe ended up in the same town in the U.S. After a while I could easily scan through the US town marriage records and pick out the immigrants from my village because they were all married by the same minister who spoke their native language. I could group these marriages (and the christening of their children) as a group but I would want to record the minister's name for them, whereas I wouldn't for other marriages.

ttwetmore 2011-07-13T03:33:41-07:00

Christine,

I hope you read my post and then thought about the difference between the event and the vital structure, as these are the two concepts you are now discussing. Any "vital event", for example your graduation and retirement examples, could be represented by a separate one-role event record with an associated person record, or simply by a vital structure within the person record. This is the fundamental issue involved here, and I think it is well understood. There are some Better GEDCOMers who have a strong feeling that there should only be one way of doing things, and that there should be a decision here. If that one-way decision were made, it would have to be in favor of the multi-role event. My argument, of course, is that you can view these as different things. You should feel obligated to create multi-role event records when you have explicit evidence about an event. And you should feel obligated to create vital structures when you don't have that evidence. And there are gray areas where you can go either way.

AdrianB38 2011-07-13T08:31:00-07:00

Christine - two important points to pick up on:

"From the user's perspective, why should they enter data differently for one compared to another?" They shouldn't. There is absolutely no reason why the user should see any difference on screen between a "thing" that is represented behind the scenes as a multi-person event and another "thing" that is represented behind the scenes as a single-person event and is written out in the GEDCOM or BG file _within_ a person. Well, no difference except that the latter shows only one participant on the screen.

Secondly "for some events, I might not use the Event attribute ... I might tag the individuals or put an entry in their notes ... I [might] create a Group for them"

Absolutely. BG has to make all the options available and leave it up to the user which to choose. If they were all identical in resultant functionality, the "one-way" people should rule. (All ... One ... Rule ... All.... Excuse me while I fight off the temptation to misquote the inscription on Sauron's Ring)

However, there are slightly different meanings in each of your quoted ways and I'd hope BG would be able to accommodate them all.

AdrianB38 2011-04-17T12:48:56-07:00

This is just to open up a place to record long-standing(?) conclusions about multi-person events.

These conclusions may be scattered through the Wiki but some things spring to mind:
- the multi-person event is an entity in its own right, of equivalent status to (say) persons. In GEDCOM terms, it's a Level 0 record, just like a person is. Or in RDBMS terms, it's a row in its own right in the table tblEvents.
- Example: a marriage event would be an entity in its own right, pointing to (say) bride, groom and two witnesses.
- Each of the people / event combinations would have a value to describe the person's role in the event.
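A minimal Python sketch of these conclusions, assuming a simple in-memory model: the event is a top-level record, and each person/event combination carries a role. The class names (`Event`, `Participation`), the role strings and the person IDs are illustrative assumptions, not part of any agreed BetterGEDCOM schema.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    id: str
    name: str


@dataclass
class Participation:
    """One person/event combination, carrying that person's role."""
    person_id: str
    role: str


@dataclass
class Event:
    """A top-level event record, analogous to a Level 0 GEDCOM record."""
    id: str
    event_type: str
    participants: list[Participation] = field(default_factory=list)

    def add(self, person: Person, role: str) -> None:
        self.participants.append(Participation(person.id, role))

    def people_in_role(self, role: str) -> list[str]:
        return [p.person_id for p in self.participants if p.role == role]


# The marriage example: one event pointing to bride, groom and two witnesses.
bride = Person("I1", "Mary Roe")
groom = Person("I2", "John Doe")
w1 = Person("I3", "Witness One")
w2 = Person("I4", "Witness Two")

marriage = Event("E1", "MARR")
marriage.add(bride, "bride")
marriage.add(groom, "groom")
marriage.add(w1, "witness")
marriage.add(w2, "witness")

print(marriage.people_in_role("witness"))  # ['I3', 'I4']
```

The persons do not point at the event here; the links live on the event, which is what lets one event record serve any number of participants.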

Christine_E 2011-07-07T21:24:56-07:00

Let's discuss this two ways, first as an event that involves multiple people, then as an event involving one person.

If we think this should only be for multiple people then the Description: should be expanded to something like

BetterGEDCOM must support the recording of events that affect multiple people. In particular, it must be possible to record the role of each person in the event. A situation involving only one person (i.e., a single death) is not considered an event for BetterGEDCOM purposes. Examples of events are Births, Adoptions, Marriages, Lawsuits, Natural disasters.

Now what about Immigration, Naturalization, Accidents, Graduation, an honor? They could involve one or more of our ancestors. (When they involve only one person, there is probably someone else there, but he/she/they are probably irrelevant to the event we are documenting.) For example, someone immigrated. Most likely he/she came with others even if the immigrant didn't know them. If they came on a ship or plane or train, there were also crew/flight personnel involved.

Shouldn't we document immigration the same even though sometimes it involved only one ancestor and other times it involved several of our ancestors together?

Christine_E 2011-07-07T21:33:49-07:00

I propose that this discussion start by listing things that are events and aren't events in the genealogy sense to give clarity to this requirement.

retirement?
illness?
move to new residence?
religious ceremony?

I ask other members to list more. . .

AdrianB38 2011-07-08T05:16:16-07:00

Christine - I think there are a couple of facets that link into this discussion.

Firstly - what IS an event?
And - should single person events be recorded in BG differently from multi-person events?

OK - what IS an event? There are probably many times that question has been asked in this Wiki and having tried all sorts of definitions involving the presence or not of values, my current favourite is the concept that "an event involves a change of state (i.e. of status)" or, (if you're not into scientific terminology), just say "a change of something".

That being so, I think referring to a "situation involving only one person (i.e., a single death)" as not being "an event for BetterGEDCOM purposes" takes us into territory where we're on a loser. It's not a multi-person event, certainly, but it is an event for a single person, so we might as well call it an event.

What is more interesting is what I think you're driving at, which is, should single person events be physically recorded differently from multi-person events and are there any such events that are always single person?

I think there must be event types that are single person only - injury and illness are two that spring to mind, along with retirement, graduation, promotion, etc... (I just took a quick look at the GEDCOM 5.5 list).

However, I suspect one could argue about several of those - what if a whole family were struck down by an epidemic? Or were all in a traffic accident? And if it were a family firm, it might be a father promoting his daughter. Plus it's always newsworthy when a parent and child graduate together. A move to a new residence could be a move of a whole family. And a death might involve a relative registering the death later - sure, you could add that as a separate event but I'm not a fan of extra events just for the sake of it.

About the only one I can't think of a multi-person event for is retirement. So, unless someone comes up with some more, I think we must allow that any single-person event could also, under some circumstances, be a multi-person event.

Does that mean we need to code all single-person events in BG as if they were multi-person (i.e. as if they were all top level entities?). I don't think so. For one thing, having all events as multi-person dramatically increases the size of a BG file and reduces the readability of the output text - which people will still want to read. Not sure if it increases the coding workload or not. I think coding everything as multi-person would probably reduce the workload.

HOWEVER - if we go down the nFS route of having personas (i.e. stripped down individual records) for sources (a.k.a. the evidence and conclusion data model) then there are sound arguments for keeping the persona bundled inside one person-type record and therefore putting all that persona's events inside the record as single person events.

Conversely, if you don't want to use personas for recording the evidence but are happy to have the evidence as text linked to a source (say) then having all events as top-level, multi-person events is simpler in coding (I think), even if rather bigger in file-size.
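To illustrate that the two storage forms need not look different to the user, here is a hypothetical Python sketch showing that a single-person event can be moved between the embedded form (inside the person record) and the top-level form (with one role link) without losing information. The `"principal"` role name and the ID scheme are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class EmbeddedEvent:
    """A single-person event stored inside the person record."""
    event_type: str
    date: str
    place: str


@dataclass
class TopLevelEvent:
    """A stand-alone event record with (person_id, role) links."""
    id: str
    event_type: str
    date: str
    place: str
    roles: list[tuple[str, str]] = field(default_factory=list)


def promote(person_id: str, events: list[EmbeddedEvent]) -> list[TopLevelEvent]:
    """Turn embedded events into top-level events with one principal participant."""
    return [
        TopLevelEvent(f"{person_id}-E{i}", e.event_type, e.date, e.place,
                      [(person_id, "principal")])
        for i, e in enumerate(events, start=1)
    ]


def demote(person_id: str, events: list[TopLevelEvent]) -> list[EmbeddedEvent]:
    """Fold back only the events whose sole participant is this person."""
    return [
        EmbeddedEvent(e.event_type, e.date, e.place)
        for e in events
        if e.roles == [(person_id, "principal")]
    ]


embedded = [EmbeddedEvent("RETI", "1990", "Crewe, Cheshire, England")]
top = promote("I1", embedded)
assert demote("I1", top) == embedded  # nothing lost in the round trip
```

Because the conversion is mechanical, a program could store whichever form suits its evidence model and still present both identically on screen.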

GeneJ 2011-07-08T09:12:40-07:00

Humm...

The use of associates is among the few reasons I use particular software. I could almost get downright emotional about it!

Perhaps I'm confusing the requirement, but in my current practice/current software, I add associates and roles to many events. Assigning a role doesn't mean they were present at the event, but it certainly could include those individuals.

Death -- of father; of mother -- when a parent dies, I add an associated entry for "loss of father" or "loss of mother" to the record of each surviving child.

If the parent survives and a child dies, I add an associated event for the loss of a son or loss of a daughter.

OOo. I have roles for loss of brother and sister, too.

A child marries ... surviving parents are associated ... A son enlists in the army ... I associate that event to surviving parents ...

GeneJ 2011-07-08T09:24:07-07:00

Adrian wrote, "It would be better to have a birth event involving three people"

In the associates/roles-enabled software I use, events are linked to persons by (a) principal roles and (b) associate roles.

Here is a _lengthy_ user discussion about whether there should be a limitation in the number of principal roles (vs associate roles) per event:

http://archiver.rootsweb.ancestry.com/th/read/tmg/2011-03/1299561812

GeneJ 2011-07-08T09:33:56-07:00

Bringing this up only for discussion.

Should BetterGEDCOM enable/allow an individual to be assigned more than one role in an event?

In the software I use, an individual is only allowed to play one role per event.

Probate is a common example of persons who play multiple roles in an event. It's not so unusual for one or more children to be selected to administer an estate (or designated as executors) and for those same children listed with others as heirs to the estate.

So you have one or more children who have multiple roles in the same event.

My work around is either to create two tags (events, say "probate administration" and "probate") or to create separate roles (say "administrator and heir" and "heir").

I know those new to roles find this a little inconvenient, but the rule "one event=one role/person" does probably save us from many errors (such as marrying oneself, being your own mother or father, or your own pallbearer).

AdrianB38 2011-07-08T13:33:00-07:00

"Should BetterGEDCOM enable/allow an individual to be assigned more than one role in an event?"

I think "yes" - as you say, the probate / will event is one obvious answer.

Executor and Trustee and Beneficiary is one possible combination - Executor and Trustee are 2 different roles. Sure, there are ways around things, you could concoct a probate event and an inheritance event to separate Beneficiary out, but I would find it tricky to split Executor and Trustee. Yes, you could create a new role of "Executor and Trustee", but c'mon, this is getting silly.

Again, in births, it might prove useful for someone to be declared as both egg-mother and birth-mother (in the sense of one who carries the embryo to term). While that is the normal biological combination, in the case of test tube fertilisation, an explicit statement of such might be useful.

While the idea of stopping erroneous entries is attractive, I think it would be the case that the inconvenience from stopping legitimate combinations outweighs the benefits from stopping errors.
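One possible middle ground, sketched in Python under stated assumptions: allow a person to hold several roles in one event, but flag specific role pairs that are almost certainly errors (such as marrying oneself). The `CONFLICTING_ROLES` list and the role names are hypothetical, not an agreed vocabulary.

```python
from collections import defaultdict

# Hypothetical pairs of roles that one person should not hold in the same event.
# Kept as a tuple so the check runs in a fixed order.
CONFLICTING_ROLES = (
    frozenset({"bride", "groom"}),
    frozenset({"child", "mother"}),
    frozenset({"child", "father"}),
)


def role_warnings(participants: list[tuple[str, str]]) -> list[str]:
    """Warn about conflicting role pairs; every other multi-role combination passes."""
    roles_by_person: dict[str, set[str]] = defaultdict(set)
    for person_id, role in participants:
        roles_by_person[person_id].add(role)

    warnings = []
    for person_id, roles in sorted(roles_by_person.items()):
        for pair in CONFLICTING_ROLES:
            if pair <= roles:  # the person holds both roles in the pair
                warnings.append(f"{person_id}: {' and '.join(sorted(pair))}")
    return warnings


probate = [("I5", "executor"), ("I5", "trustee"), ("I5", "beneficiary"),
           ("I6", "beneficiary")]
print(role_warnings(probate))                          # []
print(role_warnings([("I7", "bride"), ("I7", "groom")]))  # ['I7: bride and groom']
```

Warnings rather than hard rejections keep legitimate combinations (Executor and Trustee and Beneficiary, or egg-mother and birth-mother) possible while still catching the obvious mistakes.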

WesleyJohnston 2011-11-15T07:04:24-08:00

Data-Place03 - place can be member of several place hierarchies

The discussion that led to this focused mainly on place name changes. I want to make sure that another aspect of a place being in several place hierarchies is not lost.

Where I sit right now, I am simultaneously within the jurisdiction of multiple record-creating/keeping authorities. Certainly there are my address, city, county, state and country. But there is also a water and sewer district, a school district, an electric utility district, a gas utility district, a water management district. At various times, I receive mailings from and pay bills to some of these districts.

A similar situation exists within churches. There are many different terms used in different denominations, but they all divide the world up into their own districts, which are important to know when you are trying to find records: parish, conference, synod, etc.

In fact, any widespread organization is going to have the same sort of hierarchy. And if the person I am researching was a member of the Veterans of Foreign Wars or a Freemasons lodge, it behooves me to know how that fit into their structure.

So while this particular requirement originated mainly from consideration of changes of place names and boundaries over time, it can also encompass a great deal more.

ttwetmore 2012-05-31T23:01:52-07:00

Here is my proposed model for a place. It is an E-R style model though I am using the terminology of an element for an entity, so that a sub-element is the has-a relationship between entities, and references are used to represent entity to entity relationships. This translates directly to RDF triples as well. Using the words element and sub-element should make an XML, GEDCOM, JSON, etc, representation of this place model fairly obvious. This is the DeadEnds place model. I believe it meets all requirements that have been mentioned with respect to places.

A place is an element. It may be a sub-element of a higher level element (e.g., an event element), or it may be a top level element of its own. If it is a top level element it must have a unique ID to allow it to be referred to by other elements.

A place element contains sub-elements. The most important are:

name (required) – an element whose value is a comma-separated list of name parts, which can be a single name part; and

type (optional) – a comma-separated list of name part types in one-to-one correspondence with the name parts. Name part types come from a fixed vocabulary.

Other optional sub-elements include media links, latitude and longitude, the date ranges when the name was known to be in use, historical notes, source references, language of the names, and so on.

A place element may refer to a higher level place element that contains it by using a place reference sub-element. Higher level places are always top level elements. Place references include the unique ID of the top level place element. Place elements may form hierarchies by chaining places using place references. All but the first place element in a chain must be top level place elements.

Important: the term top level does not mean an element is at the top of a hierarchy; it means that the element is not a sub-element of any other element.

A place element may contain multiple place reference sub-elements, allowing places to be contained in multiple higher level places, and therefore to be members of multiple hierarchies.
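A rough Python rendering of this place model, assuming the element and sub-element names map directly onto fields. The `Place` class, the `hierarchies` helper, and the water-district and street names are hypothetical; the Brooklyn name and types come from later in this thread.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Place:
    """A place element: comma-separated name parts plus optional part types."""
    name: str                                     # required
    type: Optional[str] = None                    # optional, one type per name part
    id: Optional[str] = None                      # needed only for top-level places
    parents: list[str] = field(default_factory=list)  # place references (IDs)

    def typed_parts(self) -> list[tuple[str, Optional[str]]]:
        parts = [p.strip() for p in self.name.split(",")]
        if self.type is None:
            return [(p, None) for p in parts]
        types = [t.strip() for t in self.type.split(",")]
        if len(types) != len(parts):
            raise ValueError("name parts and types must correspond one-to-one")
        return list(zip(parts, types))


def hierarchies(place: Place, index: dict[str, Place]) -> list[list[str]]:
    """Every chain from this place upward through its place references."""
    if not place.parents:
        return [[place.name]]
    return [[place.name] + chain
            for ref in place.parents
            for chain in hierarchies(index[ref], index)]


brooklyn = Place("Brooklyn, Kings, New York, New York, United States",
                 "borough, county, city, state, country")
print(brooklyn.typed_parts()[1])  # ('Kings', 'county')

# One place referring to two top-level places, so it sits in two hierarchies.
index = {
    "P1": Place("Kings, New York, United States", "county, state, country", id="P1"),
    "P2": Place("Hypothetical Water and Sewer District", id="P2"),
}
home = Place("Example Street", parents=["P1", "P2"])
print(hierarchies(home, index))
```

Multiple membership falls out of allowing a list of place references; nothing else in the model has to change.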

Tom

ACProctor 2012-06-01T03:58:28-07:00

Re: "I also like the idea of having some form of link to other known names of the same place..."

There are many similarities between place names and personal names, Alex.

- Both entities may have alternative names over different periods of time
- There may be spelling variations, especially over time
- Both entities may have different names in different languages
- The names of both entities may involve abbreviations (e.g. Thos. for Thomas, or Co. for County)
- There may be entities with identical or similar names in the same locality (or in the same family for the case of personal names)
- The named entity may come into being at a given date, and cease to exist at a different date

They both have a parentage too. However, the parentage of a person is fixed (i.e. their biological lineage) whereas a place may have a variable parentage (i.e. its place hierarchy).

STEMMA tries to capitalise on this so that the tokenisation of names, and the rules for matching a name against multiple alternatives, can be the same for both entity types.

The one place this falls down is in the classification of the parts of a personal name (i.e. surname, given name, middle names, prefixes, suffixes, name particles, etc). Without this classification, the sorting of names, and possibly the presentation in a formal or informal style, cannot be done for a personal name. I'm still thinking about this.

Tony

AdrianB38 2012-06-01T07:03:18-07:00

Tom,
How would you cope with
(a) multiple names over time? E.g. New Amsterdam becoming known as New York
(b) places transferring from one higher place to another?

You may be intending the multiple hierarchy concept to cover (b) but I need to ask as it's not quite the same concept.

Otherwise this looks simple and flexible.

Adrian

ttwetmore 2012-06-01T07:56:37-07:00

Tony and Adrian,

I have to ask a fundamental question. Must the BG model provide full support for the UToP ("unified theory of places"), or should it be a simple and practical system that skirts around the full complexity of the UToP?

How important is it to link, within a genealogical database, places together because they hold different names for the same real place? How important is it that our place model keeps track of place names changing over time? Where is the 80/20 breakdown in the complexity of our place model between usefulness and completeness?

I feel a constant tension between making a model too simple and making it too complex. I always occupy the simple end of the spectrum, knowing that I will always be balanced by others on the complex end.

To answer Adrian's 2 questions (and I have examples of both things in my database).

1. I generally try to use the name and geopolitical structure that existed at the time of the event, though I am neither fanatical nor consistent about it, especially when using software that doesn't appreciate the subtleties involved. So I use New Amsterdam during the right time frame, and New York during the right time frame. I have some Dutch ancestors for the New Amsterdam time frame.

2. This is a kind of subset of 1. One example in my database is an ancestor who died in Brooklyn in 1881. His death certificate is from the City of Brooklyn, Kings County, New York, so I recorded his death as occurring in the place "Brooklyn, Kings, New York, United States." Brooklyn is now a borough incorporated into the City of New York, so today I would refer to it as "Brooklyn, New York, New York, United States." What is a little ironic about this situation is that Kings County still exists and it shares the same boundaries as the borough of Brooklyn. In fact the single city of New York contains five boroughs, and each borough has the same boundaries as one of five New York State counties. Yes indeedy. So if you wanted to get the county name into this place you'd have to go with "Brooklyn, Kings, New York, New York, United States" with a type element of "borough, county, city, state, country". But it works.

I imagine that my approach is too simple for most of you.

Tom

ACProctor 2012-06-01T08:03:14-07:00

Re: "How important is it to link, within a genealogical database, places together because they hold different names for the same real place? How important is it that our place model keeps track of place names changing over time"

We have to do this for Persons so that we can find them given vague or informal references. I would say the same applies to Places.

As I said earlier, I think the approach to both types of named entity can be generalised to a large degree.

I thought you'd be keen on that Tom, given your views on "generalisation" elsewhere. :-)

Tony

ttwetmore 2012-06-01T09:17:32-07:00

Tony,

As I admitted I can be inconsistent.

Handling "same-as" is easy. Add a reference sub-element:

<place id="1234">
  <name> New Amsterdam, Holland </name>
  <type> colony, country </type>
  <date> between XXXX and YYYY </date>
  <sameas id="1235"/>
</place>
 
<place id="1235">
  <name> New York, England </name>
  <type> colony, country </type>
  <date> between YYYY and ZZZZ </date>
  <sameas id="1234"/>
</place>

In this example you would likely really have four place elements, splitting out historical Holland and historical England to their own place elements. This is the tip of an iceberg. The question is how much of the iceberg do we want in the model?
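A small Python sketch of how software might use these `sameas` references, assuming the two place elements above are wrapped in a hypothetical `<places>` root so they parse as one XML document. It follows `sameas` links transitively, which goes slightly beyond what the two-element example strictly needs; the `XXXX`/`YYYY`/`ZZZZ` placeholders are kept as-is.

```python
import xml.etree.ElementTree as ET

# The two place elements from the example, wrapped in a hypothetical <places> root.
XML = """
<places>
  <place id="1234">
    <name> New Amsterdam, Holland </name>
    <type> colony, country </type>
    <date> between XXXX and YYYY </date>
    <sameas id="1235"/>
  </place>
  <place id="1235">
    <name> New York, England </name>
    <type> colony, country </type>
    <date> between YYYY and ZZZZ </date>
    <sameas id="1234"/>
  </place>
</places>
"""


def sameas_names(xml_text: str, place_id: str) -> list[str]:
    """Names of a place plus every place reachable through <sameas> links."""
    root = ET.fromstring(xml_text.strip())
    places = {p.get("id"): p for p in root.findall("place")}
    seen, stack = set(), [place_id]
    while stack:
        pid = stack.pop()
        if pid in seen or pid not in places:
            continue
        seen.add(pid)
        stack.extend(s.get("id") for s in places[pid].findall("sameas"))
    return sorted(places[pid].findtext("name").strip() for pid in seen)


print(sameas_names(XML, "1234"))  # ['New Amsterdam, Holland', 'New York, England']
```

The `seen` set matters because the two references point at each other; without it the traversal would loop forever.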

Do we have a requirement that we must support these "same-as" relationships?

The argument that because we do something for people we should also do it for places is one I can't automatically agree with. Genealogy is primarily about people, and places enter in only in so far as they support persons. We are much more interested in all the alternative names that were applied to a person than we are in the names given to the places where the person lived. If we were discussing the requirements for an on-line gazetteer, then we would be discussing an application that is primarily about places and their names, so the place model would necessarily be much more complex.

The argument that we need sameas for places to enable searching, in the same way that we need sameas when searching for name matches, though having some merit, would only apply in a very small minority of cases. Does any modern software support the idea? If I search for ancestors in Nova Scotia in 1783, will the software know that in 1783 New Brunswick was still part of Nova Scotia, so the area of search should be increased to cover modern New Brunswick? Is it up to the software to know these things, or is it up to the researcher? I'm not suggesting the answer by the way, but it is an interesting question.

Tom

ACProctor 2012-06-01T10:39:33-07:00

I see the approach you're taking, Tom. However, I have a small issue with "Genealogy is primarily about people...".

Family History is usually considered to encompass more than genealogy. I know that my thoughts are rarely mainstream ones but my own data includes some historical narrative on a few places because they were so important to the family, and this includes specific houses as well as villages or neighbourhoods. Pictures, though, would be something that most of us can relate to.

Re: "Does any modern software support the idea?".

I'm a little unusual, again, because I don't use any software products other than my own. If a Place Authority existed then I'm sure the online content providers would go for it because of the advantages in providing agreed information about alternative names and hierarchies. A suggestion I wrote up somewhere was that the Place definitions held by a Place Authority can be cross-indexed with the relevant census returns, e.g. so that each street can be linked to its relevant census pages. Our National Archives came so close to having this information set up, but then abandoned it. I don't believe they ever saw the potential in this field though.

Tony

ttwetmore 2012-06-01T11:03:19-07:00

Tony,

I don't have any big push back against your points here. The differences I see between genealogy and family history are found in the types of relationships and events that each support. I think of pure genealogy as exclusively concerned with parent/child relationships, primarily the biological ones, along with the vital events of birth, death and marriage. I think of family history as broadening the bounds of pure genealogy to include interpersonal relationships of all kinds, and personal events of all kinds. I would also say that genealogy as it is generally practiced today is more than pure genealogy and may include many features of family history. I don't see how the differences between these two has a material impact on the nature of the place sub-model required to support them both. However, if all you need is the same-as relationship to get your requirements met, then I'm all for it.

For over twenty years I had exclusively used my own software, a program named LifeLines, to hold my genealogical database and to generate reports. The only thing I do differently today is that I have a family tree up on Ancestry.com. There is no easy way to keep the two systems up to date, and that is worrisome, though I don't really worry about it all that much.

Tom

AdrianB38 2012-06-01T15:32:39-07:00

Tom - how far should we go? A good question - to some extent a difficult question to answer because if I try to answer it on behalf of the FH community at large, I have to fall back on the fully complex model because only then do I have a feeling that anyone and everyone can be supported. But for me personally, taking what I'd like to do with my own relatives, I think these ideas leap out at me:

1a. Re New York / New Amsterdam - a typical question would be, list off all people named John Doe, living in the 1600s in place Y. Suppose one was recorded as born in New Amsterdam, and another in New York. (So yes, contemporary names apply in the presentation). At the moment I would need to run off 2 enquiries - one to pull off all people named John Doe, living in the 1600s in place New Amsterdam, the 2nd all people named John Doe, living in the 1600s in place New York. It would be nice to do it all in one go, i.e. a query that recognises NA and NY as the same place so returns both.

1b. My actual examples are subtly (or even not-so-subtly) different. My home town of Crewe started as a settlement within and across the 2 townships of Monks Coppenhall and Church Coppenhall. If I want to pull off a list of all people named Mary Roe born Crewe 1840 +/- 10y, I currently need to do 3 queries - 1 for each of those places. It would be nice to be able to record that both Monks Coppenhall and Church Coppenhall are subsumed into Crewe and ask the question once - presumably by asking about Crewe.

An interesting thought - any attempt to concoct a dated quasi-legal relationship between the 2 Coppenhalls and Crewe is probably counter-productive, as someone might be answering the question "Where were you born?" in the 1881 census, not with the legal name as it was in 1840 when they were born, but with the settlement name of 1881. So a simple synonym relationship is probably all that's needed.

2a. Your Brooklyn: The equivalent example I always use is "Widnes, Lancashire, England" (for pre-1974 events) and "Widnes, Cheshire, England" (for post-1974 events). Certainly, I'd like to be able to record the contemporary name of the place but then I'd end up with (in current software) 2 places. Ideally I want to be able to search on just one of those names but get events recorded with both the pre and post-1974 names.

So I think one important point is that I'd like to be able to record contemporary names and so have them printed appropriately in reports - but I'd like to be able to run queries that pull off places in all their forms by just one query. It may be that synonyms are all that are needed and the rest is just gold-plating...
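A minimal sketch of the "simple synonym" approach in Python, assuming the user marks place names as equivalent and the software expands a query place into its whole synonym group (implemented here with union-find). The class and function names and the sample events are invented for illustration.

```python
class PlaceSynonyms:
    """Groups place names a user has marked as equivalent, using union-find."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def _find(self, name: str) -> str:
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            self.parent[name] = self.parent[self.parent[name]]  # path halving
            name = self.parent[name]
        return name

    def link(self, a: str, b: str) -> None:
        self.parent[self._find(a)] = self._find(b)

    def expand(self, name: str) -> set[str]:
        root = self._find(name)
        return {n for n in list(self.parent) if self._find(n) == root}


def query(events, synonyms, person, place, year_from, year_to):
    """One query instead of one per place name; events are (person, place, year)."""
    names = synonyms.expand(place)
    return [e for e in events
            if e[0] == person and e[1] in names and year_from <= e[2] <= year_to]


syn = PlaceSynonyms()
syn.link("Monks Coppenhall", "Crewe")
syn.link("Church Coppenhall", "Crewe")

events = [  # invented sample data
    ("Mary Roe", "Monks Coppenhall", 1838),
    ("Mary Roe", "Church Coppenhall", 1845),
    ("Mary Roe", "Crewe", 1849),
    ("Mary Roe", "Crewe", 1860),   # outside 1840 +/- 10
    ("Mary Roe", "Widnes", 1840),  # not linked to Crewe
]
print(query(events, syn, "Mary Roe", "Crewe", 1830, 1850))
```

Each event keeps its contemporary place name for reports; only the search is widened. Note that a symmetric synonym group also makes a search on Monks Coppenhall return Church Coppenhall.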

2b. If I try to be too rigorous with place relationships, I think I'm liable to come up with names that are not recognisable by normal people. For instance, I've recorded the Antipodean city of Melbourne as "Melbourne, Victoria, Australia". But prior to 1901, Australia is a geographic expression only and Victoria is a Crown Colony in its own right, so ought to be rendered simply as "Melbourne, Victoria". And I'll bet lots of people will get confused over where that might be. Hm. So my hierarchies are sometimes a muddle as well, it seems!

ttwetmore 2012-06-02T08:43:06-07:00

Adrian,

On the how far do we go question, you definitely go further than I would.

I do want to ask something. BG is a format for archiving genealogical data. Should that data contain the historical gazetteer of the world, or should that historical knowledge be applied by custom software? For example, should you be responsible for creating a "network of historical places" to deal with the "Crewe, Monks Coppenhall and Church Coppenhall" issues in your actual data, or should you expect software with an adequate place authority to be able to know the physical and temporal relationships between those places? I'd much prefer the place authority solution, but one must wonder when or if such agents will be available.

Your examples include searching when name changes are involved. Is this really a big issue for most researchers? Is it a 1% problem, a 5% problem, a 20% problem and so forth? Can it be solved by allowing an event to have more than one place hierarchy, maybe the historical one that existed at the time of the event, and the modern one as it exists today? We have an analog with person names, where we can record many different name forms for one person, which is a boon for searching.

No answers here.

Tom

AdrianB38 2012-06-02T12:57:22-07:00

Tom suggests I definitely go further than he would. Maybe - though since I'm not going anywhere at the moment, let's just call them aspirations.

Re the links between Crewe, Monks Coppenhall and Church Coppenhall. I'm firmly of the belief that I'd have to put that stuff in myself, for several different reasons:
1. As I've said before, I have no faith in the idea of a Place Authority creating data at the level of my places.
2. Even if they did, would their concept of a place match mine? Probably not as I've fudged various things, especially where I'm not sure which flavour of a place it is (as before, if someone says they were born in Barthomley, is that the village, parish or township of that name?)
3. So I'd expect to put that data in myself, for just the places I'm interested in.

Name changes - well, that's the problem, I can't really be sure on this. Anyone with ancestors in my home town will probably have the exact same problem as me. But just a few miles north is Winsford, another industrial town, with an almost exactly similar history of coming from 2 settlements, Over and Wharton. I suspect I could probably work through quite a few places where the 18th and 19th centuries created a green-field industrial site with settlement - as I'm sure happened across the globe - but in England, there tends to be an older settlement somewhere in place already.

Strictly, we need to distinguish the name change (New Amsterdam / New York) from the "merge" (Over and Wharton becoming subsumed into the new Winsford). And neither strictly matches the version where the higher level entries in the hierarchy change - e.g. Harper's Ferry going from Virginia to West Virginia.

But I suspect that if my database had 3 places (Over, Wharton and Winsford), and the 3 were marked up as being equivalent, then the software could turn a search on Winsford into a search on Over, Wharton or Winsford. I think these are definitely 3 places - certainly Over and Wharton in the 1700s were 2 different places - it's only Winsford in the later 1800s that draws them together. Ditto the 2 Coppenhalls and Crewe. Whereas Harper's Ferry, New Amsterdam / New York, and Widnes are 1 place each, with either different names or different hierarchies.
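One way to keep the name-change / merge distinction while still searching on alternates, sketched in Python under a design assumption of ours rather than anything agreed: renames are stored as symmetric links (one place, two names), while merges are one-way links from the new place to the places it subsumes. A search on Winsford then covers Over and Wharton, but a search on Over does not pull in Wharton. `PlaceLinks`, `renamed` and `merged` are hypothetical names.

```python
class PlaceLinks:
    """Hypothetical typed links: renames are symmetric, merges are one-way."""

    def __init__(self) -> None:
        self.same_place: dict[str, set[str]] = {}
        self.subsumes: dict[str, set[str]] = {}

    def renamed(self, old: str, new: str) -> None:
        self.same_place.setdefault(old, set()).add(new)
        self.same_place.setdefault(new, set()).add(old)

    def merged(self, parts: list[str], into: str) -> None:
        self.subsumes.setdefault(into, set()).update(parts)

    def search_set(self, name: str) -> set[str]:
        """The place, its other names, and everything merged into it."""
        result, stack = set(), [name]
        while stack:
            current = stack.pop()
            if current in result:
                continue
            result.add(current)
            stack.extend(self.same_place.get(current, ()))
            stack.extend(self.subsumes.get(current, ()))
        return result


links = PlaceLinks()
links.merged(["Over", "Wharton"], "Winsford")
links.renamed("New Amsterdam", "New York")

print(sorted(links.search_set("Winsford")))  # ['Over', 'Wharton', 'Winsford']
print(sorted(links.search_set("Over")))      # ['Over']
```

A hierarchy change such as Harper's Ferry could arguably reuse `renamed` on the full place strings, though that is a further assumption.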

I'm trying to get to 80% of the functionality with 20% of the effort, so would be open to further ideas. And most of that would be to deal with searching on the alternates.

Alex-Anders 2012-06-02T15:32:01-07:00

A further example, similar to Melbourne.

Jimbour Station was a property in Queensland. It was further recorded as Jimbour Station, County of Aubigny, Queensland. Later it was Jimbour Station, Parish of Maida Hill, County of Aubigny, Queensland and eventually Jimbour Station, Parish of Maida Hill, County of Aubigny, Queensland, Australia.
Parts of Jimbour Station were sold off and one section became a settlement known as Maida Hill, Queensland, later to be a town Maida Hill, Parish of Maida Hill, County of Aubigny, Queensland.
Maida Hill was also a suburb within Brisbane, Queensland and the Colony changed the Town Name to Bell. Later again, the suburb ceased to exist as it was encompassed by another (name eludes me).
As no Maida Hill existed, a new town (nowhere near either of the others) was named Maida Hill and exists today. A second location has also been named Maida Hill within Queensland. So an association would need to be established for some names but not others?

AdrianB38 2012-05-30T12:57:25-07:00

Tony - I have to confess that my opinions are based on / prejudiced by my belief that a meaningful Place Authority is a non-starter for the UK at least. (I can have no real opinion on the others). Certainly the current top administrative levels are do-able because they seem to be defined in various places. Let's assume that we can get back in time to some meaningful historic values - though even there I can see arguments over places like Bristol, which was a county in some sense (but not in others) for centuries but is seldom treated as such in genealogy except by Bristolians. But listing all the various towns and villages in each county (or whatever)? From what sources? The situation is even worse in Scotland where whole settlements have disappeared from the map and were never more than a few buildings around a farm. (I believe historians can draw a distinction between societies with villages and those without).

This doesn't mean the Place Authority effort isn't valuable at the higher levels.

But I still feel that the flexibility I want to see goes right up to those higher levels. I _might_ want to create a military geography - e.g. while 11 Group of the RAF in WW2 is an organisation, it's also a geographic area. Similarly, the subdivisions of British Rail were organisational but also had geographic meaning - and that hierarchy is quite different from today's hierarchy in Network Rail. As I said above, I'm not claiming these are necessarily the world's most sensible way of recording these things, but in GEDCOM I could do it. With controlled vocabularies for place-types (never mind Place Authorities), I don't see how I could, without there being an all-embracing hierarchy of business-level1, business-level2, etc., and user-defined "natures" as you suggest, all the way up the tree. That seems to make it pointless.

It's rather like event types - we could come up with a core list of event types or place types, but then users need to extend it individually in a meaningful fashion.

WesleyJohnston 2012-05-30T19:17:28-07:00

I have long since given up trying to assign historically correct place names. What I really want is some single way of identifying a place and not carrying around all the baggage of whether it was Canada West or Upper Canada or Ontario.

Even after the names changed, the actual use in the records often still was the old - then-obsolete - way, so that if you are being literally accurate about recording the place as it was written in the record then you would be historically inaccurate about the fact that it was or was not Upper Canada instead of Canada West.

So, as I said, I have simply opted for a single time-independent hierarchy. It is wrong for a lot of time periods, but if I really want to know what the correct form was, I can look that up separately. When I am entering a place for an event, all I really care about is that it was at Whitby or Manchester. If I have a disambiguation problem, then I need to know a bit more about the higher levels, but only to distinguish which place I am talking about, not what the higher levels were called at any given date.

So the entire notion of a time-correct place hierarchy is something that I have simply rejected as being counter-productive. Any time that I want to know what the higher levels were for that place (which is really not very often), I can easily look that up.

Even for towns whose names have changed, I have simply chosen a single representation: English Corners became Columbus, and I choose to use Columbus for all references in my database - again simply to say where it happened, so that that fact is known ... any broader statement of the location is unnecessary in most cases.

WesleyJohnston 2012-05-30T19:20:10-07:00

"So the entire notion of a time-correct place hierarchy is something that I have simply rejected as being counter-productive."

I don't see a way to edit this, but what I meant was:

"So the entire notion of a time-correct place hierarchy within the event specification is something that I have simply rejected as being counter-productive."

Alex-Anders 2012-05-30T19:30:03-07:00

What will you do to your data if/when a current place is renamed? Global replace and use new, keep current not use new????

ttwetmore 2012-05-30T19:56:19-07:00

Wesley,

I wouldn't say I agree with you 100%, but you have sure hit the simplicity and flexibility nail on the head. Whatever we agree on must be able to support exactly how you want to record your places. But it must also be able to support researchers who want to use historically accurate place names. Of course I tout the DeadEnds approach for places, of which I gave some very recent examples. It can handle all of these cases simply and easily.

Tom

ttwetmore 2012-05-30T20:02:15-07:00

WesleyJohnston 2012-05-30T20:12:08-07:00

"What will you do to your data if/when a current place is renamed? Global replace and use new, keep current not use new????"

Change them if you like. Leave them as they are if you like. I don't see a worrisome issue here.

Precisely ... Bohemia was separate, then became part of Austria, was in Czechoslovakia, was virtually annexed by the Third Reich, and is now the Czech Republic. But I have opted to use Czech Republic and would stick with that if it ever changed again. If I did opt to change it, I would probably simply go back to using Bohemia.

ttwetmore 2012-05-30T20:15:35-07:00

Tony said,

Regarding user-defined tags, here's a line of thinking I'm currently in the middle of...

STEMMA tries to define a "controlled vocabulary" for the Place types - all the way from country down to building. This effectively means a closed set of terms. Although I don't yet have a sufficiently complete set of terms, the reasoning was in order to support a Place Authority. Such an authority - especially if federated as recommended on my Web page - must have a controlled vocabulary in order for the parts to work as a single resource. That vocabulary, in turn, must be a superset of ISO 3166-2 and the European NUTS, which are only relevant to present-day entities.

However, I'm aware that there is still a need for user-defined terms. Rather than in the elements of a geographical/administrative Place Hierarchy, I think these might be for the "nature" of the Place, e.g. school, household, hospital, cemetery, church, etc.

What do you think?

I think you are on the right track. I don't agree that the vocabulary must be a superset of ISO & NUTS, but it should contain a sufficiently complete set of terms. Note that we can put in some pretty loosey goosey terms like levelOne, levelTwo, and so on, with guidelines on the area and population criteria to apply when using them. I assume we would support localization to multiple languages.

I don't think it is that hard to create that set of terms. And realize, from my examples and other things I have written, that I don't believe it is mandatory, nor even highly recommended, to use these terms when writing place names. In the vast majority of cases, a place authority could look at exactly what we have chosen to use for places, without us supplying the terms, and figure out exactly what places we mean.

Tom

ACProctor 2012-05-31T01:10:13-07:00

Re: "I have long since given up trying to assign historically correct place names. What I really want is some single way of identifying a place and not carrying around all the baggage of whether it was Canada West or Upper Canada or Ontario"

Time-dependent hierarchies are working for me Wesley (converting my data to STEMMA as part of my research).

Once a Place entity has been created, it is very easy to reference it from elsewhere in the data. I have started adding historical narrative to those Places, and images, and time-dependent hierarchies that aren't relevant (yet) to my data, in order to flesh out the Place. This doesn't affect my usage of the Place entity though.

Tony

WesleyJohnston 2012-05-31T08:23:39-07:00

The problem for me is that I am using Ancestry.com's online trees as my master copies. A few years ago, I came close to finding out the second date on my own gravestone and realized that it was more important to share what I have than to keep it on my own computer. And I decided on Ancestry as the place to share it, since they claim that your tree will survive your membership. That means living within the significant limitations of Ancestry's online tree software, and they are really weak on places -- even their Family Tree Maker does a better job of place handling. But that's the tradeoff that I have made. If you see some way for me to still tap STEMMA, I would be interested.

NeilJohnParker 2012-05-31T09:02:09-07:00

Gentlemen, I believe we have reached (passed) the point of diminishing returns on this issue. It is obvious to me, and I hope others, that the standard must support Place Names "as found" on a citation or as otherwise determined by a user, and should also give those users who wish to have their Place Names edited against one or more Hierarchical Temporal Place Name Structures the option to do so. Furthermore, this dual-option rule must be generalized, as it applies to several other fields, e.g. Dates and Personal Names.

Alex-Anders 2012-05-31T13:46:58-07:00

I like the concept of allowing any variation, as Neil has said. I also like the idea of having some form of link to other known names of the same place, and then being able to search on any and display them all.
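Alex's idea of linking the other known names of the same place, together with the earlier question about associating some names but not others, can be sketched by attaching names to place entities rather than linking names to each other. This minimal Python sketch assumes a hypothetical `Place` record and `PlaceIndex` class (neither is part of any BetterGEDCOM proposal): a search on any recorded name returns every place that used it, and each place carries all of its names for display.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Place:
    """One place entity carrying every name it has been recorded under."""
    place_id: str                                   # hypothetical identifier
    names: List[str] = field(default_factory=list)


class PlaceIndex:
    """Hypothetical index from each known name (case-insensitive) to the places using it."""

    def __init__(self) -> None:
        self._by_name: Dict[str, List[Place]] = {}

    def add(self, place: Place) -> None:
        for name in place.names:
            self._by_name.setdefault(name.casefold(), []).append(place)

    def search(self, name: str) -> List[Place]:
        # Any recorded name finds the place; the caller can then display all its names.
        return self._by_name.get(name.casefold(), [])


# Illustrative IDs; this reads the renaming to Bell as applying to the town, not the suburb.
town = Place("maida-hill-town", ["Maida Hill", "Bell"])
suburb = Place("maida-hill-suburb", ["Maida Hill"])
index = PlaceIndex()
index.add(town)
index.add(suburb)
for place in index.search("bell"):
    print(place.place_id, place.names)   # prints: maida-hill-town ['Maida Hill', 'Bell']
```

Because "Maida Hill" maps to two different places here, the association lives on each place entity, which is one way to link some names but not others.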

ttwetmore 2012-05-29T10:34:38-07:00

Adrian,

Thanks for your comments.

Do you think it would be possible to invent tags (e.g., like I'm using city, county, state, country) to account for all the types that can get related together in place hierarchies? If we could, then everything could be specified via the <type> approach. I'm worried that there might be too many things to have types for, especially when taking historical regions into account.

Start Aside.
In the U.S., counties have a fascinating history, as many (most?) have split over time as western settlement evolved. Another interesting case in my research occurred in Nova Scotia (then a colony of England, not a province of Canada) after the exile of the American Revolutionary War "loyalists" to the Saint John River region of Nova Scotia. Within a year the exiles petitioned to carve a new colony out of Nova Scotia, which occurred, giving rise to the colony of New Brunswick. When my loyalist ancestors were exiled they were exiled to the colony of Nova Scotia, but after a year they were living, without moving, in the colony of New Brunswick. The typical way of handling this kind of situation is to use the place hierarchy as it exists today, not how it existed then, though that can sure leave a lousy feeling in your stomach. And of course, when all of these exiled ancestors were born, the United States didn't exist. Those that were born in New York were born in the colony of New York. Do you worry about this in your database? I sometimes do, sometimes don't. As I said, my goals are simplicity and flexibility; note that this does not include consistency!! Question: did the United States come into being on July 4, 1776, when the Declaration of Independence was signed, or was it on September 3, 1783, as the result of the signing of the Treaty of Paris?
End Aside.

Of course, the simplicity side of me has to ask, why does it matter that we know the exact kind of a hierarchies we are dealing with? Don't we usually just want something that makes sense when we show a place on a user interface screen or print it into a report? As long as there are no extra commas and it makes sense, isn't that enough?

I can imagine you would like to know more details if you were doing some kind of demographic statistics with the information, but how important a goal is that for genealogical software?

Tom

ACProctor 2012-05-29T12:14:14-07:00

Re: "Do you think it would be possible to invent tags (e.g., like I'm using city, county, state, country)"

It would be easier to handle using a controlled vocabulary for Place-types rather than specific tags Tom. The reason is that there are a lot of possible terms, but there is no consistency across the world - either in the terms being used, or their relative ordering. For instance, these are only a few of the more common terms for administrative divisions within a national boundary:

Authorities, Boroughs, Counties, Departments, Dependencies, Districts, Islands, Municipalities, Parish (Civil), Provinces, Regions, Republics, States, Territories

Even this doesn't include things like sovereign state, or crown dependencies, both of which are relevant to the UK (which isn't a country) and its Channel Islands (which aren't technically part of the UK).

Attempting to give them all a distinct tag name ties the model too closely to peculiarities of national administration. It might also limit the types of hierarchy, or the height and depth of the hierarchy.

Both the ISO 3166-2 and the European NUTS standard only define the present-day national subdivisions so things like "Shires" are not considered.

Tony

ACProctor 2012-05-29T12:17:13-07:00

...Apologies if I misunderstood Tom - I assumed your reference to "tags" is for record tags rather than the value of a <Type> element.

Tony

ttwetmore 2012-05-29T15:25:50-07:00

Tony,

I think you did misunderstand, but you've rectified the situation.

But your question does prompt me again, wearing my simplicity and flexibility hat, to ask whether the type "words" (e.g., city, county, state, province, ocean, country, ...) have to come from a fixed set. Why couldn't they be anything a user desired? I assume common sense would prevail and there would be a very standard and obvious set, and software would provide all the obvious ones in a user interface widget, but why not let the user use any word or no word? Where is the value in prescribing a vocabulary for the types of geographical and political "units"? I came up with statistical analysis, finding out how many people in your database come from each country, say, but that's about it. And since in most cases you will have this information in the data, there's no problem. There will always be outliers in your data where you don't have enough info to let them be part of this kind of analysis.

I hope you are not getting fed up with my constant desire to question all assumptions that creep in in the guise of rules and so forth. For example, every once in a while someone jumps on the latitude and longitude wagon, expressing the belief that lati-longs are the ultimate solution to the place problem -- if you can tie a place to a specific spot on the earth everything else is just gravy. I don't have a single lati-long in my database and am not missing a thing.

Where it is obvious, and where you really care to do so, I think it is great for a researcher to specify the details on places (by using those type words or some other way we settle on). What I want to warn against, however, is that discovering a good way to do great things, should never be used as a justification for enforcing that great way of doing things with a rule. Genealogical data does not do well in the face of the kinds of restrictive formatting rules you find in typical database schemas. Just consider GEDCOM.

I want to be able to say "New London" as a place when I don't know what it is or where it is, without any penalty for not knowing. When I find out what it is and where it is, then I want to be able to add that info. If New London is a place associated with someone way out on the periphery of the persons I am interested in, I'm not likely to care whether I ever resolve "New London" any further.

Tom

WesleyJohnston 2012-05-30T03:49:54-07:00

All of the examples thus far given use place only down to the level of a city. Keep in mind that we also have Data-Place06 "Location to include address". So we have to be thinking of locations as going deeper than city.

NeilJohnParker 2012-05-30T05:29:26-07:00

I am sure there are numerous uses for tag information; how important any of them are is open for discussion. The answer is probably: not that important.

What is most important in my mind is that the field that you are recording is at least correct for the date that you are using. I have often recorded a field that was incorrect, been notified by the system (in this case Legacy Deluxe) that it was invalid for the date, and after further checking found the correct value. For me, this is the main strength of editing against a temporal place hierarchy.

Neil Parker

NeilJohnParker 2012-05-30T05:31:39-07:00

Two points:
Place Name is a temporal attribute, and the hierarchy must allow a date range to be associated with each place name; (each time a date is looked up in the hierarchy, the date must accompany it)

There can be several hierarchies, e.g. geographic, government, religious - Roman Catholic, religious - Anglican, etc.

Must have ability to select different hierarchies.

Temporal place hierarchies may be available from several public or private authorities on the Internet; if so, we should have the ability to access them.

Place name can have a place name type which may be several alternatives depending on which level you are at i.e. in Ontario Canada it is County, District or Region for level 3. This can all be accommodated in the place Hierarchy but most of the time it is never used.

Also, if the hierarchy is not governmental, should that be explicitly stated?

Can the user add on to the hierarchy? Usually hierarchies only go to the official level; for governments in North America, let's say:
Country
Province or State
County, District or Region
City, Town, Village, Hamlet, Unorganized Territory, etc

Should the user be allowed to add additional levels, such as community name?

Neil Parker
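Neil's points (a date range on each place-name link, the date accompanying every lookup, and a choice among several hierarchies) can be sketched as dated parent links tagged with a hierarchy kind. In this hypothetical Python sketch, `DatedLink`, `parent_at` and `check_parent` are illustrative names, years stand in for full dates, and a mismatch produces a warning string rather than a rejection, in the spirit of the Legacy check Neil describes. The example data follows Tom's loyalist aside, assuming New Brunswick's separation from Nova Scotia in 1784.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatedLink:
    """A child-to-parent link valid for years [start, end); None means open-ended."""
    hierarchy: str                 # e.g. "government" or "ecclesiastical" (illustrative labels)
    parent: str
    start: Optional[int] = None
    end: Optional[int] = None

    def covers(self, year: int) -> bool:
        return (self.start is None or self.start <= year) and (self.end is None or year < self.end)


def parent_at(links: List[DatedLink], hierarchy: str, year: int) -> Optional[str]:
    """Look up the parent in one chosen hierarchy; the date always accompanies the lookup."""
    for link in links:
        if link.hierarchy == hierarchy and link.covers(year):
            return link.parent
    return None


def check_parent(links: List[DatedLink], hierarchy: str, year: int, claimed: str) -> Optional[str]:
    """Return a warning if the claimed parent does not fit the date, else None."""
    actual = parent_at(links, hierarchy, year)
    if actual is None or actual == claimed:
        return None  # nothing recorded to contradict the entry, or it matches
    return f"{claimed!r} does not fit {year}; expected {actual!r} ({hierarchy})"


# Saint John River settlement: Nova Scotia until 1784, New Brunswick from 1784.
saint_john = [
    DatedLink("government", "Nova Scotia", end=1784),
    DatedLink("government", "New Brunswick", start=1784),
]
print(check_parent(saint_john, "government", 1790, "Nova Scotia"))
# prints: 'Nova Scotia' does not fit 1790; expected 'New Brunswick' (government)
```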

AdrianB38 2012-05-30T07:09:31-07:00

Tom asked "Do you think it would be possible to invent tags ... to account for all the types that can get related together in place hierarchies?"
In practice, I believe not. It might well be possible to obtain the current hierarchies for governmental administration, but (a) historic hierarchies are more likely to be a problem and (b) the types of hierarchy are probably infinite.

To take (b) - I might very well have described someone's occupation as "motive power superintendent", organisation = "LMS Railway", place = "North Wales Division". I'm not saying I would, nor that it's the only way, never mind the best, but the point is that I might have done something like that and so this data would need to be converted from my GEDCOM-style business-division-places to BG. Chances of anyone thinking up all such hierarchies are slim and therefore having only a controlled vocabulary seems implausible.

To take (a) - similar issues apply. If I'm lucky enough to trace my ancestry back to Anglo-Saxon times, are we sure that we could define all the Anglo-Saxon place-type hierarchies? (And every time you say 'yes', I'll just find a more remote country!) Again we have the case that if I am wanting to define a hierarchy unique to the Kingdom of Mercia, say, and it's not in the controlled vocabulary, who's going to validate my request to add it? How many experts in Anglo-Saxon genealogy are there? Will they be in the BG administration?

Seems to me that while we might want someone to provide "best practice" hierarchies (e.g. building-address, suburb, settlement, county, country), we're on a loser if we expect those to be the only choices.

And while Neil and Tom are right to highlight the temporal element, it's also right to note that UK family history at least is consistently inconsistent in whether or not it expects to use a contemporary name - we have a tendency to stick to the pre-1974 county names even for post-1974 events. To some extent this is possibly because the software doesn't search well across time - "Manchester, Lancashire, England" is different from "Manchester, Greater Manchester, England" in many systems, whereas if there were an underlying entity of "Manchester" with timed-hierarchies and timed-names, then there'd be less of an issue in searching for events in Manchester. But even then, I suspect that "Manchester, Greater Manchester, England" is such an unlovely name that most of us would prefer the old shire county version, regardless of any authority. I think Tom at least would also prefer to choose his own names...

AdrianB38 2012-05-30T07:19:58-07:00

I think it may well be useful to remind ourselves why we might want hierarchies, particularly changing ones.

I suggest it's
(a) to encourage best-practice designation of locations;
(b) to enable locations that match across people's databases;
(c) to record what happened as a matter of local interest - e.g. "lived in 3 different countries and never moved a mile";

Just because we can do complex stuff, doesn't mean we should. Which I think is what Tom suggests.

ttwetmore 2012-05-30T09:23:51-07:00

I have no problems allowing great complexity in the places area. Adding temporal information to place names makes sense to me. Having official hierarchies available to check against makes sense. Having software do additional sanity checks using places and dates also makes sense to me. Having software suggest corrections to places makes great sense, and I use that feature in existing software. Having multiple hierarchies categorized into different types makes sense.

But (and there is always a but) I don't believe in requiring that complexity. If I want to enter a location of "New London, Connecticut", with no additional information about what the entities are, or what historical time frame is involved, I insist on being able to do it.

Genealogical data is inherently messy and ambiguous and confusing and unclear and erroneous with room for conjectures and speculation. The places area must be designed to handle this kind of data. The world of genealogical data is simply not arranged in neat little bundles of cities, counties, states and countries. Maybe you had a relative who died "out west". I'd want to record that as <place> out west </place> with no rule-bound software monitoring me. I'd be happy if this showed up on a warning log, because I would like to pin it down a bit better later, but I'd insist that it be allowed.

Tom
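Tom's position, that any place text must be accepted and an unresolved place should at most show up on a warning log, could look like this minimal Python sketch. `PlaceRef` and `review_places` are hypothetical names; the point of the design is that the entered text is never altered or rejected, and a structured link can be added later once the place is pinned down.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlaceRef:
    text: str                        # exactly what the researcher entered, e.g. "out west"
    place_id: Optional[str] = None   # link to a structured place, added if and when known


def review_places(refs: List[PlaceRef]) -> List[str]:
    """Report unresolved places as warnings; never reject or rewrite anything."""
    return [f"unresolved place kept as entered: {r.text!r}" for r in refs if r.place_id is None]


refs = [PlaceRef("out west"), PlaceRef("New London")]
print(len(review_places(refs)))      # prints: 2
refs[1].place_id = "new-london-ct"   # hypothetical ID, assigned once the place is identified
print(review_places(refs))           # prints: ["unresolved place kept as entered: 'out west'"]
```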

AdrianB38 2012-05-30T09:45:19-07:00

Tom - I agree. I think.

I need flexibility when I get references to "Manchester", e.g. Is that Manchester the parish, the township (within the parish), the settlement? It may not be clear. But what is clear is that it's "Manchester, Lancashire, England" since all 3 concepts sit inside the county in question. That hierarchy either doesn't exist or is "Unknown, county, country".

I don't _think_ anyone here is seriously advocating validation that rejects such stuff, but it does need a bit of thought, as it may be the rule, not the exception, for many similar places.

(Don't get me started on what "London" means...)

ACProctor 2012-05-30T10:51:55-07:00

Regarding user-defined tags, here's a line of thinking I'm currently in the middle of...

STEMMA tries to define a "controlled vocabulary" for the Place types - all the way from country down to building. This effectively means a closed set of terms. Although I don't yet have a sufficiently complete set of terms, the reasoning was in order to support a Place Authority. Such an authority - especially if federated as recommended on my Web page - must have a controlled vocabulary in order for the parts to work as a single resource. That vocabulary, in turn, must be a superset of ISO 3166-2 and the European NUTS, which are only relevant to present-day entities.

However, I'm aware that there is still a need for user-defined terms. Rather than in the elements of a geographical/administrative Place Hierarchy, I think these might be for the "nature" of the Place, e.g. school, household, hospital, cemetery, church, etc.

What do you think?

Tony
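Tony's split between a closed vocabulary for the levels of a Place hierarchy and free, user-defined terms for the "nature" of a Place could be sketched as below. This is not STEMMA's actual schema: the `PlaceType` members are an illustrative subset, and `PlaceNode` and `make_node` are hypothetical names. Python's `Enum` lookup by value raises `ValueError` for a term outside the closed set, while the nature is accepted unchecked.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PlaceType(Enum):
    """Closed vocabulary for hierarchy levels (illustrative subset only)."""
    COUNTRY = "country"
    COUNTY = "county"
    PARISH = "parish"
    SETTLEMENT = "settlement"
    BUILDING = "building"


@dataclass
class PlaceNode:
    name: str
    place_type: PlaceType          # must come from the closed vocabulary
    nature: Optional[str] = None   # user-defined, e.g. "church", "school", "hospital"


def make_node(name: str, type_term: str, nature: Optional[str] = None) -> PlaceNode:
    # PlaceType(value) raises ValueError for an unknown term; nature is not checked.
    return PlaceNode(name, PlaceType(type_term.lower()), nature)


church = make_node("St. Bertoline's Church", "Building", nature="church")
print(church.place_type, church.nature)   # prints: PlaceType.BUILDING church
```

A call such as `make_node("Mercia", "kingdom")` fails here, which illustrates the concern raised elsewhere in this thread about historic hierarchies: a closed list needs someone to extend it, whereas Tom's alternative would make the type word itself free text.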

gthorud 2011-11-15T17:33:57-08:00

Ages ago I proposed 5 definitions related to places. See under P in this list:

http://bettergedcom.wikispaces.com/Glossary+Of+Terms

In some cases the authority is implied by what I have called the place type.

I think there are cases where you would want to know the authority, for example when a place is assigned an identifier (a type of name) in some identification system - the id may be needed since names are often not unique within the next higher level. It may be useful to record the authority even if you do not reflect it in the hierarchy, e.g. for looking up a name/id in a database operated by the authority.

Legacy does not cover the whole world at all times - far from it. The user must be able to tailor this to his own needs.

AdrianB38 2011-11-16T08:03:42-08:00

Rules for hierarchies are fine where they fit, but we need to design for both rule-based hierarchies and the sort of ad hoc muddle that the UK has. Some issues are:
- it may not be clear just which place-definition is referred to by a name in a source. For instance, when I see a residence of Barthomley, is that Barthomley the parish, Barthomley the township (a sub-division of the parish) or Barthomley the village? All 3 are centred on the church of St. Bertoline but it's seldom clear which is referred to;
- certain settlements overlap boundaries - in the UK, London is split between the historic counties of (at least) Middlesex and Surrey, depending on which side of the Thames you are. Hence I usually just write "London, England" even though I normally ensure that a county is 2nd element. (Similarly, does Kansas City regard itself as 1 city or 2?)
- it is usually easy to acquire current political hierarchies for some countries. But they may not be appropriate - to avoid changing hierarchies, UK genealogists usually use the "historical county" as the 2nd node of 3. The definitions of these are also not always clear, particularly as some cities were, for local government purposes, split out of their surrounding county many years ago - much earlier than simplistic traditionalists acknowledge.
- UK genealogists tend also to mix hierarchies - for instance, the "Coppenhall parish" that I know is usually qualified as "Coppenhall parish, Cheshire, England" - which is an ecclesiastical (CofE) / political / political hierarchy as the wholly ecclesiastical one would read "Coppenhall parish, Diocese of Chester, England(?)", which is less help, especially when referring to Dioceses whose geography is not clear - e.g. the Diocese of Lichfield.

So yes, take advantage of rules where you can, but allow the rest of us to fiddle and tweak as we always have!

ACProctor 2011-11-27T09:28:30-08:00

Another way of looking at this, which I feel might be a bit simpler and easier to manage, is to keep place name hierarchies as pure hierarchies. If a place changes from one parent to another then a separate hierarchy can be created and a weak link added between the leaf elements to indicate that they're effectively the same place.

Advantages include the ability to reference a Place directly since it only has one hierarchy above it. Applicable dates could be associated with the two variations of a place.

I know it's just as easy to split a Place definition to say, for instance, 'between x and y it is in this hierarchy, and between x2 and y2 it is in this one'. However, if a building changes its usage (e.g. a school becoming a factory) then the idea of an equivalence link works very well, even though no hierarchy has changed.

There's really not a lot of difference between the two approaches. I'm looking for a good argument to favour one over the other. Any suggestions? What about handling name changes, as opposed to variations of spelling?
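Tony's "pure hierarchy" option, where each Place has exactly one parent and a separate weak link says two Places are effectively the same, might be sketched as follows. `PlaceGraph` and its methods are hypothetical, and the example reuses Tom's Saint John River case with made-up IDs. Equivalence is followed transitively, so a chain of changes still groups together.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Set


class PlaceGraph:
    """Each place has one parent; 'same place' is recorded as a separate weak link."""

    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {}
        self.same_as: Dict[str, Set[str]] = defaultdict(set)

    def add(self, place_id: str, parent: Optional[str] = None) -> None:
        self.parent[place_id] = parent

    def link_same(self, a: str, b: str) -> None:
        self.same_as[a].add(b)
        self.same_as[b].add(a)

    def path(self, place_id: str) -> List[str]:
        """The single hierarchy above a place, leaf first (assumes no parent cycles)."""
        out: List[str] = []
        node: Optional[str] = place_id
        while node is not None:
            out.append(node)
            node = self.parent.get(node)
        return out

    def equivalents(self, place_id: str) -> Set[str]:
        """All places reachable through 'same place' links, including the start."""
        seen = {place_id}
        queue = deque([place_id])
        while queue:
            for other in self.same_as[queue.popleft()]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        return seen


g = PlaceGraph()
g.add("Nova Scotia")
g.add("New Brunswick")
g.add("sjr-1783", parent="Nova Scotia")     # settlement as recorded before the split
g.add("sjr-1785", parent="New Brunswick")   # the same settlement after the split
g.link_same("sjr-1783", "sjr-1785")
print(g.path("sjr-1785"))   # prints: ['sjr-1785', 'New Brunswick']
```

Under this approach a Place reference always resolves to one hierarchy; dates (not shown) would sit on the Places or on the links. The alternative Tony mentions would instead keep one Place with dated parents.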

testuser42 2011-11-27T14:54:35-08:00

Hi all,
I've posted this a long time ago already, but it might be relevant here.

There is a great resource for places in Germany and Central Europe called "GOV", translated as "The Historic Gazetteer". Most pages are in German only, but Google Translate might do an OK job:
Overview:
http://wiki-de.genealogy.net/GOV
"As of Feb 2011, there are about 355000 objects in the database" (see the red dots on the map).
"GOV collects historic and current information about places, churches, regional structures, political and church affiliation, statistical information etc."
Use the search to see for yourself:
http://gov.genealogy.net/search
You'll get a list of results, and then a page for the object. Here's the city Straßburg/Strasbourg
http://gov.genealogy.net/item/show/STRURGJN38VN

The data model is straightforward, as far as I understand it:
http://wiki-de.genealogy.net/GOV/Datenmodell
A GOV-Object has many properties
http://wiki-de.genealogy.net/GOV/Quicktext#Eigenschaften_von_Objekten
- is called x (with date range and language code)
- is situated at position x (coordinates)
- is object type x (date range)
- has x inhabitants (date range)
- has postcode x (date range)
- has w-Number x (that's a special kind of postcode)
- has external ID System:ID
- has confession x
- has URL x

...and relationships to other objects
http://wiki-de.genealogy.net/GOV/Quicktext#Beziehungen_zwischen_Objekten
- belongs to y
- is situated in y
- represents y (e.g. a church building represents the parish)

Both properties and relationships have the possibility of a date-range and a source.
Object types are numerous:
http://wiki-de.genealogy.net/GOV/Objekttypen
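As far as the GOV data model is summarised above (names with language codes and date ranges, dated object types, dated "belongs to" relationships, and an optional source on each), it could be mirrored roughly as follows. This is a simplified, hypothetical Python reading, not GOV's actual schema or API. Only the object ID for Straßburg/Strasbourg comes from the link above; the 1871 to 1918 relationship to Alsace-Lorraine is added for illustration and is not copied from GOV.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Dated:
    """A value valid for years [start, end); None on either side means open-ended."""
    value: str
    start: Optional[int] = None
    end: Optional[int] = None
    source: Optional[str] = None   # properties and relationships may both cite a source

    def covers(self, year: int) -> bool:
        return (self.start is None or self.start <= year) and (self.end is None or year < self.end)


@dataclass
class GovLikeObject:
    """Hypothetical, simplified mirror of a GOV object (not GOV's real schema)."""
    object_id: str
    names: Dict[str, List[Dated]] = field(default_factory=dict)   # language code -> names
    object_types: List[Dated] = field(default_factory=list)
    belongs_to: List[Dated] = field(default_factory=list)         # parents (names stand in for IDs)

    def name_at(self, lang: str, year: int) -> Optional[str]:
        for name in self.names.get(lang, []):
            if name.covers(year):
                return name.value
        return None

    def parents_at(self, year: int) -> List[str]:
        return [rel.value for rel in self.belongs_to if rel.covers(year)]


strasbourg = GovLikeObject(
    "STRURGJN38VN",
    names={"de": [Dated("Straßburg")], "fr": [Dated("Strasbourg")]},
    object_types=[Dated("city")],
    belongs_to=[Dated("Reichsland Elsaß-Lothringen", start=1871, end=1919)],  # illustrative
)
print(strasbourg.name_at("de", 1900), strasbourg.parents_at(1900))
# prints: Straßburg ['Reichsland Elsaß-Lothringen']
```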

To allow the transfer of such place hierarchies, a group of developers agreed on an extension to GEDCOM.
http://wiki-de.genealogy.net/Gedcom_5.5EL
http://wiki-en.genealogy.net/Gedcom_5.5EL

It seems that while they were at it, they also started to clarify and extend other GEDCOM tags, like MARR and NAME. This might have been the start of the ongoing effort in the "GEDCOM-L" mailing list.
The "GOV" has its own list, it seems:
http://list.genealogy.net/mm/listinfo/gov-develop

WesleyJohnston 2011-11-27T18:22:54-08:00

It certainly does give an idea of the way in which even a small village can be in multiple very different hierarchies. I plugged SPANTEKOW into http://gov.genealogy.net/search and the four hierarchies that result are radically different, even though they are all for the same village (there is only one Spantekow in Germany) -- not just in content but in complexity of the hierarchy.

AdrianB38 2011-11-28T09:35:48-08:00

"the way in which even a small village can be in multiple very different hierarchies"

I'm not sure that's the way I'd describe it. The way I read the first screen, there are 4 _different_ objects called Spantekow. Clearly all have the same name, all derive that name from the same village, but they are different objects:
- a Gutsbezirk (oh joy, Google Translate doesn't help, but it's defunct)
- an Amt, which seems to be an administrative division - also now defunct,
- a village (which appears to be in the municipality of the same name)
- a municipality

That's a bit of an over-simplification - the "village", for instance, is actually referred to as a village, Ortschaft or Ortsteil at various times, but seems to be the same object throughout.

So this, in my view (fools rush in, etc...), seems to represent 4 different objects with the same name, not 1 object in 4 different hierarchies.

Having said that, the municipality of Spantekow, when I look at the "Superordinate objects" diagram, does seem to have a pair of current "owners", i.e. it's in 2 hierarchies, viz: it's currently owned by both the Amt of Anklam-Land and the Landkreis (rural county?) of Vorpommern-Greifswald. Though since Anklam-Land comes under Vorpommern-Greifswald itself, I'm not sure why the direct relationship is there.

So in summary, I think we do see a place under 2 different hierarchies, but we also need not to confuse ourselves by equating different objects of the same name (in the same geographic area). That's a different topic.

ACProctor 2011-11-28T09:49:22-08:00

I struggled for a while on this subject, and I'm still not certain I "have it".

Geographical hierarchies are much easier to handle - even though they're not always unique. Names change, multiple spellings, boundaries moving, etc., can be handled fairly cleanly.

However, administrative, council, parish, electoral wards, etc., feel independent to me. Rather than creating hierarchies of different flavours, I decided to just keep the geographic one(s) and represent the other details as properties, or associated PFACT items :-)

AdrianB38 2011-11-28T12:12:18-08:00

I think in practice, Tony, you're close to practical reality.

Firstly, let me reiterate that I believe places _can_ be in multiple hierarchies at once - e.g. in the late 1800s, Haslington was in:
- the Church of England parish of Haslington;
- the geographic county of Cheshire;
- the parliamentary constituency of Crewe;
- the Poor Law Union of Nantwich
and no doubt others beside.

However (1) - How many of those place hierarchies are relevant to family history? Sure, we may mention the Poor Law Union and the parliamentary constituency, and we certainly need to understand where we might find Haslington's records filed for parliamentary elections or Poor Law purposes - but do we need a database structured to hold all that lot, or are the simple notes on the GENUKI web-site sufficient?

What we _do_ surely need is the hierarchy that locates the place on the map because we might want to query by village (Haslington) or county (Cheshire). And I'm also quite sure the ecclesiastical hierarchy will be similarly useful because so many of the records come from ecclesiastical sources that querying by parish is important.

However (2) - I wonder if some of these place-place hierarchies are not actually place-organisation hierarchies - or can be, in practice, treated as such. For instance, while Cheshire County Council can be represented on a map, and therefore has a nature as a place, nonetheless its only role in my files is as my first employer, and it is therefore an organisation in my view. Wilmslow, the place where I worked, was administered by CCC, so therefore I could happily record an independent place (Wilmslow) to organisation (CCC) hierarchy, much as Tony suggests.

I reckon that between ignoring the hierarchies of no direct relevance to the genealogy of people and turning other hierarchies into place-organisation ones, we probably - at least in the UK - slash the number of multiple hierarchies drastically.

The question then becomes whether multiple place-place hierarchies occur often enough across the world that BG must be able to represent them.

ACProctor 2011-11-28T13:31:44-08:00

I think that from a date point of view, the answer is probably 'yes'. In other words, different hierarchies could be valid over different dates.

Would you treat that as subdivisions of a Place, or distinct Places with a soft link between them?
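One way to read Tony's question is to keep each place as a distinct object and attach a validity period to every place-to-place link, so that different hierarchies can apply at different dates. The sketch below assumes that model; `ParentLink`, `parents_in` and the place names are hypothetical, and dates are reduced to whole years for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ParentLink:
    """Hypothetical link saying `parent` encloses `child` between two years (inclusive)."""
    child: str
    parent: str
    start_year: int
    end_year: Optional[int] = None  # None means the link is still current

    def valid_in(self, year: int) -> bool:
        return self.start_year <= year and (self.end_year is None or year <= self.end_year)


def parents_in(links: List[ParentLink], child: str, year: int) -> List[str]:
    """Return, sorted, every parent of `child` whose link is valid in `year`."""
    return sorted(link.parent for link in links if link.child == child and link.valid_in(year))


# Hypothetical data: VillageA moves from DistrictX to DistrictY in 1895,
# while its parish link is unaffected.
links = [
    ParentLink("VillageA", "DistrictX", 1800, 1894),
    ParentLink("VillageA", "DistrictY", 1895),
    ParentLink("VillageA", "ParishP", 1700),
]
print(parents_in(links, "VillageA", 1850))  # ['DistrictX', 'ParishP']
print(parents_in(links, "VillageA", 1900))  # ['DistrictY', 'ParishP']
```

Putting the dates on the links keeps a single identity for the village across boundary changes; the other option Tony mentions (distinct Places with a soft link) would move the dates onto that soft link instead.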

ttwetmore 2012-05-29T08:42:28-07:00

A place is primarily an attribute attached to an event or other attribute. In this role a place should be able to appear in two main forms (a combined form is also explored below).

First, the place can be encoded in situ, that is, self contained, with no external links, for example:

<person...>
  <birth>
    <place>
      <name> New London, New London, Connecticut, United States </name>
      <type> city, county, state, country </type>
    </place>
    ....
  </birth>
  ...
</person>

Because a place is an attribute, it can have any of the sub-attributes that any attribute can have. Here I decided to use the <name> attribute for the place’s name and <type> for the types of the name’s parts. No implication that <type> is required.

Second, the place can be encoded as a reference to a “first-class” place object/record, for example:

...
  <birth>
    <placeref id="p12345"/>
  </birth>
...
<place id="p12345">
  <name>New London, New London, Connecticut, United States</name>
  <type> city, county, state, country </type>
</place>

In this case there is a single place record for the city of New London which contains its full hierarchy up to the country level. But this can be expanded to using a full hierarchical approach, for example:

<place id="p12345">
  <name> New London </name>
  <type> city </type>
  <placeref id="p12346"/>
</place>
...
<place id="p12346">
  <name> New London </name>
  <type> county </type>
  <placeref id="p12347"/>
</place>
...
<place id="p12347">
  <name> Connecticut </name>
  <type> state </type>
  <placeref id="p12348"/>
</place>
...
<place id="p12348">
  <name> United States </name>
  <type> country </type>
</place>

And hybrid approaches work just as well:

...
  <birth>
    <place>
      <name> New London, New London </name>
      <type> city, county </type>
      <placeref id="p12349"/>
    </place>
    ...
  </birth>
...
<place id="p12349">
  <name> Connecticut, United States </name>
  <type> state, country </type>
</place>

These examples show my preferred model for places. It allows parts to be fully combined or fully separated into hierarchies, or any combination in between. It allows parts to be encoded entirely as a simple attribute, entirely as place objects/records, or any combination in between.
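To check that the combined, separated and hybrid encodings carry the same information, a reader can flatten any of them into one list of (name, type) parts. The sketch below assumes a hybrid document like the one above, wrapped in a hypothetical `<file>` root element; the element names follow Tom's examples, but `parts`, `chains` and the wrapper are illustrative only, and parts are paired with types purely by position.

```python
import xml.etree.ElementTree as ET

DOC = """
<file>
  <person>
    <birth>
      <place>
        <name> New London, New London </name>
        <type> city, county </type>
        <placeref id="p12349"/>
      </place>
    </birth>
  </person>
  <place id="p12349">
    <name> Connecticut, United States </name>
    <type> state, country </type>
  </place>
</file>
"""


def parts(place):
    """Pair each comma-separated part of <name> with the matching part of <type>, if any."""
    names = [n.strip() for n in place.findtext("name", "").split(",")]
    types_text = place.findtext("type")
    types = [t.strip() for t in types_text.split(",")] if types_text else [None] * len(names)
    return list(zip(names, types))


def chains(place, index):
    """Return every full (name, type) chain for `place`, following each <placeref> upwards.

    Several <placeref> children yield several chains, one per hierarchy.
    """
    own = parts(place)
    refs = place.findall("placeref")
    if not refs:
        return [own]
    return [own + upper for ref in refs for upper in chains(index[ref.get("id")], index)]


root = ET.fromstring(DOC.strip())
index = {p.get("id"): p for p in root.iter("place") if p.get("id")}
birth_place = root.find("person/birth/place")
print(chains(birth_place, index))
```

No cycle check is made here; the sketch assumes the placeref graph is acyclic.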

Note that postal addresses can be composed of additional place parts if so desired. The model allows latitudes and longitudes to be added as properties at any level in the hierarchy.

Also note that this model allows multiple hierarchies, as in any context where a <placeref> can occur, multiple <placeref>’s can occur. So this model can handle pure geographical containment, pure political containment, containment based on historical boundaries, and any multiple combinations thereof.

And it does all these things while being itself nearly trivial in structure, definition and implementation. Note that the <placeref> elements could be replaced by <place> elements to simplify it even further.

This is the DeadEnds model of places. My top four design principles are simplicity, flexibility, simplicity and flexibility, in no particular order.

Tom

ps. This also works with external place authorities. DeadEnds assumes that all record-level objects have unique IDs, so any place authority that provides hierarchies of place records with unique IDs assigned by the third-party authority organization integrates seamlessly. The id in a placeref (e.g., <placeref id="xxxxxxxxxxx"/>) would then point to a place object maintained by that authority.

AdrianB38 2012-05-29T10:12:17-07:00

"note that this model allows multiple hierarchies, as in any context where a <placeref> can occur, multiple <placeref>’s can occur."
In this case, you need to "type" the hierarchy _somewhere_ to show that this is the geographic higher place or this is the ecclesiastical higher place. Would that be best done at the 'lower' end or the 'higher'?

As it has to be present at the higher end anyway to show what sort of place it is (I assume that if you are having multiple hierarchies then you would indeed type all your places otherwise life would get too confusing), then it would seem sensible to "type" the hierarchy by saying - look at the 'higher' end to find out what sort of hierarchy it is. But that does imply that if we have X related to Y, then there is only one way that X can be related to Y, and that way is determined by the types of X and Y. I am slightly worried that this may not be so - that German site had some strange (to my eyes) relationships - was it true that X could be related to Y in two different ways? Or was it that X to Y existed at the same time as X to Z to Y, which was for a different purpose????? Dunno...

AdrianB38 2012-05-29T10:21:01-07:00

Flipping heck - why don't I read my own posts! The example is right above:

"the municipality of Spantekow, when I look at the "Superordinate objects" diagram, does seem to have a pair of current "owners", i.e. it's in 2 hierarchies, viz: it's currently owned by both the Amt of Anklam-Land and the Landkreis (rural county?) of Vorpommern-Greifswald. Though since Anklam-Land comes under Vorpommern-Greifswald itself, I'm not sure why the direct relationship is there."

Now I read it, this doesn't seem an issue after all. X is related to Y. X is also related to Z. Y and Z are different types, so that can be used to define what the type of relationship is that X is in. Y is also related to Z but this doesn't seem to be a problem.

So, we still have the query - if we have X related to Y, then is it true that there is only one way that X can be related to Y, and that way is determined by the types of X and Y?

(Bear in mind that name=Nantwich, type=Rural-District is not the same place as name=Nantwich, type=CofEParish)
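Adrian's suggestion can be made concrete by identifying each place by its (name, type) pair and deriving the kind of hierarchy from the type of the higher place. In the sketch below the type names and the `HIERARCHY_OF_TYPE` mapping are hypothetical; the Haslington links come from the example earlier in this thread. The last function flags exactly the case Adrian is unsure about: a pair of places linked more than once, where the types alone cannot say which relationship is meant.

```python
from collections import Counter

# Hypothetical mapping: the type of the *higher* place names the hierarchy.
HIERARCHY_OF_TYPE = {
    "County": "geographic",
    "CofEParish": "ecclesiastical",
    "Poor-Law-Union": "poor-law",
    "Rural-District": "administrative",
}


def hierarchy_kind(parent_key):
    """Derive the hierarchy from the parent's type; places are (name, type) pairs."""
    _name, place_type = parent_key
    return HIERARCHY_OF_TYPE.get(place_type, "unknown")


def typed_links(links):
    """Attach a hierarchy kind to every (child, parent) link."""
    return [(child, parent, hierarchy_kind(parent)) for child, parent in links]


def pairs_needing_explicit_kind(links):
    """Pairs linked more than once: the parent's type cannot tell these links apart."""
    counts = Counter(links)
    return [pair for pair, n in counts.items() if n > 1]


haslington = ("Haslington", "Village")
links = [
    (haslington, ("Cheshire", "County")),
    (haslington, ("Haslington", "CofEParish")),
    (haslington, ("Nantwich", "Poor-Law-Union")),
]
for child, parent, kind in typed_links(links):
    print(parent[0], kind)
```

Because the key is the (name, type) pair, ("Nantwich", "Rural-District") and ("Nantwich", "CofEParish") are automatically different places.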

ttwetmore 2011-11-15T09:12:05-08:00

I agree with Wesley on this point. The Place hierarchy is in actuality a directed acyclic graph and not a simple tree. Implementation is still relatively simple. Each place can refer to one or more places that enclose it, and simple graph algorithms can assure the DAG property after each new Place is added to the hierarchy. I was going to bring up this point in my recent post on the Place thread but thought it would add more confusion than not.
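Tom's point can be sketched in a few lines: each place keeps a set of enclosing places, and before a new link is added a depth-first search checks that the proposed parent is not already enclosed by the child, which is the only way a new edge can close a cycle. The class and method names are hypothetical; the Spantekow links are taken from Adrian's description earlier in this thread.

```python
from collections import defaultdict


class PlaceGraph:
    """A minimal sketch: a place may have several enclosing (parent) places; cycles are refused."""

    def __init__(self):
        self.parents = defaultdict(set)  # place id -> ids of places that enclose it

    def _is_ancestor(self, candidate, place):
        """True if `candidate` is reachable by following parent links upwards from `place`."""
        stack, seen = [place], set()
        while stack:
            current = stack.pop()
            if current == candidate:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self.parents[current])
        return False

    def add_parent(self, child, parent):
        """Record that `parent` encloses `child`, unless that would create a cycle."""
        # A cycle appears exactly when `child` already encloses `parent`, directly or not.
        if self._is_ancestor(child, parent):
            raise ValueError(f"{parent} is already enclosed by {child}; link would create a cycle")
        self.parents[child].add(parent)


g = PlaceGraph()
g.add_parent("Spantekow", "Anklam-Land")
g.add_parent("Spantekow", "Vorpommern-Greifswald")  # a second parent is allowed
g.add_parent("Anklam-Land", "Vorpommern-Greifswald")
try:
    g.add_parent("Vorpommern-Greifswald", "Spantekow")
except ValueError as err:
    print(err)
```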

gthorud 2011-11-15T15:39:10-08:00

So a place record can have several parents. And there can be several paths from the top to a record.

Then you also need to be able to record a path through the graph, i.e. the path that is specified in, e.g., a source for an event. You cannot simply refer to a place record in an event.
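gthorud's point can be illustrated with the same Spantekow links: once a place has several parents there are several routes to the top, so an event would store the route its source actually gives rather than a bare place id. In this sketch the parent map, `paths_to_top`, `is_valid_path` and the event dictionary are hypothetical, and the graph is assumed to be acyclic.

```python
from typing import Dict, List, Set, Tuple

Parents = Dict[str, Set[str]]


def paths_to_top(parents: Parents, place: str) -> List[Tuple[str, ...]]:
    """Every path from `place` up to a place with no parents (graph assumed acyclic)."""
    ups = parents.get(place, set())
    if not ups:
        return [(place,)]
    return [(place,) + rest for parent in sorted(ups) for rest in paths_to_top(parents, parent)]


def is_valid_path(parents: Parents, path: Tuple[str, ...]) -> bool:
    """True if every step of `path` follows a recorded child-to-parent link."""
    return all(upper in parents.get(lower, set()) for lower, upper in zip(path, path[1:]))


parents = {
    "Spantekow": {"Anklam-Land", "Vorpommern-Greifswald"},
    "Anklam-Land": {"Vorpommern-Greifswald"},
}
for path in paths_to_top(parents, "Spantekow"):
    print(" > ".join(path))

# Hypothetical event that records the route given by its source, not just the place.
event = {"type": "birth", "place_path": ("Spantekow", "Anklam-Land", "Vorpommern-Greifswald")}
print(is_valid_path(parents, event["place_path"]))  # True
```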

NeilJohnParker 2011-11-15T15:59:52-08:00

There may be some additional properties that a place hierarchy has. For example, does a place hierarchy require or imply that some authority is responsible for determining what the place names and their boundaries are at any given time, and has the power to change those boundaries at any time? This is certainly true of the traditional country, state (or province, territory...), county (or district...) and municipality (e.g. city, town, village, hamlet, township...). Furthermore, the boundaries of such a subdivision usually must be wholly contained within its parent and must not overlap with its siblings. Furthermore, we are really talking about place name/place type duets, i.e. New York/City, New York/County, New York/State, US/Country. The jurisdiction (i.e. place hierarchy) should contain the place type in its records, though it is usually not shown explicitly in listings. I suggest that each jurisdiction, be it political, administrative (e.g. utility company, census...), religious, etc., must be defined along with the authority for that jurisdiction, e.g. the Church of England parish place hierarchy. Also, I believe that this is a multiple-hierarchy data structure (i.e. one distinct hierarchical structure for each jurisdiction), not a network structure. One jurisdiction's use of a given place may or may not be coterminous with another jurisdiction's, even though both use the same name.
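Neil's alternative, one strict tree per jurisdiction rather than a single network, could be sketched as below. Each `Jurisdiction` (a hypothetical class) maps a (name, type) duet to its single parent and refuses a second, different parent; listings drop the types, as Neil notes they usually do. The New London chain comes from Tom's example in this thread; the jurisdiction and authority labels are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Place = Tuple[str, str]  # (name, type) duet, e.g. ("New London", "city")


@dataclass
class Jurisdiction:
    """One place hierarchy owned by one authority; each place has at most one parent in it."""
    name: str
    authority: str
    parent_of: Dict[Place, Place] = field(default_factory=dict)

    def add(self, place: Place, parent: Place) -> None:
        existing = self.parent_of.get(place)
        if existing is not None and existing != parent:
            raise ValueError(f"{place} already has parent {existing} in {self.name}")
        self.parent_of[place] = parent

    def listing(self, place: Place) -> str:
        """Names from `place` upwards, with types omitted as in usual listings."""
        names = []
        current = place
        while current is not None:
            names.append(current[0])
            current = self.parent_of.get(current)
        return ", ".join(names)


political = Jurisdiction("political", "placeholder-authority")
political.add(("New London", "city"), ("New London", "county"))
political.add(("New London", "county"), ("Connecticut", "state"))
political.add(("Connecticut", "state"), ("United States", "country"))
print(political.listing(("New London", "city")))
# New London, New London, Connecticut, United States
```

A second jurisdiction (say, a parish hierarchy) would be a separate `Jurisdiction` object, so the same name can appear in both without the two trees having to agree. A fuller version would also refuse cycles within a tree.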

NeilJohnParker@Telus.net

NeilJohnParker 2011-11-15T16:28:38-08:00

Follow on:
If the place hierarchy is to be represented as multiple place hierarchies, then each place hierarchy must be capable of being created and maintained by a user and/or some other jurisdiction, preferably the jurisdiction that owns it, e.g. the Church of England.

It is assumed that a good software package would at least contain the default political jurisdiction, i.e. Country, State, County and Municipality, for the world with its temporal data (as does Millenia's Legacy Deluxe Edition).

WesleyJohnston 2011-11-15T17:06:52-08:00

NeilJohnParker wrote " ... as does Millenia's Legacy Deluxe Edition"

I'm glad you mentioned that. I have been thinking about it a lot. Millenia/Legacy has done a great service by making that information about locations available. If I remember correctly from a 2009 presentation, Paul Rasmussen has also been supporting a software program that maps addresses within some cities, so that you can see where they were at different times as the ward boundaries changed.

There is a great need for a standardized historical database of political hierarchies, from the address level on up. And Millenia/Legacy has made a significant stride in that direction.

AdrianB38 2012-05-22T04:11:49-07:00

Syntax05 - User Extensibility of events and characteristics

Description:
The list of events, properties, characteristics, etc, of individuals, etc, in the BetterGEDCOM file format must be capable of extension by users. Extensions must be kept permanently separate from any later definitions in BetterGEDCOM format.

Importance:
Mandatory

Why?:
1. GEDCOM can be extended, so removing the facility would be a step backwards.
2. Many GEDCOM files exist with user-defined events.
Source:
Original Goal 3

AdrianB38 2012-05-22T12:53:15-07:00

Tony - given that you agree with defining arbitrary events, I'm curious why you wouldn't include Events with Characteristics? Is that because you wouldn't agree with the creation of arbitrary characteristics (e.g. adding MilitaryRankSubstantive, MilitaryRankTemporary, MilitaryRankWarSubstantive, as different variations on a theme) or because you would agree but there are extra aspects to consider?

AdrianB38 2012-05-22T12:56:03-07:00

Note the following post on another thread:
"Syntax04 - Extensibility by software companies
"ttwetmore Today 6:09 pm
"This may be an obvious point, but there are two types of extensibility that can be contrasted.

"First there is the type of extensibility done by inventing new tags, possibly up to and including new record types.

"Then there is the type of extensibility done by attaching TYPE tags with values to a higher level generic tag. For example, new novel events could be handled by placing a TYPE tag that describes the event under the generic EVENT tag.

"It is my opinion that BG should forbid the former and promote the latter.

"Some people think the second approach should actually be the overall approach, that every event should be an EVENT tag with either a TYPE subtag or attribute (in the XML sense). Of course, if the vocabulary of the TYPE values is then highly prescribed, this is no longer an extensible solution. An argument for this position is that it minimizes the number of tags. I don't believe that this is an important goal at this level, but others disagree.

"I personally believe that we should have specific tags for all the important events of genealogy and family history, and then use the EVENT/TYPE approach for novel situations.

"Tom"

I believe that post to be relevant to this thread, as it seems to describe how User Extensibility of events could be done in BG and is done in GEDCOM.
Adrian
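The two kinds of extensibility in Tom's quoted post can be contrasted with a small writer that emits GEDCOM-style lines. It assumes a hypothetical table of events that have their own tags (BIRT, DEAT and BURI are standard GEDCOM 5.5.1 tags) and falls back to GEDCOM's generic EVEN tag with a TYPE subordinate for anything else, so no new tags are ever invented.

```python
from typing import List, Optional

# Hypothetical selection of events that get their own tags.
SPECIFIC_TAGS = {"birth": "BIRT", "death": "DEAT", "burial": "BURI"}


def event_lines(event_type: str, date: Optional[str] = None, level: int = 1) -> List[str]:
    """Emit a specific tag when one exists, else a generic EVEN line with a TYPE subordinate."""
    tag = SPECIFIC_TAGS.get(event_type.lower())
    if tag:
        lines = [f"{level} {tag}"]
    else:
        lines = [f"{level} EVEN", f"{level + 1} TYPE {event_type}"]
    if date:
        lines.append(f"{level + 1} DATE {date}")
    return lines


print("\n".join(event_lines("Birth", "1 JAN 1850")))
print("\n".join(event_lines("Apprenticeship", "1864")))
```

Users can add new TYPE values freely, while the set of tags, and hence the file structure, stays fixed.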

ACProctor 2012-05-22T13:16:54-07:00

Re "I'm curious why you wouldn't include Events with Characteristics?"

It's just that I think the mechanisms for supporting Event-types and user-defined properties would be very different Adrian. Hence, I was looking at customisation more from the 'appropriate mechanism' point of view. For instance:

Schema - This involves custom entities and tags/record-types. None of us seem to like this provision.
Properties - I think this can be handled by treating properties as name-value-type data that can be defined by the user. Hence, no new tags or record-types would be required.
Events - I think we can have a standard Event entity, with an open-ended set of types/categories, and simply pre-define the main important ones.

In summary, the more of the data syntax that we can pin down at the start, the less we need to change later. I therefore prefer extensible values as opposed to extending tags/record-types.

Tony

AdrianB38 2012-05-22T14:54:20-07:00

"It's just that I think the mechanisms for supporting Event-types and user-defined properties would be very different"
OK - I can buy that. One way or another, we can see a way forward for both events and characteristics. Though I think I'd also prefer to see a pre-defined list of the more important characteristics. If not, can you imagine the ensuing dialog(ue) "Is it occupation, job or career you want us to use? Can't you even be bothered to tell us?"

Moving to your categories / types for events, I agree with the idea in principle, though would argue with the detail at whatever point it became important so to do. Probably only to be expected.

Firstly, as a minor point I'd prefer to call them something like type and sub-type because the names then convey which is the broader and which the more detailed. (This is an informed prejudice brought on by an inability to remember and explain which was which in BR's locomotive classes, diagrams, types, etc. when prefixes like operating / engineering, etc would have helped everyone understand. Me included.)

Secondly, with rare exceptions such as "Union" and "Dissolution", I'm not convinced that the Categories you list add much to proceedings. E.g. knowing that Probate and Will are both in the Legal category doesn't seem that useful if there's nothing for them to inherit from Legal. Whereas, being able to sub-type Will as Will-Codicil seems more useful. So, I'm agreeing with your principles but would focus differently.

AdrianB38 2012-05-22T14:59:13-07:00

PS
Tony, I like the concept of being able to define units for properties. I still remember being at junior school, saying that the answer was (say) 6 and being asked "6 what? Apples, oranges?" It made an impression on me...

ACProctor 2012-05-23T01:12:34-07:00

Thanks Adrian.

Re: "Moving to your categories / types for events".

I have a problem with this area anyway. A "controlled vocabulary" should be 'closed' (i.e. controlled) so that all accepted possibilities are predefined. From that POV, my categories/types (or types/sub-types) do not constitute a controlled vocabulary.

What I really wanted is a set of items that has a predefined subset, but allows user-defined additions without clashing. I tried to avoid having two distinct data values, i.e. a controlled type, plus some user type for when controlled_type="Other", say.

(I hope that makes sense)

Short of decorating the names, say with an underscore prefix (so that _Typexxx is a user-defined type and would automatically be categorised under the controlled-type of 'Other'), I couldn't think of a clean way of accommodating the two values in a single datum.

Any suggestions?

Tony
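The underscore convention Tony describes can be sketched as a single classification function: one string value is either a member of the controlled vocabulary or, if it starts with an underscore, a user-defined type that falls under 'Other'. The vocabulary below is hypothetical; its category names (Union, Dissolution, Legal) are borrowed from this discussion.

```python
from typing import Tuple

# Hypothetical controlled vocabulary: type -> category.
CONTROLLED = {
    "Marriage": "Union",
    "Divorce": "Dissolution",
    "Will": "Legal",
    "Probate": "Legal",
}


def classify(event_type: str) -> Tuple[str, bool]:
    """Return (category, is_user_defined) for a single type value.

    Controlled types map to their own category; a leading underscore marks a
    user-defined type, which is filed under the category 'Other'.
    """
    if event_type in CONTROLLED:
        return CONTROLLED[event_type], False
    if event_type.startswith("_") and len(event_type) > 1:
        return "Other", True
    raise ValueError(f"{event_type!r} is neither controlled nor marked as user-defined")


print(classify("Will"))             # ('Legal', False)
print(classify("_Apprenticeship"))  # ('Other', True)
```

Because user-defined values are syntactically distinct, a later official addition such as "Apprenticeship" cannot collide with an existing "_Apprenticeship".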

nick-mat 2012-05-23T03:48:36-07:00

For my program, I classified events into 7 subtypes: Birth, Near Birth, Death, Near Death, Family Union (Marriage), Other Family, and Other. This is mainly so the program can make certain assumptions. If the program cannot find a birth date, then it can use the earliest Near Birth event. The same applies to deaths. The Family events are marked so they can be displayed with other family details.
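The fallback Nick describes might look like this. The `Event` class and the dates are hypothetical; only the rule (prefer a dated Birth, else take the earliest dated Near Birth event) comes from his post, and taking the earliest dated Birth when there are several is an added assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Event:
    subtype: str          # e.g. "Birth", "Near Birth", "Death", "Near Death", ...
    when: Optional[date]  # None when the event is undated


def estimated_birth(events: List[Event]) -> Optional[date]:
    """Prefer a dated Birth event; otherwise use the earliest dated Near Birth event."""
    births = [e.when for e in events if e.subtype == "Birth" and e.when is not None]
    if births:
        return min(births)
    near = [e.when for e in events if e.subtype == "Near Birth" and e.when is not None]
    return min(near) if near else None


events = [
    Event("Birth", None),  # a birth is recorded, but without a date
    Event("Near Birth", date(1850, 3, 10)),
    Event("Near Birth", date(1850, 2, 1)),
    Event("Other", date(1870, 6, 1)),
]
print(estimated_birth(events))  # 1850-02-01
```

A matching function for deaths would use the Death and Near Death subtypes; whether to take the earliest or the latest Near Death event is a design choice the post leaves open.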

There may be other ways we want to categorize. I am currently experimenting with extending the concept of Events to cover all attributes/characteristics, which will mean many more categories of events.

Nick

ttwetmore 2012-05-23T04:24:45-07:00

From an RDF point of view (and others) an event and characteristic are the same kinds of things; in the RDF case they both consist of triples. In the case of a characteristic the triple generally links a "subject" (say the person with the characteristic) via a predicate (the name of the "tag") to an object (the value of the characteristic). In the case of an event the object part of the triple is a more complex thing, a subject in its own right with its own characteristics (the main being type, date and place).

The important idea here is that everything is a characteristic, but we choose to partition the most important ones into their own categories. For instance the name of a person is a characteristic, but most of us believe that a name should be treated with its own subset of special tags that make the name such a special characteristic that we prefer to call it a name instead. But things like height or weight or color of eyes, we don't consider genealogically highly significant, so a simple characteristic tag is good enough, if there is one at all.

Especially when the characteristic value is an object unto its own, with its own characteristics (e.g., name, place) we tend to not think of them as characteristics, which we reserve for "simple" things. Or we can say that we think of characteristics as things with atomic, indivisible values, and non-characteristics as things with structured values.

Extensibility in an RDF world primarily means adding new predicates, and new object (in the RDF sense) types and values.

Personally I think extensibility is the same in both the event and characteristic area, because I think we should generalize the two concepts in defining certain properties of the model. Just as I think there should be an EVENT:TYPE "predicate pair" for extending events, there should be a PFACT:TYPE pair for extending characteristics.

Note that none of this deals with the issue of whether events are first class citizens or not (whether they are "record level" entities). If an event is a high level entity, then from an RDF point of view the event is an anonymous entity (a bit of a misnomer in my view since it must have a unique ID, which is a characteristic that uniquely identifies it) and then it becomes (through the good offices of that unique ID) the object of a subject (say the object BIRTH), which is an object in a person (or in layman's terms, it gets "pointed to" by a non-anonymous subject).

An event only seems different from a characteristic because an event exists at two levels in an RDF type graph: it is the object part of a triple from the point of view of a subject (e.g., a person [subject] has a birth event [predicate and object]), and it is the subject part of triples that define its own characteristics (e.g., a birth event is an anonymous but uniquely identifiable subject, with a type, date and place [predicates and objects]). And of course the date and place objects in the event triples would be subjects in their own right with their own third level objects.

The same situation exists in a computer data structure with multiple fields. Some fields contain primitive objects (integers, strings) and some fields contain references to other data structures. The same situation exists in a GEDCOM file where any line without sub-lines is a simple object of the parent line (which is the subject and the child line tag is the predicate), whereas the parent line is both an object of the line above it and the subject of the lines below it.
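Tom's observation about GEDCOM can be checked mechanically: the sketch below turns level-numbered GEDCOM-style lines into (subject, predicate, object) triples. A line without sub-lines becomes a simple object; a line with sub-lines gets a made-up node id (n2 and so on, hypothetical) so that it can be the object of the line above and the subject of the lines below. It deliberately ignores CONT/CONC, pointers and other real GEDCOM details.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, Optional[str]]


def gedcom_to_triples(lines: List[str]) -> List[Triple]:
    """Convert simplified GEDCOM-style lines into (subject, predicate, object) triples."""
    parsed = []
    for raw in lines:
        parts = raw.split(" ", 2)
        value = parts[2] if len(parts) > 2 else None
        parsed.append((int(parts[0]), parts[1], value))

    triples: List[Triple] = []
    stack: List[Tuple[int, str]] = []  # (level, node id) of the open parent lines
    for i, (level, tag, value) in enumerate(parsed):
        while stack and stack[-1][0] >= level:
            stack.pop()  # close lines at the same or a deeper level
        has_children = i + 1 < len(parsed) and parsed[i + 1][0] > level
        if level == 0 and tag.startswith("@"):
            # Record line such as "0 @I1@ INDI": the xref is the subject.
            triples.append((tag, "is-a", value))
            stack.append((level, tag))
            continue
        obj = f"n{i}" if has_children else value  # structured value -> new node
        if stack:
            triples.append((stack[-1][1], tag, obj))
        if has_children:
            if value is not None:
                triples.append((obj, "value", value))
            stack.append((level, obj))
    return triples


lines = [
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 1850",
    "2 PLAC New London",
]
for triple in gedcom_to_triples(lines):
    print(triple)
```

Here BIRT yields ("@I1@", "BIRT", "n2") as the object of the person, and n2 is then the subject of the DATE and PLAC triples, which is the two-level role described above.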

The same situation exists in JSON, XML, etc., yadda, yadda, yadda.

I think it is important to first grok the full semantics and relationships of all the concepts, and then divvy them up into terms that are consistent with one another and go along with the pattern of words we choose to describe them. Seeing how every concept in a potential genealogical data model fits in with the RDF triples model of knowledge is a great organizing principle for doing this. Once the organization and partitioning of triple types is done, the term RDF never need be mentioned again. I see the terms events and characteristics as concepts that we can first unify with an RDF point of view, but then choose to give separate names to as we distinguish pragmatic differences between how the concepts are used and how we believe the concepts should best be placed into a data model.

The very fact that we are comfortable, more or less, with the terms event and characteristic, is an indication that we all recognize that there are useful properties of event-characteristics that make us want to carve them out of the larger universe of all characteristics, and give them a special sub-category and treat them somewhat differently (for example define them as top level entities in a data model).

But, when considering extensibility then the fact that an event is really "derived" from a characteristic can be the principle that dictates that they both be extensible in the same way.

Tom

ACProctor 2012-05-23T04:49:15-07:00

It sounds to me Tom like you might be putting undue emphasis on the relationships rather than the entities themselves.

The definition of a Person, Place, or Event is crucially important, but I feel the entity relationships are a natural consequence of their individual [entity] definitions.

For example, we all accept that a Person may be linked to other Persons for genealogical lineage (i.e. a biological hierarchy), and most of us would accept that every Place has a parent Place (i.e. a Place hierarchy). STEMMA's Events also constitute an Event hierarchy in order to add fine structure to the definition of a non-trivial Event. The point I'm getting at here is that the triples linking one of these entity types to another of the same type are a consequence of their definitions rather than the other way around. [yeah, I might be splitting hairs but it feels right to me]

Also, STEMMA has an inheritance mechanism that involves Events, Resources (i.e. supporting files), and Citations (or "sources" to everyone else). I don't believe RDF has any mechanism that can model this type of inheritance, which in turn is directly analogous to subclass/superclass in OOP.

Tony

ttwetmore 2012-05-23T06:41:41-07:00

Tony,

Thanks. I didn't intend to stress relationships over entities.

In RDF, as I'm sure you know, a Place being a sub-Place of another Place is a simple RDF triple, (place1465, is-subplace-of, place2638). Of course, there would be other triples like (place1465, has-name, "New London"), (place1465, has-type, city), (place2638, has-name, "Connecticut"), and so on.

For Person links I believe we need to distinguish direct person-to-person links from linking mediated by relationship objects. Both seem to me to have their proper places in a genealogical data model, and both are easily represented in RDF. For example, the multi-level inter-persona relationship that I believe is critical to the next generation of genealogical software, in order to fully support the research and evidence process, might have triples like:

(persona2435, is-evidence-for, persona6481)
(persona6481, is-concluded-to-be-an-individual-because-of, "... proof statement ...")

Contrary to your comment in your last paragraph, class membership and class inheritance (i.e., OO concepts) are handled by RDF, as in the following triples:

(person24354, is-an-instance-of, Person)
(Person, is-a-subclass-of, GenealogicalEntity)

In most contexts the "is-an-instance-of" predicate is simplified to "isa" or "is-a".
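To illustrate that class membership and inheritance can be expressed as plain triples, here is a small inference over Tom's two example triples: an entity's classes are its direct classes plus everything reachable through is-a-subclass-of. (RDF Schema standardises these two predicates as rdf:type and rdfs:subClassOf; the predicate strings below simply follow the post.) The function name is hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def classes_of(triples: Iterable[Triple], entity: str) -> Set[str]:
    """Direct classes of `entity` plus all their superclasses (transitive closure)."""
    triples = list(triples)
    result = {o for s, p, o in triples if s == entity and p == "is-an-instance-of"}
    frontier = list(result)
    while frontier:
        cls = frontier.pop()
        for s, p, o in triples:
            if s == cls and p == "is-a-subclass-of" and o not in result:
                result.add(o)
                frontier.append(o)
    return result


triples = [
    ("person24354", "is-an-instance-of", "Person"),
    ("Person", "is-a-subclass-of", "GenealogicalEntity"),
]
print(sorted(classes_of(triples, "person24354")))  # ['GenealogicalEntity', 'Person']
```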

One could also define schemas using RDF, as in things like:

(is-a-characteristic-of, is-a, Predicate) << These three go all the way down to the "assembly" language level of RDF,
(Name, can-be-a, Subject) << as a Predicate, Subject and Object are the three basic concepts of RDF.
(Name, can-be-a, Object)
...
(Name, is-a-characteristic-of, Person)
(Birth, is-an-event-of, Person)
(Birth, is-a, Event)
(Date, is-an-optional-characteristic-of, Event)

RDF must be able to do this, as a schema is a specification of a form of knowledge.

You can use RDF to define your data storage or your external file formats, though this might be nothing more than a theoretical exercise.

(Person, is-represented-as, Person-Relational-Table)
(Person-Relational-Table, has-column, Name-Column)
...

I'm not suggesting we do any of these things with RDF. I'm just pointing out that RDF provides an excellent conceptual framework for casting nearly everything we need to discuss into a uniform vernacular.

Tom

ACProctor 2012-05-23T07:03:22-07:00

OK, I see what you're saying now Tom.

However, how would RDF distinguish the OOP concepts of "has-a" and "is-a"? If one entity is embedded within another, rather than being linked to it, then does that cause a problem for these triples?

Tony

ttwetmore 2012-05-23T08:16:54-07:00

Tony,

To answer you directly:

(Person, is-a, GenealogicalEntity) << inheritance relationship
(Person, has-a, Name) << containment/component-of relationship

Here Person, GenealogicalEntity and Name are Classes, not Objects, from the OO point of view. From the RDF point of view, however, Person is a Subject, and GenealogicalEntity and Name are Objects. But these uses of Subject and Object are linguistic, so the two kinds of Objects are entirely different things. It is the schema interpretation that would specify that all three are Classes. RDF per se doesn't care.

Note that is-a and has-a, from the OO point of view, are relationships between Classes, at least the way I am defining them here.

Tom

AdrianB38 2012-05-22T04:18:55-07:00

In Discussion on "Goal 2 -- BG container formats" ( http://bettergedcom.wikispaces.com/message/view/GOALS/30141635#54417872 )
the following posts seem relevant:

Tom Wetmore 21/5/2012 (extract only)
"If vendors unofficially extend BG for their own use they are being very bad and should be shunned. If the bad boy is someone like Ancestry.com, good luck; it will be like the kind of standards usurping that Microsoft has long been famous for. The giants can play fast and loose with standards, and the rest of us can whine all we want, but at the end of the day, must go along.
"I personally believe that BG should NOT be extensible, but that changes and additions can be proposed and acted upon by an official process. Taking this position is tantamount to the claim that the BG designers can do an excellent job of anticipating the needs of the industry. I personally believe this to be the case in theory; however the technical management of the BG process must undergo a radical improvement before this would be possible. You can substitute FHISO for BG in the preceding sentence if you believe that FHISO will end up in charge."

Louis Kessler 21/5/2012 (extract only)
"I 100% agree with Tom that BetterGEDCOM should NOT be extensible, for the exact reasons he states.
"That makes every decision, such as the BetterGEDCOM container decision, a tough one. It will be hard to U-turn once all developers have implemented it one particular way."

AdrianB38 2012-05-22T04:28:44-07:00

I believe that it is important to distinguish user defined events and characteristics (a.k.a. attributes, etc, etc.) from the extensions made to GEDCOM by the software suppliers, whose tags are (theoretically) distinguished by the underscore prefix. While both extend the meaning of the GEDCOM standard (and both are _within_ the standard if implemented correctly), user defined events & characteristics have, I believe, less of an impact on the structure of the GEDCOM file in that they are firmly localised in where they occur, whereas software-supplier extensions can be anywhere, at any level. In addition, the control of the two types of extension is theoretically different - software suppliers could, in theory, tell us the meaning of their extensions whereas only individual users can explain their own "new" events, etc.

AdrianB38 2012-05-22T04:30:39-07:00

Note that software suppliers could very well extend the range of events by the same mechanism that users do.

AdrianB38 2012-05-22T04:46:25-07:00

On a personal level, I find it very difficult to believe that BG (or whoever) will ever be able to come up with a list that defines all necessary events and characteristics. New ones could appear for several reasons:
- events in a previously unknown (to BG) culture;
- in a culture known to BG, we might, nevertheless, find events not previously known;
- events might have been considered but rejected as not requiring a separate BG event "code", but the user disagrees;
- events might have been considered and allocated a BG event "code", but this code is shared with another, similar event, possibly in another culture; the user dislikes the way the application software processes the two the same way, so wants to separate them.

It is, I personally believe, not credible that BG's controlling authority could react fast enough to approve new events. Quite apart from anything, approval means occasional rejection but if the BG controlling authority is not the leading authority in the genealogy of the appropriate culture, how can it make an informed decision that does not upset someone who is the leading authority?

Even if the controlling authority reacts fast enough to approve new events - what then? What value do they provide? How do the software suppliers introduce the new events into their software - they cannot - so how does the user get the new, approved, event in without using a user-definable event code?

ACProctor 2012-05-22T09:00:48-07:00

I wouldn't include Events with Characteristics here Adrian. It will be such a fundamental requirement to be able to define arbitrary Events that I would turn this part on its head.

STEMMA has a general entity for describing Events (protracted as well as simple ones). However, there is a controlled vocabulary of Event-categories and Event-types that can be used to locate the well-defined ones such as marriage, and variations thereof.

The above link also shows how roles are interpreted relative to each category+type combination.

On the subject of extensions, STEMMA only really acknowledges extensions to the set of properties (incl. Person, Place, and Event properties), and extensions to the schema itself.

Extensions to the schema will be controversial. I only added the topic to show how it should be done if it were found necessary. However, extensions to properties are also fundamental.

The points to note in STEMMA are that:

The properties have a data-type
The names of the properties have a scheme to prevent clashes
The properties may have units, e.g. for height & weight
The properties may actually reference another entity such as a Person.
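The four STEMMA points above could be sketched as a property record that carries its own data type, a namespace-prefixed name, an optional unit and, for references, the id of another entity. This is not STEMMA's actual syntax: the `prefix:Name` scheme, the `DATATYPES` table and the example names are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical data types; "person-ref" values hold the id of a Person entity.
DATATYPES = {"integer": int, "decimal": float, "string": str, "person-ref": str}


@dataclass(frozen=True)
class Property:
    name: str                   # namespace-prefixed, e.g. "myns:Height" (hypothetical scheme)
    datatype: str               # a key of DATATYPES
    value: Union[int, float, str]
    unit: Optional[str] = None  # e.g. "cm"; answers "6 what?"

    def __post_init__(self) -> None:
        prefix, sep, local = self.name.partition(":")
        if not (prefix and sep and local):
            raise ValueError(f"{self.name!r} needs a 'prefix:Name' form to avoid clashes")
        expected = DATATYPES.get(self.datatype)
        if expected is None or not isinstance(self.value, expected):
            raise ValueError(f"{self.value!r} is not a valid {self.datatype}")


height = Property("myns:Height", "decimal", 175.5, unit="cm")
godparent = Property("myns:Godparent", "person-ref", "I42")
print(height)
```

Validation happens when the record is created, so a property with no prefix, an unknown data type or a value of the wrong type never enters the data set.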

Again, I'm not suggesting this is the way to go, but I hope the approach is novel enough to stop us from simply copying the GEDCOM approach.

Tony