Requirements Catalog


Co-Moderators: gthorud and AdrianB38

RULES

Discussions: All discussions about the content of this page shall appear on the discussion tab for this page. The SUBJECT of the topic shall contain only the ID of the requirement, "space dash space" and the subject of the requirement. There should be only one topic per requirement. Enter the Description and Why (see below) in the first posting in the topic.

New entries: All wiki users can add a new requirement. Please check whether there is an existing entry for the same requirement. If there is a similar one, check the discussion of that requirement or contact the Proposer of the requirement to see if the existing one can be slightly modified to cover your requirement. If you are in doubt about which group the requirement should be placed in, or whether your requirement should be entered, contact the moderators (see the top of the page).

Entry Moderator: The person proposing a requirement is responsible for updating the catalog entry following discussion. Updates shall be "announced" in the discussion topic. See rules in the template below.

Requirements Catalog Index

Quick access index:
Administration Characteristics Confidence&Accuracy Conversion Data Date DNA Event Evidence Family Group International Multimedia Person PersonNames Place Ship Source Support Syntax Test Suite Text Handling Timeline


Background

Background to BetterGEDCOM

Background to these pages:
At the Developers' Meeting of 17 January 2011, it was resolved that BetterGEDCOM's existing list of Goals was not appropriate as Goals and that the list should be re-structured to extract a simple Goal and reformat the rest as Requirements. This set of pages is being written to carry out that task.

Personal comment by the original author (Adrian Bruce) - the sections of this catalogue were previously used by me as a template for a full-scale IT project. Though trimmed from that, they may still be regarded as over-the-top for a Wiki-based project. Having previously got stuck on the argument whether we had goals or requirements, I would rather take too rigorous a path now. Inspiration comes from the Volere Requirements Specification Template in "Mastering the Requirements Process" by Robertson & Robertson. Note also that I use the term "project" for the BetterGEDCOM work, even though it (so far) satisfies no formal definition of what a project should be.

Goals of BetterGEDCOM

BetterGEDCOM will be a file format for the exchange and long-term storage of genealogical data.
It will be more comprehensive than existing formats and so become the format of choice.
(Note - first sentence is a minor rewording of Goal 1 agreed 3 Jan 2011. Second sentence justifies why BG and not an existing format.)

Clients, Customers, Stakeholders & Users

(This section is here simply to make you think)

Requirements Constraints

(Not all of these may be relevant in practice)

Naming Conventions & Definitions

See Glossary of Terms

Assumptions



Scope of BetterGEDCOM product

BetterGEDCOM will produce definitions of the file format in:

BetterGEDCOM should provide a test suite of data that will

BetterGEDCOM will not have responsibility for testing application software.

BetterGEDCOM will not have responsibility for defining how individual applications should translate genealogical data from their native formats to and from the BetterGEDCOM format, nor from application's own varieties of GEDCOM to and from the BetterGEDCOM format. (Experienced users may make suggestions, but the responsibility lies with the application's owners.)

Requirements Introduction

A division between functional and non-functional requirements is traditional in Requirements Catalogues. Functional requirements say what the new system should do (e.g. "Pay staff according to the 1929 Conciliation Staff Agreement") - non-functional requirements say how the system should do it (e.g. "Pay 10,000 staff overnight each Tuesday", or "Run on Windows 2000 Server OS"). As a result, the "techno-speak" requirements are part of the non-functional requirements.

Given that the BetterGEDCOM file format does not do anything itself, it is debatable how relevant the division is, so, after trying to keep to it, I am putting them all together.

The Requirements below use the following template.

Id:
Code to identify the requirement - in bold
Title:
A short description - max 10 words - in bold.
Description:
One or two sentences - use "must" if importance is mandatory; "should" if very desirable; "could" if desirable.
Importance:
One of three values: Mandatory; Very Desirable; Desirable. For the time being, this is the assessment of the proposer.
Why?:

Source:
If from another page or discussion, please note and link. All previous discussions should go here, but the last/current discussion should be linked to in Discussion.
Way forward?:
Comments on possible ways forward
Dependencies:

Approval status:

Proposer:
The creation date for the requirement and the wiki ID of the proposer, and optionally their name.
Changes:
Date changed (month and day), e.g. Feb 21, and user ID, as a comma-separated list. Append the last change to the end of the line, e.g.: 22 Feb gthorud, 23 Feb userxxx
Discussion:
Link to the current Discussion topic for this requirement. The subject of the topic should be the ID followed by the Title. See top of page.
Copy this Empty template to create a new requirement:

Id:

Title:

Description:

Importance:

Why?:

Source:

Way forward?:

Dependencies:

Approval status:

Proposer:

Changes:

Discussion:



Again, acknowledgments are made to the Volere Requirements Shell template in "Mastering the Requirements Process" by Robertson & Robertson

Detailed Requirements


Research Administration


Id:
Admin01
Title:
Research Administration Information
Description:
BetterGEDCOM must allow recording of administrative information needed to organise and document the research work.
Importance:

Why?:
--- This is a place holder at the moment, details and detailed requirements to be added.
----- See the discussion of this requirement for summaries of the functionality in some genealogy programs.
Source:

Way forward?:
More detailed solution, see Admin02 onwards.
Dependencies:

Approval status:

Proposer:
7 March 2011 gthorud
Changes:

Discussion:
Discussion

Id:
Admin02 (was Task01)
Title:
Research Task
Description:
BetterGEDCOM shall be able to record and track a Task (search or other task) that needs to be done or has been done. Information recorded about the task itself could be a Title/Short description and a full description (formattable). Research tasks can be organized in simple lists or grouped into Objectives, see below.
Importance:
Very Desirable
Why?:
Supports faithful recording of research status and results, and reduces repetition of labors.
Source:
Gramps, GenTech model
Way forward?:

Dependencies:

Approval status:

Proposer:
BrianJD
Changes:

Discussion:
Discussion at Task01 - Research Task

Id:
Admin03
Title:
Task information
Description:
BetterGEDCOM shall be able to record information about a Task, for example used for Categorisation (keyword, category, type (research/correspondence/other)), Progress management (priority, status, dates, comments about dates), Resource use (Expenses, number of hours used).
Importance:

Why?:

Source:

Way forward?:

Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Id:
Admin04
Title:
Identification of persons, events, places that the task is about
Description:
BetterGEDCOM shall be able to link a task to records representing the person(s), event(s), place(s), source(s) etc. that the task is about, existing when the task is defined (started). A possibility is also to record links to persons, events etc. that are created as a result of the task.
Importance:

Why?:

Source:

Way forward?:

Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Id:
Admin05
Title:
What to search
Description:
BetterGEDCOM shall be able to record information about, or link to records representing, WHAT to search – e.g. a source. Possibly a URL pointing to the source.
Importance:

Why?:

Source:

Way forward?:

Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Id:
Admin06
Title:
Where to do the task
Description:
BetterGEDCOM shall be able to record information about, or link to records representing, WHERE to do the task – Location name (if not linked to), Repository, Place (eg. cemetery), Address
Importance:

Why?:

Source:

Way forward?:

Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Id:
Admin07
Title:
Task results
Description:
BetterGEDCOM shall be able to record information about, or link to records representing, the findings and results produced by the task (an overall description of the results, Excerpts, Multimedia, Citations, Filing Cabinet Reference)
Importance:

Why?:

Source:

Way forward?:
The information recorded for this requirement overlaps with the information in the Evidence and Conclusion model.
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Id:
Admin08
Title:
Objectives - for grouping of tasks
Description:
BetterGEDCOM should be able to group several tasks into Objectives (Target), each Objective representing a question to be answered or a problem to be solved. An objective is usually defined before the tasks needed to achieve the objective. Objectives should have a description and will be the record pointing to users, events, places etc. rather than each task. Some elements of the information recorded for tasks (see above) can be defined for the objective rather than each task.
Importance:

Why?:
Questions and problems are in most cases the reasons that one or more tasks are performed.
Source:

Way forward?:
An objective record may contain elements of the info mentioned in Admin03
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Id:
Admin09
Title:
Projects - for grouping of objectives
Description:
BetterGEDCOM could be able to group several objectives into projects. Projects could be split into sub-projects. Each (sub-)project should have a name, elements of task progress listed above, completion grade (%) and description.
Importance:

Why?:

Source:

Way forward?:
A project record may contain elements of the info mentioned in Admin03
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:
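
As an illustration only, the grouping described in Admin02 to Admin09 (tasks grouped into objectives, objectives grouped into projects and sub-projects) could be sketched as nested records. The class and field names below (Task, Objective, Project, all_tasks, completion) are hypothetical and are not part of any agreed BetterGEDCOM data model. Deriving the completion grade from task status is also an assumption of this sketch; the requirement only says that a completion grade should be recorded.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """A single research task (Admin02/Admin03); field names are hypothetical."""
    title: str
    status: str = "open"   # e.g. "open" or "done"
    priority: int = 0


@dataclass
class Objective:
    """A question to be answered or problem to be solved; groups tasks (Admin08)."""
    description: str
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Project:
    """Groups objectives and may be split into sub-projects (Admin09)."""
    name: str
    objectives: List[Objective] = field(default_factory=list)
    sub_projects: List["Project"] = field(default_factory=list)

    def all_tasks(self) -> List[Task]:
        """Collect the tasks of this project and of every sub-project."""
        tasks = [t for o in self.objectives for t in o.tasks]
        for sub in self.sub_projects:
            tasks.extend(sub.all_tasks())
        return tasks

    def completion(self) -> float:
        """Completion grade (%) as the share of tasks marked done (an assumption)."""
        tasks = self.all_tasks()
        if not tasks:
            return 0.0
        done = sum(1 for t in tasks if t.status == "done")
        return 100.0 * done / len(tasks)
```

In this sketch the objective, not each task, would carry the links to persons, events and places, as Admin08 suggests.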


Id:
Admin10
Title:
Correspondence log
Description:
BetterGEDCOM could be able to record information about letters, emails, phone calls or other correspondence related to the research. An item in the log can have a type (call, email etc.), direction (in/out), researcher, correspondent, subject, date, reference to filing system and details about the correspondence. Contact information (address, phone etc.) could also be recorded.
Importance:

Why?:

Source:

Way forward?:

Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Id:
Admin11
Title:
Researchers
Description:
BetterGEDCOM could be able to record information about the researchers using the program or other cooperating/corresponding researchers. Researchers can have a name, languages, registration number (?), notes, media (photo) and contact info. A researcher can be linked to a person in the database. The Gentech model also links researchers to assertions, i.e. who made the assertion.
Importance:

Why?:

Source:

Way forward?:

Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Id:
Admin12
Title:
Support Privacy Settings
Description:
Genealogy programs must support the user by providing controls/options/settings/reports to assist the user in maintaining the privacy of the people in the database, particularly those living.
Importance:
Mandatory
Why?:
Users will need to differentiate between data that can be shared and data that cannot. Users may also need to differentiate between different data that can be shared with different groups of people. For example, only the data related to one branch of a family would be shared with the people in that branch.
Source:
NGS Standards For Sharing Information With Others
Way forward?:

Dependencies:
The following requirements may impact or be impacted by this requirement:
  • Conversion02 - Support for generating web pages
  • Data06 - Transfer between one user's programs and to other users/services
  • Data-Ind01 - Data about persons
  • Data-Ind02 - Biological relations independent of family
  • Data-Ind04 - Sex-change individuals
  • Data-Place06 - Location to include address
  • DNA01 - Results from DNA tests
  • Multimedia02 - Information about multimedia objects
  • Multimedia03 - References to Multimedia
  • TextHandling03 - Footnotes/endnotes in notes
This list may be incomplete as requirements evolve.
Approval status:

Proposer:
12 Jul 2011 Christine_E
Changes:

Discussion:
Discussion at Admin12 - Support Privacy Settings

Confidence and Accuracy

Id:
ConfAcc01 (Confidence and Accuracy) (was Data03)
Title:
Support for approximately known values
Description:
BetterGEDCOM must allow the recording of approximately known values in all appropriate contexts.
Importance:
Mandatory
Why?:
GEDCOM already allows dates to be "about yyyy".
Note - this is not the same as assigning a probability to a value - e.g. "Probably 1812" is not the same as "About 1812", and this requirement is not intended to cover concepts like "Probably 1812".
Source:
Tom Wetmore's Goal and Requirements plus various discussion pages.
Way forward?:
See Data-Date01 for this requirement on dates.
See Data-Place01 for this requirement on locations.
Work on the data model needs to establish if there are any other values that either need, or would benefit from, the ability to record approximation.
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:

Note that in this Catalogue, we use the term "Characteristic" to refer to what have been referred to as properties, facts, attributes, characteristics or traits. See PFACT in Glossary. Again, my use of this term does not imply that it should be the term used in the Data Model.

Id:
ConfAcc02 (was Data04)
Title:
Levels of Confidence in Database Conclusions
Description:
BetterGEDCOM should allow the recording of recognized levels of confidence associated with database conclusions
Importance:
Very Desirable
Why?:
Supports faithful recording of research status and results. This uncertainty / level of confidence can apply to various sub-items, including, but not necessarily restricted to, dates ("Probably 1812"), places ("Likely London, England") and relationships ("Possible father is ...").
Source:
_Evidence Explained_, 2007, p. 19, "certainly," "probably," "possibly," "likely," and "apparently," "perhaps"
Way forward?:

Dependencies:

Approval status:

Proposer:
GeneJ
Changes:
2011 Feb 21 - created
2011 Mar 21 - add examples to "Why".
Discussion:
Discussion at Data04 - Levels of Confidence in Database Conclusions
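
As a rough sketch of what ConfAcc02 asks for, a conclusion could carry an optional qualifier drawn from a fixed list. The six words below are the ones quoted in the Source field of this requirement; the names Confidence, Conclusion and describe are hypothetical, and no ordering between the levels is implied.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Confidence(Enum):
    """Qualifier words quoted in the Source field of ConfAcc02."""
    CERTAINLY = "certainly"
    PROBABLY = "probably"
    POSSIBLY = "possibly"
    LIKELY = "likely"
    APPARENTLY = "apparently"
    PERHAPS = "perhaps"


@dataclass
class Conclusion:
    """A date, place or relationship with an optional level of confidence."""
    value: str
    confidence: Optional[Confidence] = None  # None means no qualifier recorded

    def describe(self) -> str:
        """Render the conclusion with its qualifier in front, if there is one."""
        if self.confidence is None:
            return self.value
        return f"{self.confidence.value.capitalize()} {self.value}"
```

Keeping the qualifier separate from the value means that "Probably 1812" is not confused with the approximation "About 1812" covered by ConfAcc01.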

Id:
ConfAcc03 (was Data05)
Title:
Universal Qualifier Symbol ("?")
Description:
BetterGEDCOM should incorporate methods allowing users to apply the universal qualifier "?" before dates (or parts of dates), locations, names, etc.
Importance:
Very Desirable
Why?:
Supports faithful recording of research status and results.
Source:
Hoff and Leclerc, _Genealogical Writing in the 21st Century_ (2006), p. 115, "Commonly used Symbols," for "?" as, "uncertain interpretation of original text."
Way forward?:

Dependencies:

Approval status:

Proposer:
GeneJ
Changes:
2011 Feb 21 - created
Discussion:
Data05 - Universal Qualifier Symbol ("?")

Id:
ConfAcc04
Title:
Document Rejected Conclusions
Description:
BetterGEDCOM could allow the recording of rejected conclusions.
Importance:
Desirable.
Why?:
If a conclusion is rejected, it can be useful to record the rejected conclusion.
  • This should help to stop the researcher revisiting their own mistakes in future, when they have forgotten previous research;
  • Negative evidence can be useful in itself (e.g. "Thomas' mother was not Mary, so must have been Margaret or Molly");
  • Erroneous conclusions listed on the Internet are the bane of many genealogists' lives. It may be useful to have a refutation to hand.
Source:
Extension of Data04 "Levels of Confidence in Database Conclusions"
Way forward?:

Dependencies:
Data04 "Levels of Confidence in Database Conclusions"
Approval status:

Proposer:
AdrianB38, 2011 Mar 21
Changes:

Discussion:

Conversion

Id:
Conversion01
Description:
The coverage of the types of genealogical data must allow faithful import of data from all current, common genealogical software with no material manual intervention, subject to the limits of the applications involved.
Importance:
Mandatory
Why?:
If users cannot move their data to BetterGEDCOM formats, they will not use BG
Source:

Way forward?:
The data model for BetterGEDCOM must be rich enough to allow software companies to write routines to copy data from their internal file formats and / or their versions of GEDCOM to the BG format.
Therefore, the BetterGEDCOM data model must include everything in the current GEDCOM data model - but not necessarily in the same format - e.g. in-line sources could be converted to source records.
Dependencies:
We are dependent on the software companies writing that conversion code.
Approval status:

Proposer:

Changes:

Discussion:


Conversion02 - removed by the submitter. See discussion at Conversion02 - Support for generating web pages

Data

Note - The prefix "Data" is used for generic requirements that do not appear to be obviously applicable to only one group.

Id:
Data01
Title:
Compatibility with GEDCOM
Description:
The data model that underlies BetterGEDCOM must be a superset of the models used by existing major, genealogical applications to the fullest extent deemed possible during design
Importance:
Mandatory
Why?:
BG compatible software must be able to import data from existing applications and must be at least as good as existing applications in relation to its model.
Source:
Tom Wetmore's Goal and Requirements
Way forward?:
Produce a data model to do this.
Dependencies:

Approval status:

Proposer:

Changes:
2011 Nov 15; Adrian B; rename from "Backwards Compatability" to "Compatibility with GEDCOM" to avoid use of "Backwards" - I thought that was the correct direction but others don't, so I avoid the use of the term - and correct the spelling as well.
Discussion:


Id:
Data02
Title:
Support for all conventional genealogical processes
Description:
The data model that underlies BetterGEDCOM must provide a set of data entities that will allow genealogical applications to support all conventional genealogical processes.
Importance:
Mandatory
Why?:
BG compatible software must be able to carry out normal processes
Source:
Tom Wetmore's Goal and Requirements
Way forward?:
Produce a data model to do this.
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:

Data03, 04 and 05 have been moved to Confidence and Accuracy.

Id:
Data06 (was Usage01)
Title:
Transfer between one user's programs and to other users/services
Description:
BetterGEDCOM should support data that needs to be exchanged between 1) one user’s applications, possibly from different vendors, or 2) several users’/service providers’ applications.
Importance:

Why?:
The requirement in these cases may be different, but BetterGEDCOM must support both. For example, a program may support management/classification/grouping of collections of media, e.g. photos. The grouping may not be of interest to other users, but should be transferred when the user transfers media between her/his own programs. Another example is genealogy project management information, e.g. planned lookups in a source, that may not be of interest to other users – but should be possible to transfer to the user’s other programs. Thus, all info stored by a program is a candidate for exchange.
Management data intended to be transferred between one user’s applications are not likely to be transferred to network services, and are thus not restricted by specifications that can be transferred to such services.
Source:

Way forward?:
Create this in the Data Model.
Dependencies:

Approval status:

Proposer:
gthorud
Changes:
22 Feb gthorud
Discussion:


Id:
Data07
Title:
Independent record collections
Description:
BetterGEDCOM shall be able to record, e.g., only records containing information about places, without any person records or other types of records.
Importance:

Why?:
Independent record collections allow exchange of collections of source meta info, source data, place info, media meta data, media, timelines etc. This could facilitate projects where users could collaborate to create such collections, without having to rely on network services or other parties to provide the necessary facilities.
Source:
(Originally proposed by Tom in some discussion I can’t find)
Way forward?:

Dependencies:

Approval status:

Proposer:
22 Feb gthorud
Changes:

Discussion:
Data07- Independent record collections

Data08 - unassigned, see the discussion at Data08 - Importing Data (Proposal)

Id:
Data09
Title:
Collections of source data
Description:
BetterGEDCOM could allow recording of data from sources as a collection of records where none or only some are linked to persons or other records in the BG-file. Examples are transcriptions of a complete source or a section in the source, e.g. births in a church book, images of same or an index to the source.
Importance:
To be determined
Why?:
Often such collections are published in databases on the internet, but there could be many reasons why that is not practical, e.g. there might not be a database suited for the type of data or there could be copyright issues. It should be possible to search for data in a collection. It could be possible to link records in a collection to persons etc., incl. source meta data, in the BG-file. It would also allow the user to see which records in the collection are not linked to a person, and thus also to see that a candidate record in a collection is already assigned - thus avoiding e.g. assigning the same birth record to two different persons.
Source:

Way forward?:
The solution must be general so that it can handle many types of sources. For structured data, some general data elements that are common to many sources could be defined - facilitating searches across collections - e.g. given names, surnames, date of birth, "place of residence", place of birth (or place of event). Data could also be unstructured text or images. An alternative could be to encode such collections in terms of persons, places, and events, in separate sets of data (some current programs can convert tabular transcriptions into GEDCOM format), or keep the data in table structures with user-assigned column headers imported from e.g. spreadsheets, possibly in a two-level structure - one for the record (event) and one for the persons. A solution could also be used to store individual source records downloaded from web services (would require a standard download format) or simply records entered by the user. There are lots of alternatives.
Dependencies:
This is somewhat related to Data07
Approval status:

Proposer:
gthorud 9 May 2011 - as instructed by today's Developer meeting
Changes:

Discussion:
Discussion at Data09 - Collections of source data

Characteristic


Id:
Data-Char01 (Characteristic)
Title:

Description:
BetterGEDCOM must support the recording of the characteristics of persons, families, groups, places, "ships" etc.
Importance:
Mandatory
Why?:
Current GEDCOM allows the recording of attributes for individuals and families.
Source:
Various discussion pages.
Way forward?:
Create this in the Data Model.
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Id:
Data-Char02 (was Data09 for Characteristics)
Title:
Record locations for characteristics
Description:
BetterGEDCOM must support the recording of location values applicable to all characteristics of persons, families, groups, places, "ships" etc. (unless specifically agreed otherwise).
Importance:
Mandatory
Why?:
Current GEDCOM allows the recording of place for attributes. Note this does not imply that the recording of a location against any particular characteristic makes sense - e.g. recording of a location against someone's sex would seem pointless. On the other hand, recording of a location against someone's name might well be useful - if someone emigrated under an assumed name, it might be useful to record USA (e.g.) against their new name, and England against their old.
Source:
Various discussion pages.
Way forward?:
Create this in the Data Model.
Dependencies:

Approval status:

Proposer:
AdrianB38
Changes:
2011 Mar 03 - split off requirement that location goes down to address to make it more obvious - raise new Requirement Data-Place06 for it.
Discussion:


Id:
Data-Char03
Title:
Multiple Events & Characteristics etc
Description:
BetterGEDCOM must allow multiple characteristics and events of the same type against each person, family, group, place, "ship" etc. In particular, it must be possible to allow multiple birth and death dates against individuals.
Importance:
Mandatory
Why?:
Most applications allow multiple characteristics, occupations for instance, against an individual. Some applications allow multiple birth and death dates against an individual. The normal meaning of this is that these are alternatives. It must be possible to convert such data to BetterGEDCOM format.
As GEDCOM v5.5 allows multiple events and multiple attributes, including multiple birth-dates, this requirement is also mandated by the need to allow GEDCOM compatible data to be represented in BetterGEDCOM form.
Source:
Various discussion pages.
Way forward?:
Identify any events and attributes in GEDCOM that are currently only allowed to have one occurrence and decide what to do about these - with the exception of SEX, a first glance at GEDCOM 5.5 suggests the single occurrence items for the Individual are internal to the GEDCOM structure, rather than relating to their family history and genealogy and thus it may be appropriate for them to remain as single occurrence items.
Depending on the conclusions above, create this in the Data Model.
Dependencies:

Approval status:

Proposer:

Changes:
2011 March 05 AdrianB38 - split off sex-change to Data-Ind04
2011 March 05 AdrianB38 - add clarification that this is also a compatibility requirement.
Discussion:


Id:
Data-Char04
Title:
Date all Characteristics
Description:
BetterGEDCOM must allow the recording of dates against all characteristics of each person, family, group, place, "ship". In particular, it must be possible to allow dates against an individual's name characteristics.
Importance:
Mandatory
Why?:
While GEDCOM currently allows multiple names against individuals, there is no ability to record a date against each name, implying that the names are used at the same time. This may or may not be true. Allowing dating of names allows more precise description of married names, for instance.
Source:
Various discussion pages.
Way forward?:
Create this in the Data Model.
Dependencies:
Data-Char03
Approval status:

Proposer:

Changes:
2011 March 05 AdrianB38 - add title
Discussion:
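
To make Data-Char02 to Data-Char04 more concrete, here is a minimal sketch in which a person holds any number of characteristics of the same type, each with its own optional date and location. The names Characteristic, Person and of_kind are hypothetical, the example names are invented, and dates are kept as plain text because date handling is covered by the Date requirements.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Characteristic:
    """One characteristic value; several of the same kind may coexist."""
    kind: str                       # e.g. "name", "occupation"
    value: str
    date: Optional[str] = None      # free text here; see the Date requirements
    location: Optional[str] = None  # optional, per Data-Char02


@dataclass
class Person:
    characteristics: List[Characteristic] = field(default_factory=list)

    def of_kind(self, kind: str) -> List[Characteristic]:
        """Return every recorded characteristic of the given kind, in entry order."""
        return [c for c in self.characteristics if c.kind == kind]


# Two dated names for the same (invented) individual, e.g. before and after marriage.
person = Person([
    Characteristic("name", "Mary Smith", date="FROM 1850 TO 1872", location="England"),
    Characteristic("name", "Mary Jones", date="FROM 1872", location="USA"),
    Characteristic("occupation", "Weaver"),
])
```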


Date


Id:
Data-Date01 (was date part of Data03)
Title:
Approximately known dates
Description:
BetterGEDCOM must allow the recording of approximately known dates. This requirement refers to approximation of single dates only. The date in question may be represented as a single month or a single year.
Importance:
Mandatory
Why?:
GEDCOM already allows dates to be "about yyyy".
Note - this is not the same as assigning a probability to a value - e.g. "Probably 1812" is not the same as "About 1812", and this requirement is not intended to cover concepts like "Probably 1812".
See also Data03
Source:
Tom Wetmore's Goal and Requirements plus various discussion pages. DeadEnds Date Formats. Discussion of dates in the DeadEnds data model.
Way forward?:
Note that, for the purposes of clarity, date periods are covered by requirement "Data-Date04 Date Periods" and date ranges are covered by requirement "Data-Date05 Date Ranges".
Include this in the Data Model.
The existing GEDCOM options to be covered by this requirement are logically equivalent to the following phrases:
ABOUT date
ESTIMATED date
CALCULATED date
See pp. 39 & 40 in GEDCOM Standard 5.5 for the meanings. No other options or meanings have yet been identified.
Dependencies:

Approval status:

Proposer:

Changes:
14 May 2011 - added rows to requirement table; added discussion and link (GeneJ)
15 May 2011 - explicitly exclude the GEDCOM usages of date period and date range from this. It is acknowledged that some interpretations of "approximate" could cover date ranges, so we make it clear what the interpretations are.
Discussion:
Discussion at Data-Date01 (was date part of Data03) / Approximately known dates
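
A minimal sketch of how an application might separate the approximation qualifier from the date itself, using the three phrases listed under "Way forward?" above. The names SingleDate and parse_single_date are hypothetical, and the date text is deliberately left unparsed; this is not a proposed BetterGEDCOM syntax.

```python
from dataclasses import dataclass
from typing import Optional

# The three phrases listed under "Way forward?" for Data-Date01.
APPROXIMATION_QUALIFIERS = ("ABOUT", "ESTIMATED", "CALCULATED")


@dataclass(frozen=True)
class SingleDate:
    text: str                        # the date itself, left unparsed in this sketch
    qualifier: Optional[str] = None  # None means an unqualified date


def parse_single_date(raw: str) -> SingleDate:
    """Split an optional approximation qualifier from the rest of the date text."""
    words = raw.strip().split(maxsplit=1)
    if len(words) == 2 and words[0].upper() in APPROXIMATION_QUALIFIERS:
        return SingleDate(text=words[1], qualifier=words[0].upper())
    return SingleDate(text=raw.strip())
```

Period and range keywords (FROM, TO, BEFORE, AFTER, BETWEEN) are intentionally not recognised here, since they belong to Data-Date04 and Data-Date05.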

Id:
Data-Date02
Title:
Calendars
Description:
A BetterGEDCOM file must define the calendar to be used for each date stored in the file. This definition should be accompanied by a definition of the ordering of the date items within the date (e.g. year/month/day or day/month/year or month/day/year or ...)
Importance:
Mandatory
Why?:
Dates may occur in source documents in all sorts of calendar representations. It is desirable that the codified representation of that should differ as little as possible from the written characters in the source, to reduce the scope for error in input or output. Therefore, BetterGEDCOM needs to accommodate Jewish, Muslim, Chinese, etc, calendars, Julian or Gregorian calendars by country (e.g. with France and England on Gregorian and Julian calendars respectively(?) the two countries did not use the same day/month for "today"); French Revolutionary calendars, etc. Sometimes a date is just a text string.
Source:
Discussion of dates in the DeadEnds data model
Way forward?:
Create this in the Data Model.
To be decided: Whether Data Model includes a facility for defining a default calendar and date-item ordering, or whether every date must be marked up with these items. If the latter option is chosen, this relies on intelligent application design to reduce user workload.
Note also - there is an assumption here that dates will be stored in various calendars and not as (e.g.) number of days since an agreed event.
Dependencies:

Approval status:

Proposer:

Changes:
2011 Feb 22 15:22 CET - alter description to "must" to match "mandatory" importance.
2011 Feb 22 15:43 CET - add to "Way Forward" comments about possible default calendar and assumption that dates will be stored in calendar form
2011 Feb 22 21:05 CET - alter description to separate definition of calendar itself from the ordering of the date items as these are 2 concepts. Also attempt to clarify Way Forward re defaults.
Discussion:
Data-Date02 modified
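
The two concepts separated in this requirement, the calendar and the ordering of the date items, can be illustrated with a small record. The name CalendarDate, its fields and the calendar labels are assumptions made for this sketch only; whether defaults are allowed is still to be decided, as noted above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CalendarDate:
    calendar: str                # e.g. "Gregorian" or "Julian" (labels are illustrative)
    order: Tuple[str, str, str]  # ordering of the date items, e.g. ("day", "month", "year")
    parts: Tuple[str, str, str]  # the items as written in the source, in that order

    def as_items(self) -> Dict[str, str]:
        """Pair each written item with its meaning according to the declared order."""
        return dict(zip(self.order, self.parts))


# The same written characters mean different dates under different orderings.
written = ("01", "02", "1700")
day_first = CalendarDate("Gregorian", ("day", "month", "year"), written)
month_first = CalendarDate("Gregorian", ("month", "day", "year"), written)
```

Storing the items as written, together with the declared ordering, keeps the stored form close to the characters in the source, which is the motivation given under "Why?".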

Id:
Data-Date03
Title:
Date phrases
Description:
BetterGEDCOM must allow a "date" to be entered as a phrase where the values are not recognizable to a date parser, but which gives a human reader information about when an event occurred. It must allow such a phrase to have an optional date in parseable format that can be used to interpret the phrase.
Importance:
Mandatory
Why?:
1. GEDCOM Standard 5.5 includes these two as DATE_PHRASE and INT <DATE> (<DATE_PHRASE>)
2. A phrase may give time-relative information even if a date is not known or not known well - e.g. "at the Battle of Brunanburh" is more informative than "between 934 and 939"; or "on a Tuesday in the spring of 1873" can be interpreted as 1873 but the words are informative.
Source:
GEDCOM Standard 5.5. Discussion of dates in the DeadEnds data model.
Way forward?:
Include this in the Data Model
Dependencies:

Approval status:

Proposer:
AdrianB38
Changes:

Discussion:
Data-Date03 Date Phrases
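
A date phrase with an optional interpreted date might be held as below. The class name DatePhrase and the method to_gedcom_style are hypothetical; the rendering simply mirrors the two GEDCOM 5.5 forms mentioned under "Why?", a phrase on its own in parentheses and INT <DATE> (<DATE_PHRASE>).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatePhrase:
    phrase: str                        # text meant for a human reader
    interpreted: Optional[str] = None  # optional date in parseable form

    def to_gedcom_style(self) -> str:
        """Render as '(phrase)' or 'INT date (phrase)', following GEDCOM 5.5."""
        if self.interpreted is None:
            return f"({self.phrase})"
        return f"INT {self.interpreted} ({self.phrase})"
```

For example, DatePhrase("on a Tuesday in the spring of 1873", "1873") keeps the informative wording while still giving software a year to sort and search on.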

Id:
Data-Date04
Title:
Date periods
Description:
BetterGEDCOM must allow the recording of periods of time, denoted by start and / or end dates. BetterGEDCOM must explicitly define whether or not the end or start date is included in the period of time.
Importance:
Mandatory
Why?:
GEDCOM already allows this. Failure to include it will result in failure to convert the vast majority of GEDCOM-based files.
Source:
GEDCOM Standard 5.5, page 41
Way forward?:
Include this in the data model.
GEDCOM options are logically equivalent to the following phrases:
FROM date
TO date
FROM date-1 TO date-2
where date, date-1 and date-2 are known, unqualified dates - i.e. "FROM ABOUT 1066" is not included as the ABOUT is not permitted in this requirement.
It is suggested that the start and end dates are included in the period of time, as this is normal usage in the English language - e.g. "The First World War lasted FROM 1914 TO 1918" - 1914 and 1918 are included in the War's period.
The start or end date may be expressed as a Date Phrase, e.g. "FROM The marriage of Fred and Gladys Pugh"
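The suggested inclusive reading of FROM / TO can be sketched as follows. This is a minimal sketch under stated assumptions: DatePeriod and contains are hypothetical names, Python's datetime.date (Gregorian only) stands in for whatever date type the Data Model adopts, and Date Phrase endpoints are not covered.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DatePeriod:
    """Hypothetical period: FROM start, TO end, or FROM start TO end."""
    start: Optional[date] = None  # None means no FROM part was given
    end: Optional[date] = None    # None means no TO part was given

    def __post_init__(self) -> None:
        if self.start is None and self.end is None:
            raise ValueError("a period needs a start date, an end date, or both")
        if self.start is not None and self.end is not None and self.start > self.end:
            raise ValueError("start date must not be later than end date")

    def contains(self, day: date) -> bool:
        # Both endpoints are treated as part of the period (inclusive).
        after_start = self.start is None or self.start <= day
        before_end = self.end is None or day <= self.end
        return after_start and before_end
```

Whichever choice is made, the point of the requirement is that the comparison operators above are fixed by the standard and not left to each application.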
Dependencies:

Approval status:

Proposer:
AdrianB38
Changes:
2011 May 15 - include this as a separate requirement, to make it explicit what Data-Date01 is concerned about.
Discussion:
Data-Date04 - Date periods

Id:
Data-Date05
Title:
Date ranges
Description:
BetterGEDCOM must allow the recording of ranges of time, denoted by start and / or end dates, within which an event takes place. That event may take place on a single day, or it may take place over a period of days.
BetterGEDCOM must explicitly define whether or not the end or start date is included in the range of time.
Importance:
Mandatory
Why?:
GEDCOM already allows this. Failure to include it will result in failure to convert the vast majority of GEDCOM-based files.
Source:
GEDCOM Standard 5.5, page 41
Way forward?:
Include this in the data model.
GEDCOM options are logically equivalent to the following phrases:
BEFORE date
AFTER date
BETWEEN date-1 AND date-2
where date, date-1 and date-2 are known, unqualified dates - i.e. "AFTER ABOUT 1066" is not included as the ABOUT is not permitted in this requirement.
It is suggested that the start and end dates are included in the range of time, as this is the clear implication of page 42 in GEDCOM Standard 5.5, which explicitly states that:
1852 is equivalent and interchangeable with BETWEEN 1 JANUARY 1852 AND 31 DECEMBER 1852
The start or end date may be expressed as a Date Phrase, e.g. "AFTER The Fall of the Roman Empire"
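The equivalence quoted from page 42 can be sketched as follows. The function names year_to_range and falls_within are assumptions made for this example, and Python's datetime.date (Gregorian only) is used purely for illustration.

```python
from datetime import date
from typing import Tuple


def year_to_range(year: int) -> Tuple[date, date]:
    """Expand a bare year into BETWEEN 1 JANUARY year AND 31 DECEMBER year."""
    return date(year, 1, 1), date(year, 12, 31)


def falls_within(day: date, earliest: date, latest: date) -> bool:
    """True if an event on `day` is consistent with the range (inclusive ends)."""
    return earliest <= day <= latest
```

With inclusive ends, an event on 31 December 1852 is consistent with the range derived from "1852"; an exclusive end date would wrongly rule it out.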
Dependencies:

Approval status:

Proposer:
AdrianB38
Changes:
2011 May 15 - include this as a separate requirement, to make it explicit what Data-Date01 is concerned about.
Discussion:
Data-Date05 - Date Ranges

Id:
Data-Date06
Title:
Approximate Date periods and ranges
Description:
BetterGEDCOM could allow ranges and periods of time to be denoted by approximate start and / or end dates.
Importance:
Desirable
Why?:
Many events cannot be located precisely in time - even the start and end dates can be unclear. GEDCOM has no means of expressing this.
For instance, "The Dark Ages lasted FROM ABOUT 410 TO ABOUT 1066 in England" (no arguments about the truth of that please!)
Source:

Way forward?:
Include this in the data model.
Dependencies:
Data-Date01 Approximately known dates
Data-Date04 Date periods
Data-Date05 Date ranges
Approval status:

Proposer:
AdrianB38
Changes:
2011 May 15 - new requirement
Discussion:
Data-Date06 - Approximate Date periods and ranges


Event


Id:
Data-Event01 (was Data10)
Title:
Events with multiple people, with roles
Description:
BetterGEDCOM must support the recording of events that affect multiple people. In particular, it must be possible to record the role of each person in the event.
Importance:
Mandatory
Why?:
Events do affect multiple people. Current GEDCOM has almost no ability to record multi-person events, excepting perhaps births and adoptions. However, the parents of a birth in GEDCOM are usually implied by the parents of the appropriate family, creating potential issues when that family is an adoptive one. It would be better to have a birth event involving three people (e.g. child and two biological parents typically), with this data separate from the family.
Source:
Various discussion pages. A typical item in many other post-GEDCOM 5.5 proposals.
Way forward?:
Create this in the Data Model.
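One possible shape for such an event, using the birth example from "Why?", is sketched below. This is only a sketch: the names Participant, MultiPersonEvent and people_with_role, the role labels and the person identifiers are all assumptions, not agreed parts of the Data Model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Participant:
    person_id: str  # reference to a person record (hypothetical identifier)
    role: str       # role played in the event, e.g. "child"


@dataclass
class MultiPersonEvent:
    event_type: str
    participants: List[Participant] = field(default_factory=list)

    def people_with_role(self, role: str) -> List[str]:
        """Return the ids of all participants recorded with the given role."""
        return [p.person_id for p in self.participants if p.role == role]


# A birth involving three people, recorded separately from any family.
birth = MultiPersonEvent("birth", [
    Participant("I1", "child"),
    Participant("I2", "biological mother"),
    Participant("I3", "biological father"),
])
```

Because the parents are attached to the event and not implied by a family, an adoptive family can be recorded for the same child without disturbing the birth data.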
Dependencies:

Approval status:

Proposer:

Changes:
AdrianB38 - 2011 Apr 17 - Add link to new discussion to record things we're liable to forget.
Discussion:
Discussion at Data-Event01 - Events with multiple people, with roles

Id:
Data-Event02 (was Data09)
Title:
Multiple places per event
Description:
BetterGEDCOM should support the recording of multiple places for a single event.
Importance:
Very desirable
Why?:
Current GEDCOM allows the recording of one place for events. There are application extensions to record more than one - e.g. FamilyHistorian records two places for emigration - a "from" and a "to" place. Users may also define "Journey" events, where a "from" and a "to" location would seem natural.
Source:
Various discussion pages. Qualifying Locations for Events
Way forward?:
  • Analyse whether there is a need for more than two places per event - e.g. "from", "to", "via";
  • Analyse whether location-roles are mandatory, optional or forbidden. (Location-roles refers to the role that a location plays in an event. Examples of roles are "from" and "to". Locations without roles would be just listed, e.g. "The 1906 earthquake happened at X and Y")
  • If roles are needed - what are the roles?
  • Create this in the Data Model.
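To make the second bullet concrete, the sketch below treats the location-role as optional, which is only one of the three options still to be analysed. The names EventPlace, MultiPlaceEvent and places_with_role are assumptions made for the example, as are the place names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class EventPlace:
    place: str
    role: Optional[str] = None  # e.g. "from", "to", "via"; None = simply listed


@dataclass
class MultiPlaceEvent:
    event_type: str
    places: List[EventPlace] = field(default_factory=list)

    def places_with_role(self, role: Optional[str]) -> List[str]:
        """Return the places recorded with the given role (None = no role)."""
        return [p.place for p in self.places if p.role == role]
```

If roles turn out to be mandatory the Optional would go; if forbidden, the role field would be dropped and the event would carry a plain list of places.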
Dependencies:

Approval status:

Proposer:

Changes:
AdrianB38 - 2011 Mar 22 - make explicit this is multiple places for one event
AdrianB38 - 2011 Mar 24 - Clarify options for roles or not in 2nd bullet of "Way Forward"; Remove "Way Forward" bullet "Should multiple place events be listed?" as this is ambiguous and covered by 2nd bullet
Discussion:
Discussion on Multiple Places per event

Id:
Data-Event03
Title:
Central registry of event types (and possibly other types)
Description:
BetterGEDCOM should create a central registry of event types that are not defined in the main standard. The registry shall be updated more frequently than the main standard. It could potentially contain types used in structures containing non-standard type and value pairs. A procedure (rules) must be defined for maintenance of the registry. The information registered for event types (and other types) must be specified (eg. type name, definition, roles, event value types).
Importance:

Why?:

Source:
Custom GEDCOM Tags
Way forward?:

Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Id:
Data-Event04
Title:
Events over a time-period
Description:
BetterGEDCOM must define what an Event is and must allow an Event to take place over a time-period of more than one day.
Importance:
Mandatory
Why?:
Current GEDCOM allows an Event to last for more than 1 day. Hence there will be many GEDCOM files containing such Events.
Page 35 of the (draft) GEDCOM v5.5.1 standard says:
"As a general rule, events are things that happen on a specific date. Use the date form ‘BET date AND date’ to indicate that an event took place at some time between two dates. Resist the temptation to use a ‘FROM date TO date’ form in an event structure. If the subject of your recording occurred over a period of time, then it is probably not an event, but rather an attribute or fact."
This can give the impression that events are only things that happen on a specific date. However, even this wording specifically allows events occurring over a period of time.
For clarity, the BetterGEDCOM standard must make it clear that events can occur over a period of time.
Source:
See discussion Syntax09 Define Event vs. Attribute. This discussion was primarily about distinguishing the difference between Events and Attributes if necessary. It contained various postings about the definition of an Event and whether it could last over several days or not. Those discussions stand independent of the differences between events and attributes.
Way forward?:
Define the event entity thus in the Data Model. Note the proposed definition in Syntax09 Define Event vs. Attribute that an event is something leading to a change - this might be a useful definition.
Dependencies:
Data-Event01 "Events with multiple people, with roles" and Data-Event02 "Multiple places per event" will influence the way forward on this.
Approval status:

Proposer:
AdrianB38 2011 Mar 25
Changes:

Discussion:


Id:
Data-Event05
Title:
Event Classes
Description:
BetterGEDCOM should define Event Classes, grouping similar events into one class that describes common rules for how the data recorded about events should be handled by programs. One example is a "Marriage event class" (this name may be changed) that would contain events such as Marriage, Civil Marriage, Cohabitation start, Partnership and other events that describe a union between two persons - all these events should be treated by programs as they currently handle marriage, although with different terms.
Importance:

Why?:
The purpose is to allow new events to be defined in the standard, a registry (see Data-Event03) or by users that will be handled by applications according to rules defined by the class that the event belongs to. The rules may be simple, just saying that the events shall be handled in the same way, when a new type of event is defined to be in the same class as a well established event type. Classes will be used when the event requires a more specialised handling than can be handled by a sentence template, e.g. when marriage events are placed in special paragraphs in reports - or depending on how data about families will be recorded in BetterGEDCOM, the event could be the basis for establishment of a data structure in the program representing a family.
Source:
This has been discussed in Data-Fam02 and in ?? (earlier discussions?) "I Want My Genealogy Software And BetterGEDCOM To Do This" on Shortcomings of GEDCOM
Way forward?:
The possible types of classes should be identified and populated with an initial set of events. The initial reason to do this is to verify that there is a need for classes. Rules should be defined for each class.
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Id:
Data-Event06
Title:
Events as separate records
Description:
BetterGEDCOM must allow other records to reference events. Thus events should be recorded as separate records.
Importance:

Why?:
There is a need for other records to reference an event, for example from structures recording administrative information. Also, since we will have multiple persons participating in an event, the event should not be stored in the record of just one of those persons.
Source:

Way forward?:

Dependencies:

Approval status:

Proposer:
17 March 2011 gthorud
Changes:

Discussion:


Id:
Data-Event07
Title:
Person names per event
Description:
BetterGEDCOM must allow the recording of the name used by (recorded for) a person playing a role in an event.
Importance:
High
Why?:
A person may have used different names during his/her life, or one name may be recorded with different spellings in documents. For example, in many countries, people used the name of the farm where they were living as a surname, so they would change their name when they moved to a new farm. It is best to record that name in the context of an event so it can be presented in the right context, rather than having a simple list of names as in current GEDCOM.
Source:

Way forward?:

Dependencies:
Data-Event01
Approval status:

Proposer:
8 June 2011 gthorud
Changes:

Discussion:


Again, acknowledgements are made to the Volere Requirements Shell template in "Mastering the Requirements Process" by Robertson & Robertson

Family


Id:
Data-Fam01 (was Data06)
Title:
Families independent of biological relations
Description:
BetterGEDCOM must support the recording of genealogy / family history data about the family as a (possibly informal) social grouping, independent of any biological relationship or legal adoptions.
Importance:
Mandatory
Why?:
Family units exist where there is no underlying biological relationship and no legal adoptions.
Biological relationships exist where there is no family in any meaningful sense.
Existing GEDCOM files may contain data (possibly user-defined tags) recorded about the social grouping of the family, which must be carried forward on conversion to BetterGEDCOM format.
Note this requirement does not say anything about how that data will be represented on the file, specifically it does not say anything about how evidence and conclusions are represented.
Source:
GEDCOM does this. Various discussion pages.
Way forward?:
Create this in the Data Model.
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:
Data-Fam01 Family as a Social Grouping

Id:
Data-Fam02
Title:
Cohabitants
Description:
BetterGEDCOM must support the recording of information about cohabitants, with or without common children. Cohabitants should be treated in the same way as married couples, and there should be events for the establishment and dissolution of cohabitation. Some couples may start out as cohabitants and then marry.
Importance:

Why?:
The percentage of couples that are cohabitants is increasing in the western world; in some countries it is as high as 25-30%. BetterGEDCOM should not discriminate against people in such relations.
Source:

Way forward?:
Depends on how BG implements relations/families in general. Event types similar to marriage and divorce may be sufficient.
Dependencies:

Approval status:

Proposer:
26 March 2011 gthorud
Changes:

Discussion:
Discussion at Data-Fam02 - Cohabitants

Group


Id:
Data-Group01 (was Data05)
Title:
Data about groups of persons (eg. organisations)
Description:
BetterGEDCOM must support the recording of historic data about groups of persons, such as organisations, companies, regiments.
Importance:
Mandatory
Why?:
Organisations, companies, regiments, etc, have a major impact on individuals, yet no mechanism currently exists in GEDCOM to record any of their details in a structured manner, nor to link organisation data to people.
Note this requirement does not say anything about how that data will be represented on the file, specifically it does not say anything about how evidence and conclusions are represented.
Source:
Shortcomings of GEDCOM
Way forward?:
Create this in the Data Model.
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Person (was Individual)


Id:
Data-Ind01 (was Data04)
Title:
Data about persons
Description:
BetterGEDCOM must support the recording of genealogy / family history data about persons.
Importance:
Mandatory
Why?:
Statement of the obvious. Note this requirement does not say anything about how that data will be represented on the file, specifically it does not say anything about how evidence and conclusions are represented.
Source:
GEDCOM does this.
Way forward?:
Create this in the Data Model.
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Id:
Data-Ind02 (was Data07)
Title:
Biological relations independent of family
Description:
BetterGEDCOM must support the recording of biological relationships independent of any family grouping. Biological relationships must include surrogacy, etc.
Importance:
Mandatory
Why?:
Biological relationships can exist where there is no family in any meaningful sense.
Existing GEDCOM files create a family for biological relationships. This is not always appropriate.
Note this requirement does not say anything about how that data will be represented on the file, specifically it does not say anything about how evidence and conclusions are represented.
Source:
Various discussion pages.
Way forward?:
Create this in the Data Model.
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:
Data-Ind02 Biological rel'ns indep of family

Id:
Data-Ind03
Title:
Non-biological, non-family relationships
Description:
BetterGEDCOM must provide a means to document relationships between individuals that are not based on biology or family, e.g. "X is the friend of Y".
Importance:
Mandatory
Why?:
GEDCOM has the ASSO tag in the ASSOCIATION_STRUCTURE (see GEDCOM 5.5) that may be used to document such relationships as god-parent, friendship, etc.
Also the ALIA tag can be used to link individuals' records when the individuals are suspected to be the same person.
Source:
GEDCOM Standard version 5.5 ASSOCIATION_STRUCTURE and ALIA tag
Tom Wetmore 2011 Feb 27 Syntax09 Define Event vs. Attribute discussion
Way forward?:
Comments on possible ways forward
Dependencies:

Approval status:

Proposer:
AdrianB38 2011 Feb 27
Changes:

Discussion:
Discussion

Id:
Data-Ind04
Title:
Sex-change individuals
Description:
BetterGEDCOM should support the recording of sex-changes for individuals.
Importance:
Very desirable
Why?:
There are individuals who have gone through a sex-change. BetterGEDCOM should be able to describe their history accurately, as it does anyone else.
Source:
Various discussion pages.
Way forward?:
Need to agree on what values are required - is male / female enough? Is there a need to consider not just sex (the biological and physiological characteristics) but also gender (the social construct)?
Dependencies:

Approval status:

Proposer:
AdrianB38 2011 March 05
Changes:

Discussion:


Person Names


Id:
Data-PersonNames01
Title:
Sorting on multiple given names and surnames
Description:
BetterGEDCOM shall provide a way to identify parts of names (whole words or parts of words) that shall be used for sorting, identifying whether each part should sort as a given name or surname. It shall allow several such surname parts and could allow several given name parts. A priority could be assigned to the name parts sorting as surnames. All this information related to sorting is a suggestion to the recipient for how name parts should be sorted.
Importance:
Very desirable
Why?:
Many cultures operate with several surnames. It should be possible to sort on those names in indexes etc. The same applies to given names (forenames) because a person may be known by any one of those given names. Some words in a name (e.g. prefixes) are not used for sorting, and often the beginning of a name is not used for sorting (e.g. the d’ in d’Hondt, so that Honda should sort before d’Hondt), or one “word” may sort as two names, e.g. both Berg and Olsen in Berg-Olsen. When there are several surnames, some countries consider the last surname to be most "significant" while others consider the first to be the most significant. Identification of these parts has no influence on how a name is printed in reports or charts. The need to sort on several given names could be discussed, as could the priority of surnames.
Important: For example, a middle name could be indicated to be sorted as a given name or surname, but that does not imply that it is classified as a given name or surname in other contexts, and this proposal does not imply anything about any need to classify name parts as middle name, patronymic etc. (which there may not be a need for).
Source:
Page: Person-Name+Elements Discussion: message/view/Person-Name+Elements/30777083 External Gramps page: http://gramps-project.org/wiki/index.php?title=GEPS_021:_Additional_Name_Fields
Way forward?:
A program could offer separate fields for the entry of these parts or use special notation.
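The sorting hints described above might be used to build a surname index roughly as follows. This is a sketch under assumptions: PersonName, its field names and index_entries are hypothetical, the given-name initials in the usage example are made up, and only surname parts are indexed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PersonName:
    display: str                          # name as printed; not changed by hints
    surname_sort_parts: Tuple[str, ...]   # parts that sort as surnames, by priority
    given_sort_parts: Tuple[str, ...] = ()  # parts that sort as given names


def index_entries(names: Iterable[PersonName]) -> List[Tuple[str, str]]:
    """Build one (sort key, display name) index entry per surname sort part."""
    entries = [
        (part.casefold(), name.display)
        for name in names
        for part in name.surname_sort_parts
    ]
    return sorted(entries)
```

For example, with the sort parts ("Hondt",) for "A. d'Hondt", ("Honda",) for "B. Honda" and ("Berg", "Olsen") for "C. Berg-Olsen", the index lists Berg-Olsen under both Berg and Olsen and places Honda before d'Hondt, while every display name is printed unchanged.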
Dependencies:

Approval status:

Proposer:
gthorud 25 Feb
Changes:

Discussion:
Discussion at Data-PersonNames01 - Sorting on multiple given names and surnames

Place


Id:
Data-Place01 (was location part of Data03)
Title:
Approximately known locations
Description:
BetterGEDCOM must allow the recording of approximately known locations.
Importance:
Mandatory
Why?:
GEDCOM already allows dates to be "about yyyy". Locations may also be equally inexact, e.g. "at sea between England and Australia".
Note - this is not the same as assigning a probability to a value - e.g. "Probably London" is not the same as "Near London" and this requirement is not intended to cover concepts like "Probably London".
See also Data03
Source:
Tom Wetmore's Goal and Requirements plus various discussion pages.
Way forward?:

Dependencies:

Approval status:

Proposer:
AdrianB38
Changes:
2011 Mar 22 AdrianB38 - rename from "Approximate known approximate locations" to "Approximately known locations"
Discussion:


Id:
Data-Place02 (was Data08)
Title:
Recording of structured data about locations
Description:
BetterGEDCOM should support the recording of structured, historic data about locations, for example multiple names, default prepositions for names, photos, maps, sources and links for access to geographic information services.
Importance:
Very desirable
Why?:
Current GEDCOM does not even recognise "Place" as an entity - there is a rich amount of information about places over time, much of which will affect people.
Source:
"GEDCOM Won't Transfer This" on Shortcomings of GEDCOM
Way forward?:
Create this in the Data Model.
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Id:
Data-Place03
Title:
A place can be member of several place hierarchies
Description:
BetterGEDCOM should support the recording of places of various types as members of several hierarchies of places (locations), possibly changing hierarchies over time, and possibly with surety assigned to the relation to a higher place – in a way where the path through the hierarchy to the top is unambiguously identified for each place name.
Importance:
Very desirable
Why?:
GEDCOM supports hierarchies of names in events, but does not link these names and hierarchies unambiguously to place entities. This is not sufficient to describe the facts of history related to a place.
Source:
“tracking land changes idea” discussion and the Location entity page
Way forward?:
Create this in the Data Model.
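The "unambiguous path to the top" could be checked along the following lines. This is a minimal sketch, assuming year-level validity and omitting surety; ParentLink, path_to_top, the hierarchy labels and the place identifiers in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ParentLink:
    parent_id: str
    hierarchy: str                   # which hierarchy this link belongs to
    from_year: Optional[int] = None  # None = valid from the beginning
    to_year: Optional[int] = None    # None = still valid; inclusive when set

    def valid_in(self, year: int) -> bool:
        starts_ok = self.from_year is None or self.from_year <= year
        ends_ok = self.to_year is None or year <= self.to_year
        return starts_ok and ends_ok


def path_to_top(links: Dict[str, List[ParentLink]], place_id: str,
                hierarchy: str, year: int) -> List[str]:
    """Follow parent links in one hierarchy, for one year, up to the top place."""
    path = [place_id]
    while True:
        matches = [link for link in links.get(path[-1], [])
                   if link.hierarchy == hierarchy and link.valid_in(year)]
        if not matches:
            return path
        if len(matches) > 1:
            raise ValueError(f"ambiguous parent for {path[-1]!r} in {year}")
        if matches[0].parent_id in path:
            raise ValueError("cycle in place hierarchy")
        path.append(matches[0].parent_id)
```

A farm that belonged to one municipality until 1899 and another from 1900 would carry two "civil" links with different validity years plus, say, one "church" link to its parish; path_to_top then yields a different civil path for 1890 than for 1950 and reports an error if two links in the same hierarchy overlap in time.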
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Id:
Data-Place04
Title:
Merging and/or splitting of places/locations
Description:
BetterGEDCOM shall be able to record identifiers of the place(s) that were split and/or merged when a place (location/property/region) was created.
Importance:

Why?:
The origin of a place is important information about that place, and may in many cases provide evidence about relations between persons.
Source:
message/view/Location+entity/30888227
message/view/Location+entity/30668879?o=40
Way forward?:
The info should preferably be recorded by an event referencing the involved places, also giving date and source but possibly no persons.
Dependencies:

Approval status:

Proposer:
22 Feb gthorud
Changes:

Discussion:


Id:
Data-Place05
Title:
Place identifiers
Description:
BetterGEDCOM shall be able to record identifiers, possibly multipart/hierarchical, for a place used for example in land records, map databases, property owner databases, statistics. The identifier type should accompany each identifier part, i.e. a sequence of type/value pairs.
Importance:

Why?:
The identifier can be used to locate and look up the place in various paper sources, and is also in itself a historic fact. An identifier is often unique where a name is not. Several identifiers may have been used over time.
Source:
message/view/Location+entity/30668879?o=40
Way forward?:

Dependencies:

Approval status:

Proposer:
22 Feb gthorud
Changes:

Discussion:


Id:
Data-Place06
Title:
Location to include address
Description:
The location in BetterGEDCOM should be able to specify an individual address.
Importance:
Very desirable
Why?:
Current GEDCOM 5.5 defines a PLACE as a "jurisdictional name to identify the place or location of an event". The address of an individual building is generally not regarded as being a PLACE under this definition. Since many events are known to occur at precise addresses, the address details are kept separately in the ADDRESS_STRUCTURE. This structure, however, repeats items like city, state, country.
To avoid duplication and the consequent danger of values not being correctly duplicated, the successor to PLACE should include the ability to specify an individual address.
Source:
Various discussion pages.
Way forward?:
  • Create this in the Data Model.
  • To be decided - whether a location's details in BetterGEDCOM should include Postal Code or Phone Number, which are also part of ADDRESS_STRUCTURE of GEDCOM 5.5, but appear to have dubious relevance to historical events or characteristics.
  • Note this does not mean that the ADDRESS_STRUCTURE of GEDCOM 5.5 has no future in BetterGEDCOM, since the address of a repository, for instance, does not need to have the same structure as a location for historic events or characteristics.
Dependencies:

Approval status:

Proposer:
AdrianB38
Changes:

Discussion:
2011 Mar 03 - Created. Split off from Data-Char02 the requirement that location goes down to address to make it more obvious

"Ship"


Id:
Data-Ship01 (was Data11)
Title:
Data about miscellaneous entities
Description:
BetterGEDCOM could support the recording of historic data about miscellaneous entities or artefacts such as ships, locomotive types, etc.
Importance:
Desirable
Why?:
Individuals, organisations, etc., are usually involved with many physical artefacts, yet no mechanism currently exists in GEDCOM to record any of the artefact's details in a structured manner, nor to link these things to people, etc.
Examples could include a summary of the history of a ship used for several cross-Atlantic journeys by different people. These details could be entered in one place, not against each person.
Note this requirement does not say anything about how that data will be represented on the file, specifically it does not say anything about how evidence and conclusions are represented.
Source:
Shortcomings of GEDCOM
Way forward?:
Create this in the Data Model.
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:
Discussion of Data-Ship01 Data about miscellaneous entities

DNA


Id:
DNA01
Title:
Results from DNA tests
Description:
BetterGEDCOM should be able to record results of DNA tests.
Importance:

Why?:
Many genealogy programs allow recording of such data.
Source:

Way forward?:

Dependencies:

Approval status:

Proposer:
19 March 2011 GeneJ
Changes:

Discussion:
To do - add sources, repositories, etc.

Evidence


Id:
Evidence01
Title:
Evidence & Conclusion Model
Description:
BetterGEDCOM could handle evidence and not just conclusions.
Importance:
Desirable
Why?:
Current GEDCOM is structured so that data about an individual or family is always the "latest working hypothesis". It is therefore difficult to identify the actual evidence, particularly when the "latest working hypothesis" is a composite of various bits of evidence.
Also, in the event of discovery of an error, it can be difficult to (a) identify subsequent issues and (b) revert to an acceptable set of "working hypotheses". This is because adding new or revised conclusions to current GEDCOM is generally a destructive process resulting in the replacement or deletion of superseded conclusions.
To overcome this, it appears as a minimum to be necessary to record evidence and conclusions separately. This allows adding new or revised conclusions to be a non-destructive process.
See Evidence and Conclusion Process
Note this requirement is effectively the same as adopting (possibly in part) the "Evidence and Conclusion Model", which is linked to, but not the same as, the "Evidence and Conclusion Process". See Glossary
Source:
"I Want My Genealogy Software And BetterGEDCOM To Do This" on Shortcomings of GEDCOM
Way forward?:
  • Establish a first cut at a comprehensive set of genealogical processes that cover both Research Administration and recording of both Evidence & Conclusions.
  • Define which parts of the processes are in the scope of Research Administration and which in that of Evidence & Conclusions
  • Consider how the model and processes support "roll-back" to an acceptable state after discovery of an error.
  • Consider feasibility and therefore the priorities of documenting (a) requirements and (b) the data model relating to Research Administration and Evidence & Conclusions and establish what is do-able in relation to timescales
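The non-destructive revision and "roll-back" ideas above could look roughly like this in code. This is a sketch under assumptions, not the Evidence and Conclusion Model itself: Conclusion, ConclusionHistory and their methods are hypothetical names, and evidence is reduced to a tuple of identifiers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Conclusion:
    statement: str
    evidence_ids: Tuple[str, ...]  # references to separately stored evidence


class ConclusionHistory:
    """Append-only list of conclusions; nothing is replaced or deleted."""

    def __init__(self) -> None:
        self._versions: List[Conclusion] = []

    def revise(self, conclusion: Conclusion) -> None:
        # A new or revised conclusion is added after the earlier ones.
        self._versions.append(conclusion)

    def current(self) -> Optional[Conclusion]:
        return self._versions[-1] if self._versions else None

    def roll_back_to(self, index: int) -> None:
        # Rolling back re-adopts an earlier conclusion as the latest one,
        # so the rejected conclusion also stays on record.
        self._versions.append(self._versions[index])

    def versions(self) -> Tuple[Conclusion, ...]:
        return tuple(self._versions)
```

After two revisions and one roll-back the history holds three entries, which is the contrast with the destructive replacement described under "Why?".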
Dependencies:

Approval status:

Proposer:

Changes:
2011 Feb 22 17:45 CET - attempt to clarify this is about the "Evidence and Conclusion Model", which is linked to, but not the same as, the "Evidence and Conclusion Process".
2011 April 11 17:00 CET - adjust "Way Forward" in light of discussions. Add distinction btw destructive process for adding new stuff in current GEDCOM, conclusion-only, and non-destructive process in Evidence & Conclusion
Discussion:
Evidence01 and Evidence 01 Please use the latter one. See also Defining E&C for BetterGEDCOM

Id:
Evidence02
Title:
Proof Argument and/or Process
Description:
BetterGEDCOM should support users' need to record and share proof arguments supporting and/or supported by the evidence and conclusions therein recorded or shared.
Importance:
Very Desirable
Why?:
Supports faithful recording of research status and results.
Source:
http://www.bcgcertification.org/skillbuilders/skbld091.html
Way forward?:

Dependencies:

Approval status:

Proposer:
GeneJ
Changes:
2011 Feb 21 - created
2011 Feb 22 - Fixed URL for link to discussion (GJ)
2011 Feb 22 - Fixed keyboard witch's duplication in the description field above.
Discussion:
message/view/Better+GEDCOM+Requirements+Catalog/34594682

International


Id:
International01
Title:
Support for international character sets
Description:
BetterGEDCOM must be able to handle text expressed in most of the world's writing systems.
Importance:
Mandatory
Why?:
Genealogy is not confined to countries with the American-English 26-letter alphabet.
Source:

Way forward?:
Unicode UTF-8
Dependencies:

Approval status:
See International02
Proposer:

Changes:

Discussion:


Id:
International02
Title:
Unicode
Description:
BetterGEDCOM must use Unicode and only Unicode to represent text.
Importance:
Mandatory
Why?:
Unicode is the universally accepted solution for handling the multitude of modern, historical and ancient character sets used by all human cultures. UTF-8 is the most common byte encoding of Unicode and is supported by all modern software development environments.
Source:

Way forward?:
Unicode UTF-8
Dependencies:
International01
Approval status:
Developers Meeting 17 Jan 2011 approved "Use Unicode (only) for the consistent encoding, representation and handling of text expressed in most of the world's writing systems"
This is International01 plus International02 expressed in one sentence.
Developers Meeting 31 Jan 2011 approved "Unicode character set in UTF-8 encoding, and optionally support other encoding schemes of Unicode"
Proposer:

Changes:

Discussion:


Id:
International03
Title:
Support for the requirements of many cultures, countries, time periods and belief systems
Description:
BetterGEDCOM must support recording of information about real life in an open-ended set of cultures, countries, time periods and belief systems. It must not be biased towards any one of these.
Importance:
Mandatory
Why?:
BG must support (directly or indirectly) different calendars, events from different religions and cultures, etc.
Source:
Discussion topic "Goal 5 (Internationalization)"
Way forward?:
The BetterGEDCOM project cannot possibly understand all possible calendars, religions, etc. Therefore while we may be able to directly support the best known of them, we will have to cater for the rest indirectly by allowing software companies or users to extend BG to cope with them.
Dependencies:
This depends on Syntax04 and Syntax05 re extensibility
Approval status:
The 31st Jan 2011 Developers Meeting passed this:
"Goal 5 BetterGEDCOM supports recording of information about real life in an open-ended set of cultures, countries, time periods and belief systems. It should not be biased towards any one of these."
Proposer:

Changes:

Discussion:


Multimedia


Id:
Multimedia01 (was Syntax02)
Title:
Multimedia container
Description:
BetterGEDCOM must use a container specification to hold separate supporting files such as multimedia accompanying the genealogical data. By multimedia we mean digital resources that may represent photos, scanned images, video, sound, documents, web pages, diagrams, maps, (database?) etc.
Importance:
Mandatory
Why?:
1. Embedded files within the genealogical data are generally viewed as a bad idea - they would have been rejected by GEDCOM in the next version after 5.5.
2. A weakness of current GEDCOM is that there is no standard method of transferring linked multimedia with the GEDCOM file, nor of maintaining the links to them after transfer.
Source:
Original Goal 2 bullet 3 Multimedia inclusion and referencing issues Importing Data
Way forward?:
Zip is probably in there somewhere
Dependencies:

Approval status:
Developers Meeting 17 Jan 2011 approved this
Proposer:

Changes:

Discussion:
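As a rough illustration of Multimedia01 and its "Zip is probably in there somewhere" way forward, the sketch below packs a genealogical data file and its media into one ZIP archive using Python's standard zipfile module. The member name data.bg, the media/ folder and the sample content are assumptions made for the example only; no container layout has been agreed.

```python
import io
import zipfile

def build_container(data_text, media):
    """Pack genealogical data and its media files into one in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The genealogical data is one member; media are separate members, never embedded.
        archive.writestr("data.bg", data_text.encode("utf-8"))  # "data.bg" is a hypothetical name
        for relative_path, content in media.items():
            archive.writestr("media/" + relative_path, content)
    return buffer.getvalue()

def list_media(container_bytes):
    """Return the names of the media members stored in the container, sorted."""
    with zipfile.ZipFile(io.BytesIO(container_bytes)) as archive:
        return sorted(n for n in archive.namelist() if n.startswith("media/"))

container = build_container(
    "person I1 photo media/photos/peter_1955.jpg",
    {"photos/peter_1955.jpg": b"placeholder image bytes"},
)
print(list_media(container))  # ['media/photos/peter_1955.jpg']
```

Because the data refers to the media by a path relative to the container, the link survives transfer to another computer, which is the weakness of current GEDCOM noted under "Why?".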


Id:
Multimedia02
Title:
Information about multimedia objects
Description:
BetterGEDCOM must support the recording of information describing each multimedia object. Possible types of information include object encoding type (MIME?), origin/creator/author/publisher, (file) size, title, description, caption, creation time, identification of e.g. persons shown, type of “objects” shown in media (e.g. persons, landscapes, houses), copyright, informal/short identifier/name, setting (type of circumstances/event when created), user defined attributes and attribute types/flags, quality classification, creating program name&version, tags (incl. geo tags), research notes, duration – and more – or less.
Importance:
Mandatory
Why?:
This information is needed to select, organise and manipulate multimedia objects in genealogy programs and to provide information about the object when included in e.g. reports.
Source:
Multimedia inclusion and referencing issues
Way forward?:
The various types of information could be split into new requirements. The information should be held in an entity and top level record, possibly by supporting structures.
Dependencies:

Approval status:

Proposer:
March 4 2011 gthorud
Changes:

Discussion:
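To make Multimedia02 more concrete, here is a minimal sketch of a record describing one multimedia object. The class name, the field names and the choice of fields are assumptions drawn from the list in the Description; they are not an agreed data model.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MultimediaObject:
    """Hypothetical top-level record describing one multimedia object."""
    object_id: str                      # identifier used by references to this object
    mime_type: str                      # object encoding type, e.g. "image/jpeg"
    title: str = ""
    caption: str = ""
    creator: str = ""
    copyright: str = ""
    size_bytes: Optional[int] = None
    persons_shown: list = field(default_factory=list)    # ids of persons identified in the media
    tags: list = field(default_factory=list)             # free tags, including geo tags
    user_attributes: dict = field(default_factory=dict)  # user-defined attribute name -> value

photo = MultimediaObject(
    object_id="M1",
    mime_type="image/jpeg",
    title="School class 1955",
    persons_shown=["I1"],
    user_attributes={"quality": "good"},
)
print(photo.title, photo.persons_shown)
```

Everything except the identifier and the encoding type is optional here, reflecting the "and more, or less" in the Description.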


Id:
Multimedia03
Title:
References to Multimedia
Description:
BetterGEDCOM should allow information recorded about persons, families, groups, places, sources, events etc. to reference multimedia objects. The reference could contain information about the media's relevance in the referencing context. It would also be useful to classify the media in the referring context, e.g. as a preferred media, or with one or more classifications that could e.g. be used to affect its location in reports.
Importance:

Why?:
Information about the relevance in the referencing context could say for example "This is a photo of Peter together with his classmates in 1955". It could overrule similar information recorded about the photo for general use. The classification could allow some media to be printed above the text about a person and other media below, or in a scrapbook etc. but could also be used for other purposes - this is useful when transferred between one user's programs.
Source:
Multimedia inclusion and referencing issues
Way forward?:
The reference should most likely be to a multimedia entity/record containing information about the multimedia, see Multimedia02. It must be possible to reference multimedia in notes and excerpts.
Dependencies:

Approval status:

Proposer:
March 6 2011 gthorud.
Changes:

Discussion:


Id:
Multimedia04
Title:
Grouping of multimedia in a container
Description:
A container (see Multimedia01) shall be able to group the media in a tree structure possibly reflecting the directory structure on the exporting program's computer.
Importance:

Why?:
The structure is most likely useful to the receiver of the media.
Source:

Way forward?:

Dependencies:

Approval status:

Proposer:
6 March 2011 gthorud
Changes:

Discussion:
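As an illustration of Multimedia04, the sketch below assumes that media members in the container have slash-separated names (as ZIP member names do) and rebuilds the folder tree from them. The function name and the sample paths are hypothetical.

```python
def build_tree(member_names):
    """Turn slash-separated container member names into a nested dict tree."""
    tree = {}
    for name in member_names:
        node = tree
        parts = name.split("/")
        for folder in parts[:-1]:
            # Create the folder level on first use, then descend into it.
            node = node.setdefault(folder, {})
        node[parts[-1]] = None  # None marks a file (a leaf of the tree)
    return tree

tree = build_tree([
    "photos/1955/class.jpg",
    "photos/wedding.jpg",
    "scans/census.png",
])
print(sorted(tree))            # ['photos', 'scans']
print(sorted(tree["photos"]))  # ['1955', 'wedding.jpg']
```

A receiving program could use such a tree to recreate the exporter's directory layout or to present the media grouped as the sender organised them.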


Source


Id:
Source01
Title:
Information, Source and Evidence Type
Description:
BetterGEDCOM should record separately whether a Source is, for a given event or characteristic:
  • Primary or Secondary Information (latter includes tertiary)
  • Original or derivative source (e.g. paper or copy/digital image; document or compiled summary; document or transcribed version)
  • Direct, indirect or negative evidence
Importance:
Very Desirable
Why?:
GEDCOM only has QUAY (quality) for this; QUAY is not a substitute for the specifics, as herein described.
Source:
Discussion page on Shortcomings of GEDCOM
Way forward?:
Include data items
Dependencies:

Approval status:

Proposer:
AdrianB38
Changes:
11 Mar 2011: Added Title (GJ)
Discussion:
Source01
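The three separate classifications asked for in Source01 could be held as three independent data items, as in the sketch below. The class and field names are assumptions for illustration; only the category values come from the Description.

```python
from dataclasses import dataclass
from enum import Enum

class InformationType(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"  # includes tertiary, as stated in the Description

class SourceType(Enum):
    ORIGINAL = "original"
    DERIVATIVE = "derivative"

class EvidenceType(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"
    NEGATIVE = "negative"

@dataclass(frozen=True)
class SourceAssessment:
    """Hypothetical assessment of one source for one event or characteristic."""
    source_id: str
    claim: str  # the event or characteristic being assessed
    information: InformationType
    source: SourceType
    evidence: EvidenceType

# The same source may be assessed differently for different claims.
birth = SourceAssessment("S1", "birth of I1", InformationType.SECONDARY,
                         SourceType.ORIGINAL, EvidenceType.DIRECT)
death = SourceAssessment("S1", "death of I1", InformationType.PRIMARY,
                         SourceType.ORIGINAL, EvidenceType.DIRECT)
print(birth.information.value, death.information.value)  # secondary primary
```

Keeping the three values apart is the point of the requirement: a single QUAY-style number cannot say which of the three aspects it is describing.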

Id:
Source02
Title
Certainty Assessment (QUAY)
Description:
BetterGEDCOM should record the qualitative degree of likelihood that a source is true for a given event or characteristic.
Importance:
Very Desirable
Why?:
GEDCOM has QUAY (quality) for this but the GEDCOM Standard is not clear what QUAY value should be assigned to a Primary source of Questionable accuracy
Source:
Discussion page on Shortcomings of GEDCOM
Way forward?:
Include data items
Dependencies:

Approval status:

Proposer:

Changes:
12 Mar 2011: Added Title (GJ)
Discussion:
Source 02-Certainty Assessment (QUAY)

Id:
Source03
Title
Sourcing of child / parent relationships
Description:
BetterGEDCOM must provide the ability to record the sources and citations to justify why a child is believed to be in a particular relationship with its (birth or whatever) parents
Importance:
Mandatory
Why?:
GEDCOM has no ability to do this. The current citations and sources are either for a family as a whole or for individual birth (or whatever) events that only mention the child.
Source:
GEDCOM Messes This Up on Shortcomings of GEDCOM
Way forward?:
Include data items
Note the way forward may vary depending on the solutions chosen for Data-Fam01 and Data-Ind02 "Biological relations independent of family"
Dependencies:

Approval status:

Proposer:
AdrianB38
Changes:

Discussion:


Id:
Source04
Title:
Length of citations
Description:
There must be no limit in BetterGEDCOM on the length of a citation, whether that citation applies to a source (often expressed as part of a bibliography entry) or an event, attribute, person, relationship, etc, etc (often expressed as a footnote or end-note).
Importance:
Mandatory
Why?:
The majority of citations will be short. However, some users may wish to record a Proof Argument inside the citation. Any limit on the length of such a citation would be arbitrary and could be exceeded, so should not be permitted. See also requirement Syntax10 "No restrictions on item length or value", which is a generalised version of this requirement.
Source:
See discussion of "The Missing Link - a new entity type or a new type of source?" and specifically the discussion of the options for citations in there.
Way forward?:
While many users would never wish to use lengthy citations, there seems no good reason to forbid their use.
Dependencies:

Approval status:

Proposer:
AdrianB38
Changes:
Created 2011 April 17 15:50 CET
Discussion:
Discussion

Id:
Source05
Title:
Citations in notes
Description:
BetterGEDCOM should allow citations to be entered anywhere in the text of notes.
Importance:

Why?:
For the same reason as footnotes are used in many texts to cite sources.
Source:

Way forward?:
One way to do it is to have separate records for citations.
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Support for the standard


Id:
Support01
Title:
Support for multiple birth, death characteristics
Description:
Programs claiming support for BetterGEDCOM must support multiple birth, death characteristics for a person. Support means that the program must be able to display the facts related to several occurrences of the characteristic, and allow recording of several such by the user.
Importance:

Why?:
See Char03
Source:

Way forward?:
The same requirement should be considered for other basic characteristics.
Dependencies:

Approval status:

Proposer:
6 March 2011 gthorud
Changes:

Discussion:


Syntax


Id:
Syntax01
Title
Underlying syntax
Description:
BetterGEDCOM's underlying syntax must be an existing, non-proprietary syntax
Importance:
Mandatory
Why?:
We do not want to reinvent the wheel
Source:
Original Goal 2 bullet 1
Way forward?:
Options include XML, JSON, GEDCOM, Google Protocol Buffers or any combination thereof.
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:

Syntax02 has been moved to Multimedia01

Id:
Syntax03
Title
Content scope
Description:
The BetterGEDCOM file format must define data relating to the study of genealogy / family history.
Importance:
Mandatory
Why?:
Raison d'etre of the format - statement of the obvious. The coverage of BetterGEDCOM must be wider than existing formats in order to provide a reason for its adoption.
Source:
Original Goal 3
Way forward?:
Define the data in a Data Model etc.
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Id:
Syntax04
Title
Extensibility by software companies
Description:
The BetterGEDCOM file format must be capable of extension by software companies. Extensions must be kept permanently separate from any later definitions in BetterGEDCOM format.
Importance:
Mandatory
Why?:
1. GEDCOM can be extended so to remove the facility would be a step backwards.
2. Many GEDCOM files exist with extensions.
Source:
Original Goal 3
Way forward?:
Note that extensions in GEDCOM are identified by an underscore, which applies only to extensions. Any new GEDCOM tags will not have the underscore so will not be confused with extensions. An equivalent mechanism needs to be used for BetterGEDCOM.
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Id:
Syntax05
Title
User Extensibility of events and characteristics
Description:
The list of events, properties, characteristics, etc, of individuals, etc, in the BetterGEDCOM file format must be capable of extension by users. Extensions must be kept permanently separate from any later definitions in BetterGEDCOM format.
Importance:
Mandatory
Why?:
1. GEDCOM can be extended so to remove the facility would be a step backwards.
2. Many GEDCOM files exist with user-defined events.
Source:
Original Goal 3
Way forward?:
Note that user defined events and attributes in GEDCOM are identified by an underscore, which applies only to them. Any new GEDCOM tags will not have the underscore so will not be confused with user defined events, etc. An equivalent mechanism needs to be used for BetterGEDCOM.
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:
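The underscore convention that Syntax04 and Syntax05 describe for GEDCOM can be sketched as follows. This only shows the GEDCOM-style rule (a leading underscore marks an extension); the equivalent mechanism for BetterGEDCOM is still to be chosen, and the function names and sample tags are picked for the example.

```python
def is_extension_tag(tag):
    """GEDCOM-style rule: a tag starting with an underscore is an extension."""
    return tag.startswith("_")

def split_tags(tags):
    """Separate standard tags from extension tags, preserving their order."""
    standard = [t for t in tags if not is_extension_tag(t)]
    extensions = [t for t in tags if is_extension_tag(t)]
    return standard, extensions

# BIRT and DEAT are standard GEDCOM tags; the underscore tags stand for extensions.
standard, extensions = split_tags(["BIRT", "_MILT", "DEAT", "_UID"])
print(standard)    # ['BIRT', 'DEAT']
print(extensions)  # ['_MILT', '_UID']
```

Since no standard tag ever starts with an underscore, a later version of the standard can add new tags without colliding with any existing extension, which is the "permanently separate" property both requirements ask for.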


Id:
Syntax06
Title:
Define one way of doing a thing
Description:
BetterGEDCOM should define just one way of doing one thing.
Importance:
Very Desirable
Why?:
More than one way may cause ambiguity and extra programming for programmers
Source:
Original Goal 7
Way forward?:
It may be sensible to agree specific exclusions to this requirement, e.g. for in-line notes and separate note records, where the extra programming work is trivial and does not create ambiguity.
Dependencies:
Issue 1: It is not always possible to agree that two things are, in reality, the same thing. For instance, whether or not in-line notes and separate note-records are, in practical terms, the same thing, has been the topic of debate.
Issue 2: If two separate methods in GEDCOM type formats are merged into one, then it will not be possible to round-trip data from a GEDCOM type format to BG and back again coming up with the same data.
Approval status:

Proposer:

Changes:
2011 Feb 22 - Updated template format to add rows for title, proposer and discussion; added title, added link to discussion (also added discussion topic) (GJ)
Discussion:
Discussion at Syntax06 - Define one way of doing a thing
Former discussion at Single way (current goal 7) [Please do not add comments to the former discussion]

Id:
Syntax07
Title
URIs (URLs) for external information
Description:
BetterGEDCOM format files must be able to contain URI (URL) addresses for external information
Importance:
Mandatory
Why?:
Users need to record references to sources, etc. on the Internet. Part of that data will be the URL.
Source:
Tom Wetmore's Goal and Requirements
Way forward?:

Dependencies:

Approval status:

Proposer:

Changes:

Discussion:
Syntax07 URIs (URLs) for external information

Id:
Syntax08
Title
Feature inheritance from previous event etc. types
Description:
It should be possible for user-defined events, properties, characteristics, etc, of individuals, etc, to inherit features from previously defined events, properties, characteristics, etc.
Importance:
Very desirable
Why?:
Events, properties, characteristics, etc. known to the application software may have logic built into the application to recognise them and process the data from them in certain ways.
For instance, the "Marriage" event might be used by the application to propose a family to the user.
User-defined events, properties, characteristics, etc., will not normally be recognised by the application so cannot have logic built into the application to recognise them. However, if the user-defined event, property, characteristic, etc., could inherit features belonging to one known to the application, then it would inherit that built-in logic.
For instance, "Marriage - civil" might be a user-defined event that inherits details from "Marriage" and so would also be used by the application to propose a family to the user.
Source:
"I Want My Genealogy Software And BetterGEDCOM To Do This" on Shortcomings of GEDCOM
Way forward?:
If events etc are given a type and sub-type, then it would be possible for the user to create a user-defined subtype of an application defined type, and thus inherit the processing done for that type.
For instance, an event "Marriage - civil" might have a type of "Marriage" and a subtype of "civil", thus automatically doing all processing created for the event-type of "Marriage"
Dependencies:
Syntax05
We depend on the application developers to create any processing that recognises events.
Approval status:

Proposer:

Changes:

Discussion:
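The type and sub-type idea in the Way forward of Syntax08 can be sketched as below. The application registers its logic against the type only, so a user-defined subtype is processed like its parent. The dictionary layout, the handler and the identifiers are hypothetical.

```python
def propose_family(event):
    """Stand-in for application logic that reacts to a Marriage event."""
    return "propose family for " + ", ".join(event["participants"])

# Logic is registered per event type; subtypes need no registration of their own.
HANDLERS = {"Marriage": propose_family}

def process(event):
    """Run the handler registered for the event's type, ignoring its subtype."""
    handler = HANDLERS.get(event["type"])
    return handler(event) if handler else None

# "Marriage - civil": user-defined subtype "civil" of the known type "Marriage".
civil = {"type": "Marriage", "subtype": "civil", "participants": ["I1", "I2"]}
unknown = {"type": "Graduation", "subtype": None, "participants": ["I1"]}

print(process(civil))    # propose family for I1, I2
print(process(unknown))  # None
```

An event whose type the application does not know is simply passed over, which matches the observation that user-defined events will not normally have built-in logic.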


Id:
Syntax09
Title:
Define Event vs. Attribute
Description:
Assuming that the BetterGEDCOM project distinguishes events from properties / facts / attributes / characteristics, then BetterGEDCOM must define and publish a clear definition of the difference between the two concepts that does not rely on a list of each. In particular, the definition must be clear enough for competent software suppliers and users to understand whether a new item is an event or a property / fact / attribute / characteristic.
Importance:
Mandatory
Why?:
There is no clear definition in the GEDCOM 5.5 specification of the difference between the two, only a list of events and a list of attributes. This means that a software supplier or user does not always know whether to create an event or attribute. As a result, the same concept can appear as both, resulting in difficulty of exchange of information.
Source:
Discussion on Custom GEDCOM tags Discussion: Eliminate Facts Discussion: Events, Properties, Characteristics and Facts
Way forward?:
If and when it becomes necessary to distinguish the two concepts, then the Data Model should be updated to record the definition.
Dependencies:

Approval status:

Proposer:
AdrianB38 2011 Feb 25 22:35
Changes:

Discussion:
Syntax09 Define Event vs. Attribute

Id:
Syntax10
Title
No restrictions on item length or value
Description:
Data items should have no length restriction in BetterGEDCOM, except as deemed necessary during design.
Data items should have no restrictions on value in BetterGEDCOM, except as deemed necessary during design.
Importance:
Very Desirable
Why?:

Source:
Original Goal 2 bullet 5
Also Tom Wetmore's Goal and Requirements
Way forward?:
Compare TextHandling02 "No restrictions on line length", which refers to the overall length of a line.
Dependencies:

Approval status:
Subject of Survey Monkey - relevance? result?
Proposer:

Changes:

Discussion:


Id:
Syntax11
Title:
Unique Identifiers
Description:
BetterGEDCOM should assign unique identifiers (UIDs) to records, BG files and "data sets". A data set (the term could be changed) is a collection of data that may hold information about e.g. "The Olsen family", "Persons in parish X" or "Our genealogy project", that will be updated over time, and that will be exported in a BG file at (ir)regular intervals. The data set will have a unique identifier, and so will each BG file containing a snapshot of the data set.
Importance:

Why?:
The various purposes that UIDs could serve must be more precisely defined, as must the procedures for their assignment and use.
Source:
This has been discussed in Data08 and UUIDs - No thanks and Please lets use UUIDS ... and several other discussions (search for UUID).
Way forward?:

Dependencies:

Approval status:

Proposer:

Changes:

Discussion:
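One possible reading of Syntax11 is sketched below: the data set keeps one identifier for its whole life, while every exported BG file gets an identifier of its own. Using random UUIDs from Python's uuid module is an assumption (the discussions cited under Source show that the use of UUIDs is contested), and the dictionary keys are hypothetical.

```python
import uuid

def new_uid():
    """Return a new random (version 4) UUID as a string."""
    return str(uuid.uuid4())

# The data set is identified once, when it is created.
dataset_uid = new_uid()

# Each export is a snapshot: a new file identifier, the same data set identifier.
snapshot_jan = {"file_uid": new_uid(), "dataset_uid": dataset_uid}
snapshot_feb = {"file_uid": new_uid(), "dataset_uid": dataset_uid}

same_dataset = snapshot_jan["dataset_uid"] == snapshot_feb["dataset_uid"]
same_file = snapshot_jan["file_uid"] == snapshot_feb["file_uid"]
print(same_dataset, same_file)
```

A receiver could then recognise two files as successive snapshots of the same data set even when the file names give no hint of it.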


Task01 has been moved to Admin02.

Test


Id:
TestSuite01
Title:
Suite of test data
Description:
BetterGEDCOM should provide a test suite of data that will
  • allow software suppliers to assess compliance of their software
  • help them to diagnose issues
  • assist them to resolve issues.
Importance:
Very Desirable
Why?:
If we can't do it, others will - and probably get it wrong. This will also meet developers halfway.
Source:
Original Goal 4 (I need to check up subsequent discussions on this)
Way forward?:

Dependencies:

Approval status:

Proposer:

Changes:

Discussion:
Test Data Format

Text Handling


Id:
TextHandling01
Title
Formatting mark-up for text
Description:
BetterGEDCOM should define a method of marking up text with formatting information. It should be available in all appropriate fields
Importance:
Very Desirable
Why?:
This is a consistent request - the ability to format notes with italics, bold, etc.
Source:
Original Goal 2 bullet 4
Way forward?:
Allowing selected HTML or HTML-style tags?
Dependencies:

Approval status:

Proposer:

Changes:

Discussion:


Id:
TextHandling02
Title:
No restriction on line length
Description:
Lines should have no length restriction in BetterGEDCOM, except as deemed necessary during design.
Importance:
Very Desirable
Why?:

Source:
Original Goal 2 bullet 5
Also Tom Wetmore's Goal and Requirements
Way forward?:
Compare Syntax10 (was TextHandling03) "No restrictions on item length", which refers to the length of an individual item.
Dependencies:

Approval status:
Subject of Survey Monkey - result?
Proposer:

Changes:

Discussion:


Id:
TextHandling03
Title:
Footnotes/endnotes in notes
Description:
BetterGEDCOM should allow references to footnotes or endnotes that contain just text (not a source citation).
Importance:
Very Desirable
Why?:
Such footnotes/endnotes could contain comments or other text that may not be considered important enough to be entered in the note itself. See also Citations in Notes - Source05.
Source:

Way forward?:
The text could be surrounded by special codes in the note text, or be contained in a separate structure.
Dependencies:

Approval status:

Proposer:
17 April 2011 gthorud
Changes:

Discussion:


Id:
TextHandling04
Title:
Semantic Mark-up in Text
Description:
TextHandling01 describes 'presentational mark-up' in text. This item supplements that with 'semantic markup' in text. This allows references to other entities (e.g. Persons, Places, Events, etc) to be embedded in text. Reference should be made to STEMMA's research document on Structured Narrative which discusses the need for both types of mark-up, plus citation references, and general reference notes.
Importance:
Very Desirable
Why?:
Semantic mark-up provides machine-readable references that can be used for automatic linking, and generation of hyperlinks for the UI
Source:
Structured Narrative
Way forward?:
Need to widen the scope of "Notes" to include general narrative, and hence "Structured Narrative". This is a neglected area that STEMMA is pushing alone. It is essential for the representation of family history data as opposed to mere genealogical data
Dependencies:

Approval status:

Proposer:
23 May 2012 ACProctor
Changes:

Discussion:
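To show what the semantic mark-up of TextHandling04 could give an application, the sketch below uses an invented inline syntax, [[type:id|label]], purely for illustration. It is not STEMMA's notation and not a BetterGEDCOM proposal. The code extracts the machine-readable references and also recovers the plain narrative.

```python
import re

# Hypothetical inline syntax: [[entity_type:entity_id|text shown to the reader]]
REFERENCE = re.compile(r"\[\[(\w+):(\w+)\|([^\]]+)\]\]")

def extract_references(text):
    """Return (entity_type, entity_id, label) for every embedded reference."""
    return [(m.group(1), m.group(2), m.group(3)) for m in REFERENCE.finditer(text)]

def plain_text(text):
    """Strip the mark-up, keeping only the label the reader should see."""
    return REFERENCE.sub(lambda m: m.group(3), text)

note = "[[person:I1|Peter]] was baptised in [[place:P7|Bergen]]."
print(extract_references(note))  # [('person', 'I1', 'Peter'), ('place', 'P7', 'Bergen')]
print(plain_text(note))          # Peter was baptised in Bergen.
```

The extracted pairs are what an application would need for automatic linking or for generating hyperlinks in the UI, while a program that ignores the mark-up can still show readable text.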


Timelines


Id:
Timeline01
Title:
Timelines
Description:
This is just a placeholder so far.
Importance:

Why?:

Source:

Way forward?:

Dependencies:

Approval status:

Proposer:
20 March 2011 gthorud
Changes:

Discussion:


Snapshot of the page:
Better GEDCOM Requirements Catalog snapshot 2may2011.pdf

Comments

theKiwi 2011-02-09T20:07:34-08:00
Programme?
From the page

"Clients, Customers, Stakeholders & Users
(This section is here simply to make you think)
Currently we have no identified Client paying for the programme."

I'm reading "programme" here in the sense of code that runs on a computer to accomplish some task - a browser is a programme, Adobe Photoshop is a programme. In that sense, BetterGEDCOM is not going to be a programme. There is never going to be a BetterGEDCOM "programme" is there?

Or is "programme" referring to the planned process by which BetterGEDCOM is to be developed?
ttwetmore 2011-02-09T23:45:15-08:00

I read it as the British spelling of "program" with the definition of "a set of related measures, events, or activities with a particular long-term aim."

Tom
ttwetmore 2011-02-10T05:36:40-08:00

Synonyms for "programme" in this context could be "effort" or "project". My preference would be "project".

Tom
AdrianB38 2011-02-10T09:31:02-08:00
Oh dear - I thought I was safe with the word "programme", since I thought you all knew that even us old-fashioned Brits spell the word for "set of computer instructions" as "program".

My intention was that "programme" meant a series of work activities. I used that word since every bone in my body rebels at using the word "project" for this. It isn't a project - at least not yet - it doesn't have a defined beginning and end, and it certainly doesn't have project management! <grin>

However, I think I'm going to get more people asking "why / what programme?" than I am project management gurus complaining about the misuse of the word "project", so I'll swap to "project".
theKiwi 2011-02-10T18:29:45-08:00
I guess I'm out of the loop then - I'm a Kiwi, so quite at home with the spelling of programme, but having lived in the US for 17 years (which is all but 4 of my Macintosh owning years), didn't realise that in English English, program and programme now co-existed with different meanings.
AdrianB38 2011-02-11T02:43:37-08:00
"Divided by a common language" is the phrase, I believe <grin>

Hope no-one starts mentioning "catalogue"!
theKiwi 2011-02-11T04:44:27-08:00
Adrian - yes, apparently <g>
brianjd 2011-02-21T07:37:03-08:00
Catalog and catalogue, have the same meaning in standard American English.
As do program and programme.

They also, have numerous meanings in American English. The longer spellings are not in common use, but still considered correct.

That is really the root of the problem, using words that can be ambiguous, or words that have changed in meaning in recent history. Most Americans would not know for example that the British have separated out the alternate spellings of a word into two new words with separate and distinct meanings. ;')
testuser42 2011-02-15T16:50:34-08:00
(No) more requirements?
Since you asked for more requirements, I looked at my old write-up [[My+dream+genealogy+software]]
I feel like you've got everything there that concerns the BG part of the equation covered. Well done! But maybe you can still find something - you seem to know how to do this.
testuser42 2011-02-15T17:04:54-08:00
re Evidence01
For the record - I'm a fan of supporting the evidence-conclusion process. Having this capability in BG as soon as possible would be a huge boost IMHO. Since no programs (that we know of) support it yet, we have a chance to define the structures that will record the evidence-conclusion process. We might actually need to do this so that programmers know what we are talking about, and can start working on implementations.

But I don't know how difficult it might be to turn theory into reality. The example you give (rolling back) is a good one. I believe it should be possible by severing the links between records - but I'm not a programmer, and wouldn't know how to do it in reality. Maybe Tom's DeadEnds software does it already?
AdrianB38 2011-02-17T12:59:06-08:00
My problem with modelling the evidence-conclusion process is that I don't believe we, as genealogists, truly yet understand what it is.

The GenTech model allegedly covers a lot of this - as I've seen in introductions to it - but frankly I don't understand how what's in their model relates to what I'd read in those introductions (which is where I got the idea of roll-back from).

Logically, the evidence-conclusion process starts with inputs, and finishes with outputs, where some of the outputs are
- the new (interim)-conclusions,
- a documented version of the logic that was gone through to reach those conclusions.

My issue is with the inputs, which _might_ include:
- previous interim conclusions;
- data from sources;
- assertions that we believe to be true (e.g. "Coverage of UK births in civil registration is complete enough for the possibility of a missing birth to be probabilistically insignificant")
- control data (e.g. dates of censuses)

Miss out any of those and the documentation of your evidence-conclusion process becomes an untruth. The process is therefore inherently risky and we certainly need to get our heads round the process before we can define what the inputs are in a clear and helpful manner. Chicken and egg.

The only sensible way of doing it that I've seen suggested so far is Tom's idea of generating conclusions from evidence because that avoids the human element. Yet that is so far from what the rest of us are doing....
brianjd 2011-02-21T08:05:19-08:00
I'll chime in here, even though I have no stake in the subject. I don't see where it becomes the job of the standard to govern, control or dictate how evidence-conclusion works.

Taking out the human element is definitely wrong. "What is Toronto?", is all the evidence you should need to prove that. While computers can be very good at making decisions and doing lots of things. Drawing conclusions that depend on a deep understanding of Human language is NOT one of them. Yet.
It is certainly true that, given the opportunity, humans will inevitably make bad evidence-conclusion relationships. People will do what they will to prove the conclusions they are pre-disposed to make. Our task is to give them a format that will allow them to make good ones.

If you build a standard to be a bigger-better-idiot-proof standard, God/nature will simply build a bigger-bigger-better idiot. Ad infinitum.

We must design for that 80% population, that require 20% of the work. That last 20% is a killer. For example to reach 90% it would require a 250% increase in work. To attain 95% would require approximately 350% as much work. Theoretically speaking. In actuality it is not given that any attempt to reach 100% on any given problem is actually achievable and that effort does not grow geometrically, rather than logarithmically. Sorry to introduce the math/s (word depending on your native language American/Everyone else).

Still wish there was a preview option.
GeneJ 2011-02-21T09:01:24-08:00
@Adrian,

At least for me, the Genealogical Proof Standard (together with Mills' Research Process Map) explains the evidence-conclusion process.

I am not a fan of populating the database proper with evidence bits and person bits. I would just love it if those concepts could populate a research log. (So that conclusions could be entered to a database directly or also associated with a process documented in the research log.) --GJ
AdrianB38 2011-02-22T09:14:03-08:00
Brian - "I don't see where it becomes the job of the standard to govern, control or dictate how evidence-conclusion works"
I agree - that's why I don't want to do anything to inhibit better recording of it in BG, but can't see yet what we can do in BG.
I am interested in what machines can do with the data to turn it into information / conclusions - I just don't see it being more than marginally useful in the foreseeable future for all the reasons you suggest.
AdrianB38 2011-02-22T09:23:10-08:00
Gene - I'm perfectly happy with the GPS and ESM's Research Process Map being an explanation of the evidence-conclusion _process_.

This Requirement is meant to be about recording the intermediate bits of data in the Data Model. Slightly too late I remember I put it into the Glossary as "Evidence and Conclusion Model" separate from "Evidence and Conclusion Process"

Your comment about a research log sounds very interesting - it isn't that far from the stuff in the "Evidence and Conclusion Model". Much as I like it, I hesitate before adding a Research Log as a requirement, else it'll be a never-ending list and it may be better to keep this to applications more specific to to-do handling, etc. But I will think about it...
brianjd 2011-02-22T13:25:17-08:00
Adrian,

I think we could easily add a minimal ability to have a research log without burdening the standard.

All we really need is a place to put research notes. A single tag should do it. We could call it Research Note, or Research Task, or it could be a particular type of a Task tag.

Entity: Task
Type: Research
Fields: Description, researcher, date started, date completed, isComplete, status [proved/disproved/inconclusive],owning record...


I created a Task entity last night, thinking along these lines. I made it Very Desirable, IIRC.

It could be used for much more than just a research log. A good way to track all the work done. Handily in one place, for easy retrieval, and to pass along to the next researcher. Good to have by your side in a fight. ;')
GeneJ 2011-02-22T13:31:59-08:00
:)
AdrianB38 2011-02-17T09:24:00-08:00
Split Data group
At Geir's suggestion, I have split and renumbered the Data group of requirements, which may otherwise become unwieldy.

Note the use of a prefix does NOT imply anything about there being a requirement to design something in a specific way. In particular, although I have used prefixes Data-Group and Data-Family, that does NOT mean there is a requirement that Family and Group should be treated separately.

The prefixes are only for convenience.
gthorud 2011-02-17T14:40:36-08:00
Stakeholders
I have added the following sentence to the beginning of the catalogue:

"Since data may be transferred from a BetterGedcom file into an application, and then to a genealogy service provider via an API in the application, providers of such network services are likely to be affected by the structures in BG, and will therefore seek to influence or control BetterGedcom."

It should be considered in light of recent events.
brianjd 2011-02-21T08:26:15-08:00
What recent events are you speaking of?
brianjd 2011-02-21T08:32:17-08:00
Scope
I think that our scope should include a compliance testing application, just as the W3C has for validating HTML, XHTML, etc.

We are developing what we hope to be the next genealogy data format standard. It is incumbent on us to provide at least the first validation application.
AdrianB38 2011-02-22T13:36:30-08:00
I'm not sure if the BetterGEDCOM project has the resources to create a validation application. That's why I didn't put it in. Not having one is certainly a risk - just look at the @@@@ created in the name of GEDCOM.

Various people thought the only practical way was to create some "reference files" in the new format and so approach it from the other end, i.e. can your application read a complex, valid BG file?
brianjd 2011-02-22T20:59:25-08:00
I have offered, repeatedly, to write the application for the group and release it under an OSS license. GPL version 3 would be my choice for work I'm giving to a group. GPL v2 would be my choice for my own work. I think GPL v3 would be best for the group. For legal reasons.
ttwetmore 2011-02-23T02:21:28-08:00
When Better GEDCOM reaches the point of having formats to test, I will be tracking that progress with software that reads data in the BG format. Like Brian's offer, this software could be used as the basis of a validation program. My only proviso is that since my retirement, I only develop in Objective-C using the Foundation and Cocoa frameworks for the Mac OS X. Retirement == freedom forever from the Microsoft hegemony.
AdrianB38 2011-02-23T02:24:11-08:00
Brian - that would sound a good idea. Can I suggest you add it to the Requirements Catalog with reasoning and way forward as you have written above? Make it should or could depending on how much obligation you want to put on yourself. It would certainly offer a better means of testing compliance.

Then one of us can alter the scope to match.
brianjd 2011-02-21T09:02:31-08:00
Data-Date02 -modified
I changed Data-Date02 from Desirable to Mandatory. The reasoning is that any standard that we develop needs to accommodate every common calendar. At the least, we need to support Julian, French, Gregorian, and any common format used on records from 1400 on. And we should explicitly allow all. It is the task of coders to support what they will, but we MUST allow all, or at least use the 80/20 rule.
gthorud 2011-02-22T10:40:33-08:00
Adrian

Well, if I deleted the text, it was not intended.

In 90% of the cases, all dates in a file will be using the same calendar, so if that calendar is identified in the file, I don’t understand the need to see the date encoded in text in order to identify the calendar. But I have proposed to have a sort date accompanied by a string, so what you want is possible even with a numerically encoded date.

The issue with errors is something that is handled by applications.

“Data Model could include a facility for a default calendar, etc, or this could be left to data entry or conversion in applications” I have no idea what the last half of this statement means.

The order of day, month and year is not dependent on the calendar, there is a huge number of ways to write a date in the Gregorian calendar.
GeneJ 2011-02-22T11:00:15-08:00
Myrt has given some great lectures on dating issues.

I have only a handful of date entries (mostly my Norwegian baptisms) that are not either modern or limited location O/S dates.

To complicate the issue of my experience, I use a program that has fields for both "date" and "sort date."

I am not familiar with all the genealogical programs on the market, but do recall that GENBOX had a calendar-like step when I entered an O/S date into that program.

Would sure like to see comment on this discussion from a wide range of users. --GJ
AdrianB38 2011-02-22T13:21:00-08:00
Geir - a "sort date accompanied by a string, so what you want is possible even with a numerically encoded date". That sounds a useful way forward.

"order of day, month and year is not dependent on the calendar, there is a huge number of ways to write a date in the Gregorian calendar" - good point. The description does include "This [calendar] definition should include the ordering of the date items within the date" but perhaps I will tweak it to say "This [calendar] definition should be accompanied by a definition of the ordering of the date items within the date" to make it clear that these are 2 separate concepts.

"Data Model could include a facility for a default calendar, etc, or this could be left to data entry or conversion in applications" Yes.... It's not wholly clear on reflection.

Let me see if I can get it better - I'm trying to allay fears that people may have about suddenly going into their family history files and needing to add a calendar "code" to all their dates. This can be done by explicitly adding a default into the BetterGEDCOM file itself or by getting the application to update every date automatically. This latter would be done when converting from GEDCOM-compatible to BG-compatible data. I'm probably getting too deep into solutions here so I think I'll put something like
"To be decided: whether Data Model includes facility for a default calendar, etc, or whether every date must be marked up by calendar code, etc. Intelligent application design should reduce the workload for the user in either case."
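To make the two options concrete, here is a minimal Python sketch, assuming a hypothetical file-level default plus an optional calendar code on each date. The names DateEntry and effective_calendar and the code strings "GREGORIAN" and "FRENCH_R" are invented for illustration only; nothing here is an agreed BG structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DateEntry:
    text: str                       # the date as entered, e.g. "3 April 2001"
    calendar: Optional[str] = None  # hypothetical per-date calendar code


def effective_calendar(entry: DateEntry, file_default: Optional[str]) -> str:
    """Return the calendar that applies to one date.

    A code on the date itself wins; otherwise the file default is used.
    If neither exists the date cannot be interpreted, so we refuse to guess.
    """
    if entry.calendar is not None:
        return entry.calendar
    if file_default is not None:
        return file_default
    raise ValueError(f"no calendar given for {entry.text!r} and no file default")


dates = [
    DateEntry("3 April 2001"),                              # relies on the default
    DateEntry("12 Brumaire An III", calendar="FRENCH_R"),   # explicit override
]
resolved = [effective_calendar(d, "GREGORIAN") for d in dates]
```

With a default in the file, only the exceptional dates need marking; without one, every DateEntry would have to carry its own code.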

PS - Geir - a minute or two ago, you were editing at the same time as me. Wikispaces told me what it thought your changes were and I think it has managed to include both our edits.

When I went back in to change a spelling mistake it offered me an unsaved draft to work from. I've no idea what was in that draft but wonder if it might have contained only my changes - or only yours - so such oddities might explain why Wikispaces' idea of your amendments might not match your memory. At any rate, I prefer to work on this before the US wakes up to reduce the risk of double changes.
AdrianB38 2011-02-22T13:29:51-08:00
Gene - we may need to brush up on Myrt's lectures! So far as I know, all Europe is now on the Gregorian calendar (willing to be proven wrong there...) So it seems logical to have one calendar code for that.

FYI - Before 1752, the calendar for England used the Julian calendar AND the year started on Lady Day, 25 March. I have an idea that Scotland used the same Julian calendar but started the year on 1 January. Indeed, there are English parish registers that sometimes started the year on 1 January! So how many codes do we need? Gulp!
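As a rough illustration of why the year start matters, the following Python sketch converts a year counted from Lady Day into a year counted from 1 January. It assumes the register really did start its year on 25 March (which, as noted, is not always true), and the function names are made up for this example. It changes only the year number and does not apply any Julian-to-Gregorian day shift.

```python
def historical_year(day: int, month: int, year_as_written: int) -> int:
    """Convert a year counted from Lady Day (25 March) to one counted from 1 January.

    Dates from 1 January to 24 March fall at the END of the Lady Day year,
    so they belong to the next year when the year starts on 1 January.
    """
    if (month, day) < (3, 25):
        return year_as_written + 1
    return year_as_written


def dual_year_label(day: int, month: int, year_as_written: int) -> str:
    """Return a dual-year label such as '1666/1667' where the two reckonings differ."""
    start_jan = historical_year(day, month, year_as_written)
    if start_jan == year_as_written:
        return str(year_as_written)
    return f"{year_as_written}/{start_jan}"
```

For example, dual_year_label(1, 1, 1666) gives "1666/1667", while dual_year_label(25, 3, 1666) gives plain "1666".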
GeneJ 2011-02-22T13:39:23-08:00

And .. if we default to the Gregorian, then would you only need a code if you are entering the old style date (vs the indirect Gregorian equivalent)?

Isn't that rather like existing GEDCOM, only expanded? (For some reason, I thought existing GEDCOM recognized the OS command following an old style date entry.)

P.S. Myrt's lecture is just packed with info. As to just that "1752 change" though, see Wiki for Gregorian Calendar. In particular, "Pope Gregory XIII, after whom the calendar was named, by a decree signed on 24 February 1582, a papal bull known by its opening words Inter gravissimas.[4] The reformed calendar was adopted later that year by a handful of countries, with other countries adopting it over the following centuries." [With emphasis on the word CENTURIES.]
brianjd 2011-02-22T21:23:19-08:00
On the default date and date format subject.

There is no need for worry on this. Everyone has already chosen, or left alone, the pre-chosen date format.

Possibly the application the user is using has it set. If not I guarantee the Operating System does.
The obvious solution is to simply presume the OS default unless there is an explicit override. Naturally, this will cause no end of problems, because there will be plenty of cases where the dates are in a format other than the expected. But there is no solution we could come up with that won't cause issues. We just have to accept that.
AdrianB38 2011-02-23T12:23:05-08:00
Brian, certainly no problem with there being a default on the PC, though while the O/S may well have set a calendar and date format, we need it out of there and into the file when we transmit it. No particular problem with that, of course.

And it might be the wrong format, of course. Ubuntu may well be wonderful but I somehow doubt it carries a calendar to give dates like 1 Jan 1666 OS / 1667 NS <grin>

(Gene - I enter these OS/NS dates into Family Historian quite happily so GEDCOM may well take them by default)
gthorud 2011-02-23T16:28:10-08:00
Brian,

There are problems with editing the page, the only solution I see is to save frequently - and cancel your edits since last save if you get a message that someone else is editing. Saved drafts - I am not sure - but tend to avoid them.

Isn't it possible to add a Country code in addition to the calendar value, or are there several dates for transition to Gregorian within a country - well Scotland - but maybe there is some ISO standard with a code for Scotland.
ACProctor 2011-11-30T08:18:40-08:00
I agree that Date values must have an associated Calendar, although there should be a default calendar for, say, Gregorian - or whatever your most common Date value is relative to.

I have worked in the area of "globalisation" for many years and I feel strongly that BG should not get hung up on date-ordering issues, or use any day-names or month-names in date values - both of which would create a locale dependency. ISO 8601 was created as a Date standard for exchange and storage of date values, and the numeric format in particular (yyyy-mm-dd) is ideally suited to locale-neutral data values such as in XML.

All issues of date-entry and date-formatting are for the software loading or generating a BG file. The file itself must remain independent of those processes. Date-parsing should not be a consideration for BG as a locale-neutral and culture-neutral textual data format.
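A small Python sketch of that separation, assuming the file holds only numeric ISO 8601 style values and that, as far as I know, the standard's reduced-precision forms (yyyy and yyyy-mm) are acceptable for partly known dates. The function names and the English month table are illustrative; a real application would pick the display language from the user's locale.

```python
import re
from typing import Optional, Tuple

# yyyy, yyyy-mm or yyyy-mm-dd, digits only, so no locale-dependent names
_ISO_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")

MONTHS_EN = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def parse_iso(value: str) -> Tuple[int, Optional[int], Optional[int]]:
    """Split a stored value into (year, month, day); missing parts are None."""
    m = _ISO_DATE.fullmatch(value)
    if m is None:
        raise ValueError(f"not a yyyy[-mm[-dd]] value: {value!r}")
    year, month, day = (int(g) if g is not None else None for g in m.groups())
    if month is not None and not 1 <= month <= 12:
        raise ValueError(f"month out of range in {value!r}")
    if day is not None and not 1 <= day <= 31:
        raise ValueError(f"day out of range in {value!r}")
    return year, month, day


def display_en(value: str) -> str:
    """Format a stored value for an English-speaking user (application side only)."""
    year, month, day = parse_iso(value)
    parts = []
    if day is not None:
        parts.append(str(day))
    if month is not None:
        parts.append(MONTHS_EN[month - 1])
    parts.append(str(year))
    return " ".join(parts)
```

Here display_en("1866-01") yields "January 1866" and display_en("1866-01-01") yields "1 January 1866", so the two levels of precision stay distinct in the file, and plain string sorting of the stored values already gives chronological order within one calendar.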
WesleyJohnston 2011-11-30T23:39:23-08:00
I am seeing again an issue that I have seen in a number of discussions. There are two very important perspectives that I believe have to be kept in awareness for BetterGEDCOM.

The first perspective is that of the researcher using BetterGEDCOM to create from sources a database that reflects the evidence and conclusions the researcher has used. This perspective includes both the storage of that database and the sharing of it, either with others or with merging with other databases of one's own. The vast majority of discussion posts come from this perspective.

The second perspective is of a repository that seeks to make available their records in BetterGEDCOM format. This is the perspective that I believe is being left out of most of the discussions.

And in this discussion of Date02, the second perspective is very important to retain in our awareness as we develop a standard. I think it is retained in the "Why?" statement of the requirement: "Dates may occur in source documents in all sorts of calendar representations. It is desirable that the codified representation of that should differ as little as possible from the written characters in the source, to reduce the scope for error in input or output."

If I am a curator of a repository, seeking to offer my repository information online in BetterGEDCOM format, then I do not want to be forced to convert dates in the source material into standard modern Gregorian dates (or any other dates either). If a record says that the event happened on some ecclesiastically named day of a year which is also likely to be an ecclesiastical year (e.g. not beginning on January 1), then I want to record the date as it is written.

I see an issue here very similar to place hierarchies: there is a need for standardized accepted databases of place hierarchies over time, and there is a need for standardized accepted conversion software to convert dates from all the different standards used into modern standard Gregorian dates, so that comparison of dates can be done on an apples to apples basis.

In both cases, this means some standards organization being responsible for the creation of those standards.

BetterGEDCOM must, in my view, support robust apples to apples comparison (which inherently includes ordering) of dates. This does mean parsing of dates and not merely carrying them as text strings. Certainly the early versions of BetterGEDCOM can cut corners on this, for both dates and places. But we should not lose the long-term vision of having these standard place and date aspects robustly supported, which does mean that ultimately someone has to do a whole lot of work to create and maintain those standards for places and dates.
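One possible basis for such an apples to apples comparison, sketched in Python, is to map each exact date to a Julian Day Number (a running count of days). The formulas below are the commonly published integer ones for the Gregorian and Julian calendars; using them here is an assumption of this example, not something the group has decided, and they cover neither other calendars nor partial dates.

```python
def _shifted(year: int, month: int):
    """Shift the year so it starts in March, which puts the leap day last."""
    a = (14 - month) // 12          # 1 for January/February, else 0
    y = year + 4800 - a
    m = month + 12 * a - 3          # March = 0 ... February = 11
    return y, m


def jdn_from_gregorian(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date given in the Gregorian calendar."""
    y, m = _shifted(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045


def jdn_from_julian(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date given in the Julian calendar (1 Jan year start)."""
    y, m = _shifted(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


# Britain's changeover: Julian 2 Sep 1752 was followed by Gregorian 14 Sep 1752.
consecutive = jdn_from_julian(1752, 9, 2) + 1 == jdn_from_gregorian(1752, 9, 14)
```

Once two dates are reduced to day numbers, comparing or ordering them is ordinary integer comparison, whichever calendar the source used.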
ACProctor 2011-12-01T09:42:48-08:00
I'm not sure I catch your drift Wesley.

Thinking just of Gregorian dates for a second, there might be 3 versions of a date in a typical case: the image or original document in which there is a written date, the transcribed date string, and the date value which is the interpreted version of that date. It is the latter of these - the machine-readable, searchable, sortable date value - that I am talking of. I appreciate that the other versions will need support but the date value must be unambiguous (assuming you can decipher the written version) and ISO 8601 is designed specifically for that purpose.

As you rightly point out, this is similar to place hierarchies in that you may have the original, a transcribed version (incl. any spelling errors and informality) , and the normalised machine-readable, searchable, sortable version.
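The three versions could be carried side by side, roughly as in this Python sketch. The class name RecordedDate, its field names, and the sample transcriptions are hypothetical; the only point is that the transcribed string is kept as written while sorting uses the interpreted value.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RecordedDate:
    transcription: str               # the date string as transcribed, errors and all
    value: Optional[str] = None      # interpreted yyyy[-mm[-dd]] value; None if undeciphered
    image_ref: Optional[str] = None  # hypothetical pointer to the image of the original


def sort_key(d: RecordedDate) -> Tuple[bool, str]:
    """Order by interpreted value; dates with no value sort last."""
    return (d.value is None, d.value or "")


entries = [
    RecordedDate("3rd Aprill 1701", "1701-04-03"),
    RecordedDate("Lady Day last"),               # transcribed but not interpreted
    RecordedDate("Xmas 1699", "1699-12-25"),
]
ordered = [d.transcription for d in sorted(entries, key=sort_key)]
```

The user still sees "3rd Aprill 1701" exactly as transcribed, while searching and sorting work on "1701-04-03".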

Any scheme that has to depend on some magic date parser is doomed to failure because no such beast exists. You might be able to get one that works for the US, or for the UK, but a globalised bullet-proof one is just not possible and I wouldn't trust any software that claims it can do it because I know there is no well-defined grammar and, in the worst case, the date is ambiguous.
WesleyJohnston 2011-12-01T10:34:15-08:00
re ACProctor

The different dates must be retained, and I see that your post was in regard to the unambiguous interpreted date. And it is good that you have raised the ISO 8601 standard.

I do think we need to have both a long-term view of bringing about just such a "magic" date parser, which is never going to be bullet-proof but probably could handle the vast majority of what it sees.

The fact that the reality of today is not close to that should not take it off the table for the long-term as something BetterGEDCOM as an organization should foster, while at the same time acknowledging that we have to build BetterGEDCOM version 1 within the constraints of where the technology is today.
AdrianB38 2011-02-21T12:29:12-08:00
I think it's reasonable that BG should mandate saying what the calendar is. (I've altered "should" to "must" to match).
AdrianB38 2011-02-21T12:39:38-08:00
I'm not sure I quite approve of the removal of the comments in the "Way Forward" section.

The comment about "It would probably seem sensible to define a default calendar and date ordering for each file" was intended to stop people moaning that they didn't want to mark every date in their file with a calendar. Yes it's a solution-comment not a requirement-comment but I felt it useful.

Also removed is the sentence "An alternative method would be to encode each date into the same representation - e.g. number of days since some agreed event." I put that in because it was mentioned as one possibility for "normalising" dates to allow translation. It's not a solution I liked because it's hard to deal with dates like "About January 1866" (as distinct from "About 1 January 1866") so I did reject it but felt it useful to record the rejection.

Comments anyone on whether the so-called clarifications add anything?
AdrianB38 2011-02-21T14:28:16-08:00
Geir - if I understand the outputs from this Wiki, you've altered "A BetterGEDCOM file must define the calendar" to "A BetterGEDCOM file should define the calendar", while leaving the importance as mandatory.

Perhaps I should have put into the template:
"must" goes with "mandatory" and
"should" goes with "very desirable".
The meanings of the words and phrases are intended to match, because "should" has an element of "should but it might not".

Do you agree with "mandatory" as the importance of the calendar?
gthorud 2011-02-21T16:14:02-08:00
Hm, my memory may not be the best, but I can't remember having changed it.

But since you mention it, what is the intended use of Importance?

Also, I think that Way forward could include technical solution things - one may need to discuss that. The important thing is to describe what we want so people can understand it.

I understand the difference between should and shall. The ISO editor guidelines have taken should out of the vocabulary. But it is difficult to focus on this all the time.

I agree with mandatory, and that the header should give the default.

I am not sure about the implications of "It is desirable that the codified representation of that should differ as little as possible from the written characters in the source, to reduce the scope for error in input or output." - does this mean that we should have text representation and not numeric - maybe not.

You are right about the Calendar being dependent on country.

I think we need to discuss before we edit other people's text.
brianjd 2011-02-21T21:14:45-08:00
Adrian,

Sorry about removing that sentence, I was just trying to keep it in sync with your other comments. But as you say it's a solution comment. I thought it was there because you wanted to explain the Desirable status, since it's naturally a software implementation issue which won't be dictated by us. One thing I would like to suggest is that a calendar can also be "text only". That is, the user entered some funky date that the program can't decipher. Setting Gregorian as the default calendar is probably going to be correct 66% of the time.

Plus I thought this page was for the group. I also made a few minor edits fixing grammar in places. I am probably remembering wrong, but I thought I saw a posting asking for help with this page. I keep pretty busy and get a ton of communications, so I sometimes mix up messages.
AdrianB38 2011-02-22T07:33:25-08:00
Geir - the history seemed to say to me that it was your id that made the change but I have my doubts that this Wiki always tells the truth if 2 people are updating at the same time. I note that you "agree with mandatory" so I will set it to "must" in the description.

Importance is there to help prioritise items.

Re "It is desirable that the codified representation of that should differ as little as possible from the written" - my thought there is that if the user enters "3 April 2001" (say) and then looks at the GEDCOM, it seems more useful if they recognise the value as something like they entered. It's just something to think about when evaluating how the date should be stored on the file. Personally, I don't like the idea of the numeric date partly because it gives no clue about the calendar in use - e.g. is 2 = February or Brumaire? Though if we explicitly mark each date with a calendar code, maybe it doesn't matter.
AdrianB38 2011-02-22T07:47:55-08:00
Brian - please don't feel you shouldn't update the page. In other places we have gone through loads of discussions but no-one ever updated the main page - hence when I was trawling for the first cut of requirements I nearly lost the will to live after a while!

I've put back a bit about possibly having a default calendar (modified to take account of your comment about it possibly being an application issue)

Plus I've highlighted the assumption that we are going to work with calendar dates and not days elapsed since something.
GeneJ 2011-02-21T09:23:45-08:00
Question re: Id: Data03
We write, "BetterGEDCOM must allow the recording of approximately known values in all appropriate contexts." And also, "Why? ... Note - this is not the same as assigning a probability to a value - e.g. 'Probably 1812' is not the same as 'About 1812.'" And finally, "Source: Tom Wetmore's Goal and Requirements plus various discussion pages."

Although the thread was turned into a greatly different discussion, can someone clarify if this requirement is intended to address the original point of the thread, "Question re Relationship and Fact/Event/Date/Location/Tag qualifiers/coments (from non-tech)." See:

http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138
brianjd 2011-02-21T10:25:09-08:00
Probably, possibly, likely, maybe, unlikely, et al are all conclusions. These values are mostly relevant to an evidence-conclusion model.

Contrast "between 12/1899 and 11/1900"
with "probably 1900" and "about 1900".

The first one is a statement based on an age listed at the time of death on a record.

The second is a conclusion based on that fact.

The third is either a cop-out on the part of the data entry person, a shorthand way of saying "between date x and y", or a conclusion, and is hence a more neutral term. Plus "about" is the de facto standard way of saying maybe and possibly.

In the end "probably", et al is a conclusion, if not taken from a fact (ex: a gravestone giving date range 1846-1900).
GeneJ 2011-02-21T10:50:52-08:00
We're mixing apples and oranges.

The thread I started (URL above) had nothing to do with the evidence-conclusion model that Tom and Louis came to discuss.

Rather, the thread proposed BetterGEDCOM advance the more absolute style of current GEDCOM records into something far more relevant.

Solid conclusions, such as those resulting from a well reasoned proof argument, are often crippled when users are confined to the more absolute style of GEDCOM.

I'm guessing more dialog is necessary to get us all on the same page with this topic in the lead up to finalizing the writing of this "requirement."
brianjd 2011-02-21T12:04:32-08:00
Gene,

I re-read your original post, again. I guess I spaced it out a bit. First off, I use Gramps as my DB of choice. It allows me to put in "about","before","after" and give date ranges for events.

I see no reason why BG should not support that. It's only logical, and thus we should make it clear that BG will support that too.

However, you are mixing metaphors so to speak.
There's a big difference between
"born between 12/1/1899 and 3/31/1900"
and "born between 12/1/1899 and 3/31/1900, probably in Manchester, England".

The first is merely recording a date range based on evidence, and the second adds in a conclusion with no supporting evidence except perhaps that the alleged parents lived there in that time range.

I don't see a problem with adding in qualifiers like this for dates, names, places, etc. But I think we need to be careful and specific in applying that.

Solid conclusions are still conclusions. If we have apples and oranges, it's because you asked about apples and oranges, hoping to get the oranges renamed to apples. You mixed questions about using facts to limit possibilities (born between, died between), with using known facts to draw conclusions (born probably at).

One of your examples could have gone either way. A child being baptized by the parents may or may not indicate legitimacy or marriage, or may incorrectly indicate one state over the other.
AdrianB38 2011-02-21T12:17:21-08:00
I wrote the original text Gene, and for once I do remember what I meant and no, the specific requirement is NOT intended to cover concepts like "Probably 1812", i.e. it is not intended to address the original point of that thread.

No reason there shouldn't be another requirement about that but it should be a separate requirement because "About 1812" and "Probably 1812" are different concepts - you can see it better if you tweak the phrases to be "I know it was about 1812" and "Probably 1 September 1812 but I can't be certain"

One's fuzzy on the date, the other's fuzzy on the certainty.
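A minimal Python sketch of keeping the two kinds of fuzziness apart, assuming one field for how approximate the value is and a separate, optional field for how sure the researcher is. The enum members and the class name DateClaim are illustrative choices, not agreed BG vocabulary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Approximation(Enum):
    """Fuzziness of the VALUE itself."""
    EXACT = "exact"
    ABOUT = "about"
    BEFORE = "before"
    AFTER = "after"
    BETWEEN = "between"


class Confidence(Enum):
    """Fuzziness of the researcher's CERTAINTY (illustrative levels only)."""
    CERTAINLY = "certainly"
    PROBABLY = "probably"
    POSSIBLY = "possibly"


@dataclass
class DateClaim:
    value: str                                  # e.g. "1812" or "1812-09-01"
    approximation: Approximation = Approximation.EXACT
    confidence: Optional[Confidence] = None     # None means no statement is made
    upper: Optional[str] = None                 # second bound, used only with BETWEEN


def describe(claim: DateClaim) -> str:
    """Render a claim as a short phrase, certainty first, then the value."""
    if claim.approximation is Approximation.BETWEEN:
        core = f"between {claim.value} and {claim.upper}"
    elif claim.approximation is Approximation.EXACT:
        core = claim.value
    else:
        core = f"{claim.approximation.value} {claim.value}"
    if claim.confidence is not None:
        return f"{claim.confidence.value} {core}"
    return core
```

So DateClaim("1812", Approximation.ABOUT) describes as "about 1812", while DateClaim("1812-09-01", confidence=Confidence.PROBABLY) describes as "probably 1812-09-01", and nothing stops a claim from carrying both.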
AdrianB38 2011-02-21T12:46:48-08:00
Source01
Minor comment - while "tertiary" is logical, I do not think it's used (in general).

I've only ever seen Primary and Secondary defined - even there we clearly have differences because some well respected bloggers(?) have said something like "If a mother tells us about her children 30y after their birth, that's primary because she was there."

As far as I know, a Primary source has to be contemporary so 30y after is Secondary.
AdrianB38 2011-02-23T12:32:26-08:00
Yes, it would be useful if we could agree these terms but I really don't see it as BG's job to do that. I'll let someone else take on the massed ranks of American Family Historians - sorry, Genealogists! (What hope have we got if we can't even agree what to call ourselves! <grin>)

Tertiary is logical but we also then get into debate of whether a photo of a photo of an original is tertiary, secondary or primary. And of course, the answer may not be the same for a .JPG

And _personally_ I'm still sticking with my view that the UK thinks that Primary has to be contemporary. But, as you say, these are just short-hand for a 1st cut stab at how reliable the source is.
Andy_Hatchett 2011-02-23T13:05:49-08:00
In my mind, and I'm not alone in this, there is a very distinct difference between being a Genealogist and being a Family Historian.
AdrianB38 2011-02-23T15:07:09-08:00
Andy: There is a very distinct difference in my mind also, but I suspect it may be the exact opposite of the one in your mind! More of the UK and the USA being divided by our common language. Like I say, let's not go there....

I think all of us here are interested in careful research into all aspects of our relatives' lives, no matter what we call it!
GeneJ 2011-04-04T06:59:41-07:00
Perhaps this should be titled, "Source Form, Information Class and Evidence Type."

SOURCE FORM: Original or Derivative (with derivatives able to be further categorized as "abstracts, compilations, databases, extracts, transcripts, translations, and authored works such as genealogies and histories"). [Mills, _Research process Map_, 2006, and _Evidence Explained_, 2007, p. 24-25.]

INFORMATION CLASS: Primary information or Secondary information. [Mills, _Research process Map_, 2006, and _Evidence Explained_, 2007, p. 24-25.]

EVIDENCE TYPE: Direct, Indirect or Negative. [Mills, _Research process Map_, 2006, and _Evidence Explained_, 2007, p. 24-25.]
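As a sketch only, the three classifications above could be held as three independent flags, for instance like this in Python. The class and member names simply mirror the terms listed; the record name Assessment and the idea of bundling all three in one record are assumptions of this example, and where each flag would actually be attached is left open.

```python
from dataclasses import dataclass
from enum import Enum


class SourceForm(Enum):
    ORIGINAL = "original"
    DERIVATIVE = "derivative"


class InformationClass(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


class EvidenceType(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"
    NEGATIVE = "negative"


@dataclass(frozen=True)
class Assessment:
    """One value from each scheme; the three are judged independently."""
    source_form: SourceForm
    information: InformationClass
    evidence: EvidenceType


# Purely illustrative combination of values
example = Assessment(
    SourceForm.DERIVATIVE,
    InformationClass.SECONDARY,
    EvidenceType.INDIRECT,
)
```

Because each flag is a small closed list rather than a single combined score, an application can show them as separate check-boxes or combine them in its own way.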
GeneJ 2011-04-12T14:03:43-07:00
Is this a way forward?

I happened upon this APG-L message again, so I thought I'd post it.

http://archiver.rootsweb.ancestry.com/th/read/apg/2004-08/1092707376

Mills, "Citation and facts," 16 Aug 2004, APG-L, in part, as below:

If only we could just get those genealogy programs to do away with that ephemeral, ambiguous "surety" concept and replace it with check-boxes for the concrete descriptions used by modern evidence analysis -- i.e.,
Source: Original or derivative
Information: Primary or secondary
Evidence: Direct or indirect
mmartineau 2011-04-12T17:38:51-07:00
What about using the following to weigh evidence:

Ask the following questions:

Did the information about the conclusion come from
- 1st hand knowledge/personal witness of an individual?
- 2nd hand knowledge of an individual?
- Another source document/record?

How does the information support the conclusion?
- Direct evidence?
- Indirect evidence?

Was the information
- Recorded at or near the time of the event?
- Recorded sometime after the event?

Is the information
- Clear and easy to read?
- Difficult to read?

Did the person/organization that recorded the information have any reason to alter the information?

Did the person/organization that provided the information have any reason to alter the information to hide or change what really happened?
GeneJ 2011-04-12T20:35:30-07:00
Hiya:

Reference?
testuser42 2011-04-13T03:57:42-07:00
Gene, you said
If only we could just get those genealogy programs to do away with that ephemeral, ambiguous "surety" concept and replace it with check-boxes for the concrete descriptions used by modern evidence analysis -- i.e.,
Source: Original or derivative
Information: Primary or secondary
Evidence: Direct or indirect


Well, we can and should make these flags possible in BG. So then new software would be enticed to offer these checkboxes.

Mike's questions are more detailed, but the "flags" are similar.
I believe some of these flags would be best attached to "Source" records, others to "Evidence" records (or Extractions in Mike's model), and others to "Conclusions".

Can checkboxes or flags do everything that's necessary for determining the quality of Sources and Conclusions? Or do we need more flexibility?
Would we still need a generic "surety"?
GeneJ 2011-04-13T09:14:27-07:00
(1) Surety. See the requirement for QUAY and the related discussion. As I recall, there is a particularly relevant linked discussion about QUAY. QUAY is relevant to many users at the application level - to eliminate QUAY would remove a feature meaningful to users. Our focus turned to clarifying the standard and re-asserting the specific scale (0-3).
If we need to, it's better to continue a discussion about QUAY under the Requirement Catalog entry for QUAY.

(2) Mike's questions. There are probably two issues here. First, should we write specific and detailed questions, such as in Mike's fine example, into a standard? Said another way, maybe vendor A sees a need for one set of helpful questions, but vendor B would like those questions phrased or delineated differently. Secondly, genealogy is a living discipline. Standards change and new standards are introduced. Some family historians compile data over decades. Not all software updates at the same time. Consider those challenges in the face of GEDCOM 5.5--it wrote helpful guidance into QUAY and the guidance became anything but helpful. As one example, 5.5 QUAY directed users to consider "preponderance of the evidence"--but BCG subsequently raised the standard and removed that specific guidance from the GPS. Does that example make my point?

Separately ...I didn't think this Source 01 was ready to move to the Developers' Meeting agenda, because I think we need more current documentation.

In particular, in Tom Jones's Inferential Genealogy (linked on the wiki, but you can Google for it, too), he writes, "Consider both indirect and direct evidence, including negative and circumstantial evidence." I didn't want to nuance that passage into a standard without reasoning it through a little more.

Summary: Mills's concept of very clear check boxes made a lot of sense to me as a "way forward."
gthorud 2011-04-13T10:50:47-07:00
I personally do not really understand what some of these classifications are used for. Are they output in reports? In most cases I would know these classifications from knowing the source. But leave that aside, as it seems that there is some interest in having classification schemes.

If we are going to have this in the standard, we should try to arrive at agreed values - leaving this to implementors or allowing several schemes will create chaos.

There have been some proposals above to break things down into several concepts that each classify a single simple (atomic) aspect; I support that approach. The simple facts are less likely to change over time. How vendors map (combine) these in the user interface, if at all, is their problem.

We should NOT create schemes (a la QUAY) where an increasing value 0-1-2-3 indicates a better quality. We should allow extensibility by allowing e.g. a code value 5 to mean something between 1 and 2. The user should relate to a definition, not the code value.

But we may need to keep QUAY for backwards compatibility - if its values cannot be mapped to a more detailed multi-value scheme - but then deprecate its use when recording new data.
GeneJ 2011-04-13T11:23:17-07:00
Please see Source02 and the discussion for QUAY.

It would be helpful if the comments about QUAY were held in the discussion that has already been developed for that requirement, Source02.

Separately, I was logged out again, so my response to the balance of TestUser's questions has been lost. Too much to do to replace.

I'll wait to see how the responses develop.
gthorud 2011-04-13T12:23:26-07:00
Seems like there is a flaw in the implementation of the wiki, it should offer you an option to log in when you try to post.
brianjd 2011-02-21T21:55:11-08:00
Adrian, that was my addition also.

I can't say I've ever run across tertiary in use in genealogy. I was strictly using natural language grammar (or at least European grammar).
First person, second person, third person. I have seen Primary , secondary and tertiary sources used, but can't say in genealogy or not.

Here's my glossary for it.

Primary: any direct witness or document to an event/object (yes that must allow for the inclusion of the mother 30 years later). I trust if I were to get shot at 3:54PM, and lived another thirty years, I bet I'd be able to recall accurately that I was shot on 22 Feb 2011 at 3:54 in the afternoon. I can't imagine a mother giving birth is any less memorable. ;')

Does this also mean that my data entry in my genealogy file will become secondary if I live another thirty years? Or does the act of recording it negate that? I speak strictly of the things I have direct knowledge of. My birth, which is more than thirty years past now, as well as my marriages, divorce, child and her stats, etc.
My first marriage is also running up on that magical 30. Now I wouldn't trust any dates my father gave me, but he can't even remember me now. How do you judge what is and isn't believable? My grandmother had my mother convinced she was born in Paris, which is more than a pebble's throw from Hericourt, Alsace. But there was ample evidence to suspect the verity of anything my grandmother said. Even when she wasn't speaking Alsatian.

Secondary: any record or source once removed from primary. So a baptism record indicating legitimate birth could be secondary for a birthdate and marriage of the parents. As could any transcription of an original record.

Tertiary: any record or source further removed than secondary. Such as an article that used a secondary source (a copy of a copy). An article that quotes a book that quotes another that claims it used an original. A cousin's recollection of what he was told by his aunt about his cousin's birth.

Why there seems to be a lack of this is a mystery to me. It probably doesn't matter much but might be useful. I would certainly use it. I can't imagine, I'm that unique. But then I do get a lot of "you're weird man". ;')
GeneJ 2011-02-21T13:01:54-08:00
Data04 - Levels of Confidence in Database Conclusions
Id: Data04
Title: Levels of Confidence in Database Conclusions

Description: BetterGEDCOM should allow the recording of recognized levels of confidence associated with database conclusions

Importance: Very Desirable

Why?: Supports faithful recording of research status and results.

Source: _Evidence Explained_, 2007, p. 19, "certainly," "probably," "possibly," "likely," and "apparently," "perhaps"

Way forward?:

Dependencies:

Approval status:

Proposer: GeneJ

See also the first part of the discussion at:
http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138
brianjd 2011-02-21T22:12:15-08:00
I'd add "speculative" to that list. I have a small number of records with notes attached simply saying that, along with "todo" tags.
AdrianB38 2011-02-22T09:08:57-08:00
I'd support this though I worry about 2 aspects:
- firstly whether we would end up with "employed as a signaller (certainly) in 1881 (certainly) in Crewe (probably) by the North Staffordshire Railway (possibly)"
Or would it make sense to keep one level of confidence at the overall conclusion level? (e.g. "possibly employed as a signaller in 1881 in Crewe by the North Staffordshire Railway" with notes explaining that "his occupation in April 1881 was signaller and he lived in Crewe - since he'd previously been employed by the NSR, he might still be working for them - in any case his place of work was likely to be close to Crewe")

- I am concerned that some of the words could be interpreted differently. "probably" and "likely" mean the same to me (probability over 50%). "possibly" and "perhaps" mean the same to me (probability less than 50%). Not sure what "apparently" means... I do like being "precisely imprecise" but unless we put limits on, we might end up being "imprecisely imprecise"
To me there are only 4 values:
- 95% to 100% (e.g. certainly)
- from 50% to 95% (e.g. probably)
- from 5% to 50% (e.g. possibly)
- from 0% to 5% (e.g. unlikely)
Brian's "speculative" feels to me useful but different - it implies "unknown probability".
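To make the four bands in the list above concrete, here is a minimal Python sketch. The function name and the handling of the exact boundary values (95%, 50%, 5%) are assumptions for illustration only; the numbers merely define what the words mean and are not a proposal to store percentages. "Speculative" is deliberately left out, since it implies an unknown probability.

```python
def confidence_term(probability: float) -> str:
    """Map a probability in [0, 1] to one of the four terms listed above.

    Boundary handling (lower bound inclusive) is an assumption; the
    discussion only gives the bands, not which side owns each boundary.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= 0.95:
        return "certainly"
    if probability >= 0.50:
        return "probably"
    if probability >= 0.05:
        return "possibly"
    return "unlikely"


if __name__ == "__main__":
    for p in (0.99, 0.51, 0.49, 0.01):
        print(p, confidence_term(p))
```

The point of the sketch is that "probably" and "possibly" land in different bands even when the underlying numbers are close, which is why the words carry more meaning than the numbers.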
GeneJ 2011-02-22T10:45:31-08:00
Hi Adrian,

In your second concern, you wrote, "I am concerned that some of the words could be interpreted differently."

I agree we won't all interpret or use the terms in the same way.

Use of these levels of confidence indicators begs the proof (by footnote, proof argument or parenthetical comment).

As a user, I'm particularly interested in seeing terms, rather than assigning a numeric value to terms.* Reason: (a) The concept of levels of confidence in conclusions may be new to the GEDCOM world, but it's not new to the genealogical world. We find these kinds of confidence indicators as terminology in evidence genealogists access day to day. (b) In use (function), these are powerful words--there IS a material difference between "probably" and "possibly." Regardless of differences in interpretation, to me comparing "48" to "51" is far less powerful than the difference in saying "probably" vs "possibly."

In your first concern, you wrote, "... signaller (certainly) in 1881 (certainly) in Crewe (probably) by the North Staffordshire Railway (possibly)..." vs "one level of confidence at the overall conclusion level? (e.g. 'possibly employed as a signaller in 1881 in Crewe by the North Staffordshire Railway' ...)"

To me, what you are getting at here is the manner in which, and how well, one uses available tools to craft and present a conclusion.

Outside of the GEDCOM world, lacking parenthetical comment, I'd look to where the reference note was placed, and assume that referenced note addressed the preceding statement/conclusion. That same flexibility isn't so available in today's software-GEDCOM context.

At this stage, is it possible to agree the use of levels of confidence qualifiers begs parenthetical comment, note (footnote/endnote) or proof argument, leaving the "how we make that related function happen," as a dependency?

-GJ

*How this works, e.g., the terms, the translation of terms, opportunity for parenthetical comment, and whether the field should be limited to specific terms, all to be vetted still.
brianjd 2011-02-22T13:12:12-08:00
Adrian,

Yes, that is how I apply speculative. Anything that I think could be true, but I can't as yet provide any facts to support. Like two men with the same unusual last name and children who share a range of common names, speculatively being brothers/cousins.
AdrianB38 2011-02-22T13:46:01-08:00
Gene - just to be clear - I wouldn't ever advocate putting the numbers in. I only wrote them there to provide a quantitative definition of what I thought the words meant. And yes, there is definitely a difference between "probably" and "possibly" that is a lot more meaningful than the difference between 49 and 51!

Yes, I think we have to leave for the moment the idea of exactly which, where and how we qualify. So long as we realise when we get there that because we are coding stuff up, the more sophisticated it gets, the more complex it gets - usually.
GeneJ 2011-02-22T13:47:03-08:00
Yes. Perfect.
GeneJ 2011-02-21T13:12:44-08:00
Data05 - Universal Qualifier Symbol ("?")
Id: Data05
Title: Universal Qualifier Symbol ("?")

Description: BetterGEDCOM should incorporate methods allowing users to apply the universal qualifier "?" before dates (or parts of dates), locations, names, etc.

Importance: Very Desirable

Why?: Supports faithful recording of research status and results.

Source: Hoff and Leclerc, _Genealogical Writing in the 21st Century_ (2006), p. 115, "Commonly used Symbols," for "?" as, "uncertain interpretation of original text."

Way forward?:

Dependencies:

Approval status:

Proposer: GeneJ

See also the first part of the discussion at:
http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138
GeneJ 2011-07-08T18:07:03-07:00
(1) References from the original discussion.

Hoff and Leclerc, _Genealogical Writing in the 21st Century_ (2006), p. 2, "The process of expressing our findings in writing--including proper use of terms such as probably, possibly, likely, and maybe--is the most valuable tool in our research kits. Unfortunately, it is also the most neglected."

Same source, p. 115, "Commonly used Symbols," for "?" as, "uncertain interpretation of original text."

The discussion morphed into other topics, but the link is below.

ref: http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138

(2) I have seen the qualifier before information and after, but context is important here.

(a) Before ...
(i) _New England Historical and Genealogical Register_ 165 (Jan 2011): 41, for child list entry about "vii. ?DAUGHTER, bp. Copford 8 Feb. 169/20"; same treatment, also in the child list at p. 48 for a ?SON and a ?DAUGHTER.
(ii) Also before in NEHGR July 2010, p. 187 for the child list entries about ?Ebenezer Handy and ?Hannah Handy. ... "ii. ?HANNAH HANDY. As mentioned above, it is not clear whether Hannah Handy, one of the grantors in the deed of 1 July 1802 given above, was the sister of Silvanus Handy or his mother (or stepmother)."
(iii) Will have to track the third down, was the question symbol in front of the child's listed number in the child list.

(b) ...and also following information, but when it follows, appears in editorial brackets (the brackets also protect the symbol from being confused with punctuation).
(i) _New England Historical and Genealogical Register_ 165 (Jan 2011): 30, for "...and another daughter Phebe was born to them on 4 February 1732[3?] ..."
(ii) Quite frequently in NEHGR to indicate that whether a date should have been a double-date is unknown. For example, NEHGR 163 (April 2009):107, twice, "1680[/1?]" and "1740[/1?]" ... NEHGR 164 (October 2010): 283, 290 for "9 March 1677 [1676/7?]," and "15 Feb. 1742[/3?]"
(iii) For part of a date ...NEHGR 163 (April 2009):139, "400 acres in Killingly on 5[?] January 1718/9" ...

Note: I think an unbracketed question mark symbol that follows an entry would be too easy to confuse with punctuation.

(3) NGS has transcription standards. I haven't looked at those in a little while. Will have to first check BCG Genealogical Standards Manual.

(4) Again not directly related to this requirement ... NGS Quarterly uses dashes to signify a missing name, and NEHGR uses underscores. If I'm not mistaken, four of each (dashes or underscores). NEHGR uses the same underscores for "missing information" [Hoff and Leclerc (2006), p. 116 (Appendix B)]

(5) Similar but different, see requirement "ConfAcc02 (was Data04)":

April 2011 NEHGR, page 125 has entry in the child list for "ii. _probably_ JOHN PYLE, infant, bur ..." and the same term, same treatment for four children in the child list on page 130; again on p. 133.

In the earlier October 2010 issue (NEHGR), p. 264 has entry "(perhaps)" in the child list, as "iv. (perhaps) DAVID WATERBURY ..." same style but "(probably)" on the next page for an "unidentified daughter."
AdrianB38 2011-07-09T10:05:42-07:00
Oh Gene - is that sound I hear, the sound of a can of worms being opened?

I'm not sure how many different combinations of characters and words we seem to be creating that describe uncertainty. And if the learned societies can't agree or be consistent (or am I being pessimistic?) I'm not sure what we do...

But I'm not sure how much we need to do.

If someone enters ?Ebenezer Handy into their genealogy pgm, then I _assume_ it enters the database as
first name ?Ebenezer
surname Handy
This will then get processed as if ?Ebenezer were a normal name - BUT it will not get sorted next to Ebenezer. If you want that to happen, then you either swap to use Ebenezer? Handy or you store it as
uncertainty-prefix ?
first name Ebenezer
surname Handy
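A minimal Python sketch of the second storage option above, where the qualifier is held apart from the name so that it is shown but ignored for sorting. The class and field names are assumptions for illustration, and "Abel Handy" is a made-up entry added only to show the sorting difference.

```python
from dataclasses import dataclass


@dataclass
class PersonName:
    first_name: str
    surname: str
    uncertainty_prefix: str = ""  # hypothetical field, e.g. "?"

    def display(self) -> str:
        # The qualifier is shown to the reader in front of the first name.
        return f"{self.uncertainty_prefix}{self.first_name} {self.surname}"

    def sort_key(self) -> tuple:
        # The qualifier is left out, so ?Ebenezer sorts next to Ebenezer.
        return (self.surname.lower(), self.first_name.lower())


names = [
    PersonName("Silvanus", "Handy"),
    PersonName("Hannah", "Handy", "?"),
    PersonName("Abel", "Handy"),  # made-up entry for illustration
    PersonName("Ebenezer", "Handy", "?"),
]

# Sorting on the stored parts keeps the questioned names in alphabetical place.
ordered = [n.display() for n in sorted(names, key=PersonName.sort_key)]

# Sorting on the displayed text pushes every "?" name to the front.
naive = sorted(n.display() for n in names)

if __name__ == "__main__":
    print(ordered)
    print(naive)
```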

The issue of dubious dates is more complex since a lot of software expects dates to look like, err, dates. Certainly in the s/ware I use, if I write "15 Feb. 1742[/3?]", then it won't get processed as a date but as a piece of text with no date meaning. If you want it to appear as a date according to some arbitrary decision, then there are 2 options:
- enter it as string "15 Feb. 1742[/3?]" but tell the database to store an interpreted date of 15 Feb. 1742. Or 15 Feb. 1743 if you wanted.
- create some new uncertainty rating of "Unsure if OS or NS date"

Personally, I'd go for the minimum change and use the facility found in some s/ware of a text string (i.e. "15 Feb. 1742[/3?]") having a date interpretation added to it.

A linked issue is that of transcription markings. I always use square brackets [] round unclear text and angle brackets <> round inserted text. But I've no recollection where I got that practice from. Worse, we have people using [] for both purposes, so I've no real idea whether when they write "[sig] John Doe", do they mean John Doe signed this? Or do they mean the three letters "sig" probably appear in front of the words "John Doe"?

HOWEVER - I think we just ignore this debate in BG and pass forward whatever we get. I don't _think_ we need worry about these transcription standards. Do we?
GeneJ 2011-07-10T18:17:29-07:00
Hi there Adrian:

(1) My personal preference is to allow either the symbol (?) preceding the entry or the symbol in editorial brackets that sets the entry off with specificity [/12?]

(2) As in the original thread ... I was hoping genealogical technology standards could bring us the form of a preceding symbol that would be neutral for sorting purposes.

Generally, the requirement here (universal qualifier) is part of the broader discussion in the original thread, all of which recognizes the need to move away from absolutes and into the real world of genealogy.

Original thread suggests we all want this, so perhaps we keep collecting ideas and input for now. --GJ
redmanvan 2012-11-05T12:54:10-08:00
I wonder if this is more about the software than about the data. A program that displays the information to the end user can put a ? at the start or the end of a piece of data. It could also display the information in a different font, a different colour or whatever.
Likewise for data entry - a program could use whatever interface it wished to permit a user to specify that information is uncertain.
But the way that information is stored in a file could be quite different. While a ? might suit the end user in many cases, that isn't really a matter for BG to specify.
GeneJ 2012-11-05T13:28:45-08:00
Hi Redmanvan,

The question might be, do genealogists need a standard way of communicating "I'm not quite sure, but I think ..."

If you can't make out a record, you might enter an underscore, "_", but what if you think you can read the name.

This morning I quoted from a text in which the bride's name had been published in 1894 as "Mary Witherton [?]"; so the practice of not being able to decipher a name but not being quite certain.

What do you think?
GeneJ 2012-11-05T13:30:15-08:00
*"so the practice of being not quite certain, but practically certain, is not so new.
redmanvan 2012-11-06T08:31:00-08:00
I agree that a standard way of communicating uncertainty is useful - whether a simple question mark is sufficient for every case is arguable.

But my point is, the meaning ("This name is uncertain") can be expressed in so many ways, that we should not force BG to use a ? mark in its internal representation of that meaning.

In fact, in the example you quote, it may convey uncertainty, but it certainly doesn't convey indecipherability.

If BG is to convey uncertainty, it should also convey the reason for that uncertainty, and if there is more than one possible reason (indecipherability, inconsistency between records, unlikely spelling, etc.) then a ? is not enough.

Alyn
GeneJ 2012-11-06T09:39:53-08:00
Hi Alyn,

Aside from the need to provide the rationale, is there a reason you would not want to provide some symbolism? (Whether it was a question mark or a smiley face.)

I agree that there are various ways to convey uncertainty and that indecipherability is only one reason for uncertainty.

"If BG is to convey ... it should also convey the reason ..."*

I didn't mean to suggest that the whole would not explain the reasoning, but there should be a way to convey what has been separately explained.

In my case, that explanation would usually be found in a citation, but I still have the other databits to be concerned about. Another example follows.

Today, I confidently write the name of one of my ancestors as Elizabeth (Clark) Preston. For many years, though, the best evidence I had about her maiden name was a historical marriage record containing modifications. Her married surname had been clearly entered to the record. By some different pen and/or hand, that surname had been crossed out and the surname "Clark" had been written below. (It's possible that the changes had been entered and then erased.) The record I'm referring to can be viewed here:
https://familysearch.org/pal:/MM9.3.1/TH-266-11124-173590-21?cc=1520640&wc=7131654

I can easily record a citation to that record in which I can explain the issues; I have even spoken to the town clerk who holds the record and have learned more about it.

During the time when that marriage record was the best evidence of her name, I might have entered Clark[?], associated with the citation I described. For my purpose, that entry would have been preferable to leaving the surname blank.

How the uncertain information is handled involves judgement. In other words, what I might feel is worthy of entry, albeit uncertain and cited, another might see as unworthy of entry at all, only commented upon in the citation. This requirement wasn't intended to set a recording threshold, just to provide a way forward when the genealogist feels it has been met.

(I suspect many genealogists already use the symbol ?, so as Adrian suggests above, we pass it on. I'm suggesting there is probably a benefit to recognizing it and encouraging its use when folks hit that threshold. As long as BG is not stripping it off or relegating the associated data to "irregular" status, there are probably other ways of promoting the use of the symbolism.)

*Recognizing that BG can only convey that which it has been "fed."
ACProctor 2012-11-12T10:13:40-08:00
This is an area where formalisation may need to be justified. I know a lot of transcription groups have a syntax for representing unknown characters or characters that may be one of a set. www.freebmd.org.uk is probably the one I am most familiar with. Those syntaxes - which are generally some form of "regular expression" - are easy for software to read, and to work from, but many users would have to look in a book to see what it means.

My question is therefore whether this formalisation is for the benefit of software or end-users.

I thought about this when designing STEMMA and decided to defer any decision. At the moment, I am recording the transcription as best I can (i.e. most obvious and/or most likely), and using STEMMA's narrative feature to record why or how it may be uncertain. That actually works well from an end-user point of view, but any name matching algorithm would have nothing to work from.
louiskessler 2012-11-12T17:38:53-08:00
Tony,

It's probably useful in the "data" field of source references. But I don't see it needed anywhere else.

Louis
GeneJ 2012-11-12T17:42:42-08:00
Hi Louis,

The purpose of this request was to have better support in the pfact data fields. Those fields are used for other than just tree matching. In the programs I use, they are part of the sentence and narrative structure.
ACProctor 2012-11-13T06:26:06-08:00
Narrative should be searchable too. This is why I have tried to push STEMMA's concept of "structured narrative". In principle, its mark-up language could record the "regular expression" syntax in such a way that it does not detract from the visible text being read by the end user (...in the same way that a URL link shows you the link title without the link syntax).
testuser42 2011-02-23T06:18:08-08:00
Agree.
I've never read any Genealogy standards book (They seem to be much more common and important in the US), but I kind of came up with the exact same use for the "?". Though I also used it for showing "uncertainty" anywhere my software doesn't allow a "surety", e.g. with dates. Which in turn makes my software complain about invalid dates...
GeneJ 2011-02-23T07:19:09-08:00
I should finish my morning wake up before trying to respond.

The trick is we want software to be able to recognize that symbol and include it, but ignore it for the purpose of generating lists or setting things in date order. So, 11 ?Jan 1837 would sort just after 10 Jan 1837, and before 12 Jan 1837; John ?Williams would sort before Williams, Johnny, but after Williams, Jan.
testuser42 2011-02-23T08:16:00-08:00
Yes, exactly. My current software is too stupid for that ;)
AdrianB38 2011-02-23T12:14:18-08:00
"11 ?Jan 1837 would sort just after 10 Jan 1837" - that's probably not a problem. At least two ways of doing it that I can see:
- store both "11 ?Jan 1837" and "11 Jan 1837", the first as the real date, the second as the date to be used for sorting, arithmetic, etc
- store "11 ?Jan 1837" in bits (i.e. "11", "?", "Jan", "1837") and recreate the date for sorting and arithmetic from the "11", "Jan" and "1837" bits.
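A minimal Python sketch of the second approach above: the text is split into its bits, the "?" is noted against the bit it precedes, and the sort date is rebuilt from the remaining parts. The function name, the fixed "day month year" layout and the English three-letter month abbreviations are all assumptions for illustration.

```python
from datetime import date

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def parse_qualified_date(text: str):
    """Split 'DD Mon YYYY' text into (sort_date, questioned_parts).

    A leading "?" on any of the three parts marks that part as questioned;
    it is recorded but ignored when the sort date is rebuilt.
    """
    labels = ("day", "month", "year")
    parts = text.split()
    if len(parts) != 3:
        raise ValueError(f"expected 'day month year', got {text!r}")
    questioned = [label for label, part in zip(labels, parts)
                  if part.startswith("?")]
    day, month, year = (part.lstrip("?") for part in parts)
    sort_date = date(int(year), MONTHS.index(month.lower()) + 1, int(day))
    return sort_date, questioned


if __name__ == "__main__":
    entries = ["12 Jan 1837", "11 ?Jan 1837", "10 Jan 1837"]
    # The original text is kept as entered; only the sort key is derived.
    for entry in sorted(entries, key=lambda t: parse_qualified_date(t)[0]):
        print(entry, parse_qualified_date(entry)[1])
```

The first approach (storing both strings) amounts to keeping the returned sort date alongside the original text instead of recomputing it.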
GeneJ 2011-02-23T12:38:25-08:00
That is great news! tyty
theKiwi 2011-02-23T14:58:24-08:00
Adrian wrote:

"11 ?Jan 1837 would sort just after 10 Jan 1837"

to me this is saying

I know it was the 11th
I know it was 1837
It was possibly (or I think it was) January

so I'm not sure why it would be made to sort after 10 Jan 1837.

If there is to be a qualifier like this, it should be attached to the element it's qualifying, which in Adrian's example would be the 11 not the Jan part of it, so

?11 Jan 1837
GeneJ 2011-02-23T15:33:49-08:00
@Kiwi,

See the first presentation of the example (third entry in the discussion), where "Jan" is the element questioned.
gthorud 2011-02-23T17:15:26-08:00
I am not sure that I see any advantage in encoding this as a separate bit along with bits for day, month etc. I think I would like to see the ? in a string encoded date, most likely accompanied by an optional sort date.

But, I would suggest that a "surety" value, a separate bit, meaning ?, could be attached to the whole date, possibly to one of the two dates defining an interval or a single complete date.

I have already suggested to have a surety value attached to the link between levels in a place/location hierarchy, and there should be surety attached to the link from an event to a place name (thus possibly putting a question mark against the whole path from the bottom level place name to the one on the top level).

And it should be possible to attach ?s against all sorts of relations - between persons, names, even media. I would love to see a ? against a line between persons in a chart.

In the same way as ?, I would in some cases want to see a character indicating disapproval.

I should also mention that in national standards here for transcription of church records and census records, double question marks, ??, are used to indicate that the source is difficult to read so instead of January it could say Jan??y. ?? are used because a single ? may appear in the source. I have seen many dates in church records containing a question mark. (If there are two possible interpretations of a word we write both with @ between, !! marks missing data or an obvious error in the data, and %xx% means xx has been crossed out.)
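A rough Python sketch of how software might recognise the four markers just described in a single transcribed word. The function name, the returned keys and the sample words (other than Jan??y) are assumptions for illustration, not part of any published standard.

```python
import re


def transcription_flags(word: str) -> dict:
    """Report which transcription markers appear in one transcribed word.

    ??    source is difficult to read
    @     separates two possible interpretations
    !!    missing data or an obvious error in the data
    %xx%  xx has been crossed out
    """
    return {
        "hard_to_read": "??" in word,
        "alternatives": word.split("@") if "@" in word else [],
        "missing_or_error": "!!" in word,
        "crossed_out": re.findall(r"%([^%]+)%", word),
    }


if __name__ == "__main__":
    # "Jan??y" is from the text above; the other words are made up.
    for sample in ("Jan??y", "Hansen@Jansen", "%Preston%", "!!"):
        print(sample, transcription_flags(sample))
```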
Christine_E 2011-07-08T11:35:12-07:00
Why not put the "?" after the part that you can't read, then the sort will come out close to where you want it without any extra work?

This not only applies to dates which have a small range of possible values, but names and locations that are hard to decipher or were spelled phonetically.

gthorud makes a good point here: "I should also mention that in national standards here for transcription of church records and census records, double question marks, ??, are used to indicate that the source is difficult to read so instead of January it could say Jan??y. ?? are used because a single ? may appear in the source. I have seen many dates in church records containing a question mark. (If there are two possible interpretations of a word we write both with @ between, !! marks missing data or an obvious error in the data, and %xx% means xx has been crossed out.)" I think the "Way Forward?:" involves research to see what else is commonly done when text is unreadable.
GeneJ 2011-02-21T14:07:41-08:00
Evidence02-Proof Argument and/or Process
Id: Evidence02
Title: Proof Argument and/or Process
Description: BetterGEDCOM should support users' need to record and share proof arguments supporting and/or supported by the evidence and conclusions therein recorded or shared.
Importance: Very Desirable
Why?: Supports faithful recording of research status and results
Source: http://www.bcgcertification.org/skillbuilders/skbld091.html
Proposer: GeneJ
brianjd 2011-02-21T22:06:10-08:00
I would agree with this. There are clearly researchers who use the evidence-conclusion model. I would even go as far as saying we are all using the evidence conclusion model, and it would be nice to have a place to put my thoughts down on things I can't "prove" 100% with primary sources.
testuser42 2011-02-23T06:26:01-08:00
Agree, too.
IIRC, there have been a few discussions that touched on a possible way HOW to do it. I tried to come up with something in this message.
GeneJ 2011-02-24T05:32:21-08:00
This requirement is less related to the model than the process.

It's sometimes difficult to incorporate a proof argument in genealogical software because the written proof must be related to more than one source; often to varied facts and even facts about more than one person.
ttwetmore 2011-02-24T07:39:01-08:00
"It's sometimes difficult to incorporate a proof argument in genealogical software because the written proof must be related to more than one source; often to varied facts and even facts about more than one person."

This statement captures one of the main reasons I like the model in which every conclusion person is made up of a tree of person records representing either evidence or previously made conclusions. Each leaf person has its source. And each person made by grouping other person records can have a proof statement based on just the person records being joined (ie, being concluded as being the same real individual) at that decision point. Then the proof statements don't have to justify bringing all evidence together at the same time, which would make an impossibly awkward proof statement. Each proof statement only has to refer to a single decision at a time. Sorry that this concept is a bit difficult to plow through.
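A minimal Python sketch of the tree described above, in which leaf records carry a source and each joining record carries a proof statement for that one decision. The class, field names and sample records are all hypothetical and are not taken from DeadEnds or any other implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonRecord:
    label: str
    source: Optional[str] = None  # set on leaf (evidence) records
    proof: Optional[str] = None   # set where records are joined
    parts: List["PersonRecord"] = field(default_factory=list)

    def sources(self) -> List[str]:
        """Collect the sources of every leaf beneath this record."""
        if not self.parts:
            return [self.source] if self.source else []
        found: List[str] = []
        for part in self.parts:
            found.extend(part.sources())
        return found

    def proofs(self) -> List[str]:
        """Collect proof statements, one per joining decision."""
        found = [self.proof] if self.proof else []
        for part in self.parts:
            found.extend(part.proofs())
        return found


# Made-up example: two evidence records joined, then joined with a third.
census = PersonRecord("John Doe (census)", source="census entry")
baptism = PersonRecord("John Doe (baptism)", source="baptism register")
marriage = PersonRecord("John Doe (marriage)", source="marriage record")
first_join = PersonRecord("John Doe", proof="Same parish and matching age.",
                          parts=[census, baptism])
conclusion = PersonRecord("John Doe", proof="Groom's father matches baptism.",
                          parts=[first_join, marriage])

if __name__ == "__main__":
    print(conclusion.sources())
    print(conclusion.proofs())
```

Each proof statement here only has to justify the records joined at its own node, which is the property described above.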

TW
GeneJ 2011-02-24T07:57:16-08:00
As I understand it, the "model" requirement is Evidence01. This Evidence02 requirement is intended to be not dependent upon implementation of Evidence01.

Perhaps we need an Evidence03 that is dependent on Evidence01, with Evidence03 being the proof argument methodology for those who would implement the model in Evidence01?
AdrianB38 2011-02-24T08:52:15-08:00
Gene - I suggest you don't need Evidence03 - at least at this stage of the game. You have a requirement expressed as Evidence02. As you say, you're actually thinking about the process in Evidence02, rather than the model as such (which is more related to Evidence01). I suggest you stick to that and don't try to second guess how Evidence02 might be implemented.

It may turn out to be that there are (say) 2 ways of actually designing a solution to Evidence02
- one that happens to require a data model that's most or even all of the way towards also satisfying Evidence01
- a 2nd way that's only vaguely or not at all helpful towards Evidence01

It's at that point we take a decision which is best. Right now, we don't need to worry about the "how" for Evidence02 so you can leave it as it is.

Make sense?
GeneJ 2011-02-24T09:12:06-08:00
Hi Adrian ... makes sense to me, yes.
AdrianB38 2011-04-11T09:09:18-07:00
My page on http://bettergedcom.wikispaces.com/Research+Process%2C+Evidence+%26+GPS seems to be useful in that it defines _my_ view of what the steps are and what the data is.
brianjd 2011-02-21T22:27:34-08:00
Task01 - Research Task
Id:
Task01
Title:
Research Task
Description:
BetterGEDCOM should support users' need to record and track research and miscellaneous supporting tasks.
Importance:
Very Desirable
Why?:
Supports faithful recording of research status and results, and reduces repetition of labors.
Source:
Gramps, GenTech model
Way forward?:

Dependencies:

Approval status:

Proposer:
BrianJD
Discussion:
brianjd 2011-02-21T22:32:48-08:00
I thought this would be a very useful field. In my idea, a task would start out life as a "todo" task, and when completed it would be a completed task that would document the research done and optionally record the success or failure of the task. Naturally not all tasks are related to direct research or wanted to be kept when completed. So a flag might be nice which when checked is discarded/ignored when completed.
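A minimal Python sketch of that task life cycle: a task starts as a "todo", can be completed with an optional record of success or failure, and can carry a flag saying it should be dropped once done. All names here are assumptions for illustration, not taken from Gramps or the GenTech model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResearchTask:
    description: str
    completed: bool = False
    succeeded: Optional[bool] = None      # optional outcome once completed
    findings: str = ""                    # documents the research done
    discard_when_completed: bool = False  # hypothetical "don't keep" flag

    def complete(self, findings: str = "",
                 succeeded: Optional[bool] = None) -> None:
        """Turn a todo into a completed task that documents the work."""
        self.completed = True
        self.findings = findings
        self.succeeded = succeeded


def retained(tasks: List[ResearchTask]) -> List[ResearchTask]:
    """Keep every task except completed ones flagged for discarding."""
    return [t for t in tasks
            if not (t.completed and t.discard_when_completed)]


if __name__ == "__main__":
    search = ResearchTask("Search parish register for baptism")
    chore = ResearchTask("Renew library card", discard_when_completed=True)
    search.complete(findings="No entry found 1830-1840.", succeeded=False)
    chore.complete()
    print([t.description for t in retained([search, chore])])
```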
testuser42 2011-02-23T06:07:09-08:00
I second this idea.
GenXML has a task, too.
http://lineascope.com/ uses tasks, too.
AdrianB38 2011-02-23T12:09:22-08:00
Anyone think GenXML's Objective would also be useful?
AdrianB38 2011-02-22T08:13:40-08:00
Data-Date03 Date Phrases
Add requirement for date phrases with optional interpretation as per GEDCOM Standard.
ACProctor 2011-11-30T08:30:55-08:00
Re: "BetterGEDCOM must allow a "date" to be entered as a phrase where the values are not recognizable to a date parser, but which gives a human reader information about when an event occurred. It must allow such a phrase to have an optional date in parseable format that can be used to interpret the phrase"

See my post of today under Data-Date02.

All determinate date values in BG should be numeric-format ISO 8601. Date-entry and date-formatting should not be relevant to the stored BG data.

However, two areas that are relevant include:

  1. Provide a description of the Date that may include additional semantics, e.g. "Christmas 1956". This is separate from the stored date value and could be used as a display token when it appears in Narrative text

e.g.

<Date>
    <Value> 1956-12-25 </Value>
    <Name> Christmas 1956 </Name>
</Date>

  2. Support for an undecipherable date. This may need a transcript of the original material, or a citation. Also, maybe a best guess at the date's value together with an assessment of the validity.
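As a rough illustration of the first point, the Python sketch below reads a Date element that holds an ISO 8601 value alongside a separate descriptive name. The element names follow the example in this post; the function name and the choice to treat the name as optional are assumptions.

```python
import xml.etree.ElementTree as ET
from datetime import date

SAMPLE = """
<Date>
    <Value> 1956-12-25 </Value>
    <Name> Christmas 1956 </Name>
</Date>
"""


def read_date(xml_text: str):
    """Return (stored date value, display name or None) from a Date element."""
    element = ET.fromstring(xml_text.strip())
    # The stored value is numeric-format ISO 8601, so no date parser
    # beyond the standard library is needed.
    value = date.fromisoformat(element.findtext("Value").strip())
    name = element.findtext("Name")
    return value, (name.strip() if name is not None else None)


if __name__ == "__main__":
    value, name = read_date(SAMPLE)
    # Narrative text can show the name while sorting uses the value.
    print(name or value.isoformat(), "sorts as", value.isoformat())
```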
WesleyJohnston 2011-11-30T23:53:43-08:00
As in some other posts, I am taking a very long-range view here. Where I see the specification of this requirement with the words "a phrase where the values are not recognizable to a date parser", my thinking is that we need to support the development of better date parsers, so that 10 or 20 years from now, a date parser can read a date that is specified as at the time of some battle and be able to supply standardized dates, either a specific date or a range for that event. And the same with "on a Tuesday in the Spring of 1873", which is a finite set of specific dates.

Certainly we have to live with the reality of the simplistic date parsers of today. But we should be working to move to a future where that is being robustly addressed and supported.
ttwetmore 2011-12-01T05:25:38-08:00
Dates are often not precise in genealogical contexts. Dates can be “about”, “around”, “before,” “after,” “on or before”, “possibly,” “probably,” “computed,” “interpreted,” “estimated,” “between” this “and” that, “from” this “to” that, this “or” that, this “and” that, as text strings (e.g., “in the year his father died,” “around Christmas a few years ago”). In colonial times we have double years to contend with. For older dates we might not know what calendar the date is from. Dates might be in ecclesiastical form or in terms of the reign of a monarch. We find months that are expressed as words, abbreviations, roman numerals, numbers, and goodness gracious, not in English.

In the evidence we use to find our data, dates are written in many ways. When extracting those dates into event or persona records are we willing to lose the original expression of the dates by translating them into a standard form? I think most of us would agree that we must be able to keep the original wording if we choose, even if we try to translate the dates to a standard form.

Why do we need precise, standard dates in genealogical applications? When do we need things accurate to the day? Or week? Or month? When is it critical to know what calendar a date is from?

Most of the time can’t we just think of a date as a tag associated with a person or an event that can be used as a sort key?

How do we use dates in genealogical applications? For putting lists of people in order. For help in searching for persons and events by providing dates for filtering. To show on forms and display on reports. Maybe to compute average life spans. Which of these requirements need the dates to be in a fixed, standard form? Which require the dates to be known to the day? Which require us to know the calendar? If in a long list of persons sorted by some date property, how important is it if two are switched with respect to the unknowable truth?

I have written a short document that explains the approach to dates taken in DeadEnds. The date class in my software has a context free grammar parser that extracts software date objects from any strings that are written in any of the forms as shown in the examples in this document. Here is a link to the document:

http://bartonstreet.com/deadends/DateFormats.pdf

My contention is that computer geeks naturally tend to want precision and strict standards, even in “humanistic” areas that really can’t support that level of precision. I feel the same way about trying to standardize places to a simple latitude and longitude pair.
ACProctor 2011-12-01T09:50:35-08:00
Re: "As in some other posts, I am taking a very long-range view here. Where I see the specification of this requriement with the words "a phrase where the values are not recognizable to a date parser", my thinking is that we need to support the development of better date parsers, so that 10 or 20 years from now, a date parser can read a date that is specified as at the time of some battle and be able to supply standardized dates, either a specific date or a range for that event. And the same with "on a Tuesday in the Spring of 1873", which is a finite set of specific dates"

As I said in Data-Date02, no such beast exists and it is not possible in the general case Wesley.

The transcribed date string could be retained, obviously, but the interpreted date value should be determined by a person rather than some flakey date-parser with no well-defined grammar. [I was a compiler writer for over 10 years]
ACProctor 2011-12-01T10:02:33-08:00
Re: "Dates are often not precise in genealogical contexts....."

You're misunderstanding me Tom. Error margins and relational qualifications can still be handled with ISO dates. Also, retaining the original transcription is not being questioned here.

The issue is how to store machine readable dates when those dates (including the margins) have been determined or estimated.

You've already made a good point about the formats and semantics being indefinitely vague. Hence, no date-parser software is going to do it for you.

We're talking about a storage and exchange format here - something that "computer geeks" know all about. I'm definitely not suggesting we throw away evidence or transcriptions. :-)

I've seen so many misguided projects where someone has decided they can reverse-engineer a print-format date, or a print-format currency, to get back to a machine-readable version. It just cannot be done in the general case!

When dates are in other calendars then there are other, more serious problems. Some do not allow an accurate conversion to a Gregorian one. I've a vague recollection of other international standards that apply to such dates but I can't find the document at the moment - I need a strong coffee and a torch.
WesleyJohnston 2011-12-01T10:46:43-08:00
re ACProctor: "As I said in Data-Date02, no such beast exists and it is not possible in the general case Wesley."

No such beast exists today, and a perfect one will never exist, but that does not mean that we should not be moving toward implementing one that is constantly improving.

We should not restrict our long-term vision to that which the technology of today limits us. I see BetterGEDCOM as an organization that is on-going, not just to create a product and then disband. As technology evolves, the need for new versions and associated supporting mechanisms, such as places and dates, will improve. We should have a long-term vision of what we would like to see happen and then gradually make that vision happen, even if the current state-of-the-art makes it seem that is never going to be possible.

Look at where we are today with optical character recognition -- which is like "magic" compared to where the technology was just 25 years ago. The US Post Office is even using it to sort hand-written mail. It is not perfect, but it can do the vast majority of the job right.

The ability to do things 25 years from now is something that we cannot conceive as possible right now. But we can still set a long-term vision and as technology progresses we can expect that that vision will be more and more realizable.
ACProctor 2011-12-01T11:03:49-08:00
It's not a case of better technology so much as simply not being practical in most cases, and totally impossible in others, Wesley.

A simple look at database storage will show that they don't store and manipulate date strings - they store and manipulate compiled versions (well, in well-designed software anyway). If software has determined a date, or a rigorous researcher has deciphered one themselves, then all I'm saying is that the machine-readable version should be stored in one standard format, not one of a limited set of accepted print formats. It makes no sense to keep decoding the same strings over and over again each time the data is loaded. In fact, there would be no guarantee that you'd have the same date if someone else was using a different implementation of the parser.
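To make the storage point concrete, here is a minimal sketch of keeping the transcription and the interpreted value side by side. The class name `StoredDate`, its fields, and the slash-separated interval string are assumptions made for illustration only; they are not part of any BetterGEDCOM proposal, and calendar issues are ignored.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class StoredDate:
    """A transcribed date plus the range a person (or reviewed tool) settled on."""
    transcription: str          # exactly as written in the source
    earliest: Optional[date]    # None until someone has interpreted the text
    latest: Optional[date]

    def to_exchange(self) -> str:
        """Write the interpreted range once, in a single ISO-style form."""
        if self.earliest is None or self.latest is None:
            return ""           # nothing decided yet; only the text travels
        return f"{self.earliest.isoformat()}/{self.latest.isoformat()}"


# Hypothetical example values, chosen only to show the shape of the data.
stored = StoredDate("April 1899", date(1899, 4, 1), date(1899, 4, 30))
undecided = StoredDate("in the year his father died", None, None)

print(stored.to_exchange())
print(repr(undecided.to_exchange()))
```

With this shape the string is decoded once, by whoever interprets it, and every later reader of the file sees the same machine-readable range regardless of which parser they happen to use.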

I have to admit to being surprised that you would entrust some flakey date-parser to effectively generate "facts" in your data anyway. It feels like entrusting an OCR to transcribe a birth cert and simply accepting it blindly.

On the subject of OCRs, have you looked at the results of the recent British Library and Bright Solid venture digitising newspapers? - they're really awful with loads of errors.
WesleyJohnston 2011-12-01T14:47:21-08:00
re ACProctor: "entrust some flakey date-parser to effectively generate "facts" in your data"

I guess I see now where we appeared to differ. I am not viewing these tools as replacing my thinking; I am not entrusting them to generate anything in my data without my review of their results -- which could be a single result or a small set of results.

I see these as advisory tools, for the same reasons that I do not expect Google Translate in today's world to generate a perfect translation, so that if it is a language with which I am familiar, I will use the translation as the starting point and modify it where I spot problems -- not that the result of that modification will be perfect either.

The output of a date parser, which I do not think will be flakey if there is a good effort put into it over the next 10 years, is a source just like any other, to be weighed in conjunction with the other sources.

Advisory systems are a technology already in use in industry and medicine, aimed at narrowing down the choices. I really do see a robust -- though not perfect -- date parser as a very realizable application over the next 10 years, if there is sufficient support for it. And just as with all technologies, it will continue to improve as more iterations resolve more problems.
AdrianB38 2011-02-22T08:48:50-08:00
Data-Date02 assumption
Data-Date02 has an assumption in it (which I have now documented in the "Way Forward") that it is sensible to store dates in calendar format(s) on the BG file and not as (e.g.) number of days since an agreed event.

Does anyone want to dispute this assumption?

Personally I wouldn't want to go to a days-since format for several reasons:
1. It is a notoriously difficult calculation to do - a typical error is to assume 1900 (in the current Western calendar) is a leap year - so we create a risk of error in the application programs.

2. The BG file becomes illegible to humans.

3. Values like "From April 1899 to April 1900" become virtually impossible to translate without assuming accuracy that isn't there.
Explanation (warning - algebra approaching)
Suppose 1 April 1899 is X.
Then 30 April 1899 is X+29; 1 April 1900 is X+365; 30 April 1900 is X+365+29 = X+394
Strictly, "April 1899" is actually "From X to X+29" and "April 1900" is actually "From X+365 to X+394".
So is "From April 1899 to April 1900" translated as "(From X to X+29) to (From X+365 to X+394)"? You cannot be serious!
The only practical rendering is "From X to X+394"
Yet this means "From 1 April 1899 to 30 April 1900" - which is not what I wrote and implies more accuracy than the original "From April 1899 to April 1900" which implies to me from _sometime_ in April 1899 to _sometime_ in April 1900.

Since we get partial calendar dates, I think we need to retain a calendar style to record them.
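
The day offsets in the explanation above can be checked with a few lines of Python. This is only a verification sketch using the standard library's (Gregorian) `date` type; the variable names are made up for the example.

```python
import calendar
from datetime import date

x = date(1899, 4, 1)                          # "X" in the explanation

end_april_1899 = (date(1899, 4, 30) - x).days
start_april_1900 = (date(1900, 4, 1) - x).days
end_april_1900 = (date(1900, 4, 30) - x).days

print(end_april_1899, start_april_1900, end_april_1900)   # 29 365 394

# The classic trap from reason 1: 1900 is divisible by 4 but is not a leap year.
print(calendar.isleap(1900))                              # False
```

A program that wrongly treated 1900 as a leap year would get 366 and 395 for the last two offsets, which is exactly the kind of silent error a days-since format invites.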
ttwetmore 2011-02-22T09:14:19-08:00
[Tirade on]
Another opportunity for me to go off on my "humanistic versus scientific" data tirade. Humanistic data has a high degree of uncertainty about it. That uncertainty is part of the data. It must be loved just as reverently as we love everything else about the humanistic, genealogical data we collect. Trying to hide the uncertainty under the guise of some crazy-looking precise-looking formatting does an incredible disservice to the data and genealogy as a whole. (Places are uncertain entities; names are uncertain entities; we must embrace their uncertainty as well).

Any date format that requires the computation of the number of days for its Better GEDCOM representation is way off base. I would suggest that one might want to read about GEDCOM's date format or my own DeadEnds data format, once more, which is at

http://bartonstreet.com/deadends/DateFormats.pdf

Yes, these formats are Western-culture-centric, and they can be extended with other calendric forms.

If you know that you had a Dutch ancestor born in April 1633, isn't that enough for you? Yeah, it would be great to know the day of the month, but just knowing the month is wonderfully incredible. What advantage do you get from converting that into the number of days since, say, January 1, 1000, in the Julian calendar? You'd have to know the actual calendar of the date (it could have been Julian or Gregorian), and you'd have to know the actual day of the month. Since you don't know these things it's just plain dumb to convert this into a number of days. Just read Adrian's example to see how dumb.

If you do need to sort that date with others you can just assume the day is in the middle of the month to get the best statistical sorting out of the uncertainty. Isn't this more than enough? I am always amazed how computer geeks want to complexify data to make it seem more scientific. Genealogical dates are not scientific dates; they are sloppy humanistic dates, and we must deal with them exactly as they are.
[Tirade off]
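
A minimal sketch of the mid-month sorting idea follows. The tuple layout and the function name `sort_key` are assumptions for illustration, and extending the idea to an unknown month (treated here as mid-year) is an extra assumption not stated in the post.

```python
from typing import Optional, Tuple

# (year, month, day); month and day may be unknown.
PartialDate = Tuple[int, Optional[int], Optional[int]]


def sort_key(d: PartialDate) -> Tuple[int, int, int]:
    """Return a key for ordering only; the stored date is left untouched."""
    year, month, day = d
    if month is None:
        return (year, 7, 1)        # assumption: unknown month sorts as mid-year
    if day is None:
        return (year, month, 15)   # unknown day sorts as mid-month
    return (year, month, day)


events = [(1633, 4, 20), (1633, 4, None), (1633, None, None), (1633, 4, 3)]
ordered = sorted(events, key=sort_key)
print(ordered)
```

The partial date "April 1633" stays exactly as recorded; only its position in a sorted list is guessed.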
GeneJ 2011-02-22T09:15:29-08:00
Syntax06 - Define one way of doing a thing
Id: Syntax06
Description: BetterGEDCOM should define just one way of doing one thing.
Importance: Very Desirable
Why?: More than one way may cause ambiguity and extra programming for programmers
Source: Original Goal 7
Way forward?: It may be sensible to agree specific exclusions to this requirement, e.g. for in-line notes and separate note records, where the extra programming work is trivial and does not create ambiguity.
Dependencies:
Issue 1: It is not always possible to agree that two things are, in reality, the same thing. For instance, whether or not in-line notes and separate note-records are, in practical terms, the same thing, has been the topic of debate.
Issue 2: If two separate methods in GEDCOM type formats are merged into one, then it will not be possible to round-trip data from a GEDCOM type format to BG and back again coming up with the same data
Christine_E 2011-07-07T19:56:26-07:00
Could you give another example of something that is done more than one way in existing genealogy programs (besides in-line notes vs note-records, which may be a terminology issue)?

Being able to go backwards should not be a requirement. By incorporating all the data features of existing genealogy programs into BetterGEDCOM, by definition, all of the BetterGEDCOM data won't be able to be incorporated back into those existing applications since different programs have different features. However, the BetterGEDCOM data will be able to be imported into future versions of those programs after they become Better-GEDCOM-qualified.
GeneJ 2011-07-13T14:35:10-07:00
Hi Christine.

This Syntax06 was a posted "goal" moved over from "The original [BetterGEDCOM] goals page," and has been referred to elsewhere as "current goal 7." Unfortunately there was a former goal 7 that had nothing to do with Syntax06.

The original goals page is located here:
http://bettergedcom.wikispaces.com/GOALS

I checked the History tab of the original goals page. It appears Louis Kessler added this goal Dec 29, 2010 7:39 pm. I'll try to link the history page here, but I'm not sure it will take.
http://bettergedcom.wikispaces.com/page/diff/GOALS/190247292

There was a lengthy discussion about the original goal. See the discussion here ...
"Single way (current goal 7)"
http://bettergedcom.wikispaces.com/message/view/GOALS/32130746

Hope this helps. --GJ

Hopefully Louis will comment with some examples for us.
gthorud 2011-02-24T16:48:47-08:00
Data-PersonNames01 - Sorting on multiple given names and surnames
Description:

BetterGEDCOM shall provide a way to identify parts of names (whole words or parts of words) that shall be used for sorting, identifying if the part should sort as a given name or surname, and shall allow several such surname parts and could allow several given name parts. A priority could be assigned the name parts sorting as surnames. All this information related to sorting is a suggestion to the recipient for how name parts should be sorted.

Why?

Many cultures operate with several surnames. It should be possible to sort on those names in indexes etc. The same applies to given names (forenames) because a person may be known by any one of those given names. Some words in a name (e.g. prefixes) are not used for sorting, and often the beginning of a name is not used for sorting (d’ in d’Hondt), or one “word” may sort as two names, e.g. both Berg and Olsen in Berg-Olsen. When there are several surnames, some countries consider the last surname to be most "significant" while others consider the first to be the most significant. Identification of these parts has no influence on how a name is printed in reports or charts. The need to sort on several given names could be discussed, also the priority of surnames. Important: For example, a middle name could be indicated to be sorted as a given name or surname, but that does not imply that it is classified as a given name or surname in other contexts, and this proposal does not imply anything about any need to classify name parts as middle name, patronymic etc. (which there may perhaps not be a need for).
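
One way to picture the requirement is as a list of sort hints carried alongside the printed name. The sketch below is only an illustration: the class names `SortPart` and `Name`, the `priority` convention (lower number sorts first), and the given names "Kari" and "Jan" are all invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SortPart:
    text: str                        # a whole word or part of a word
    sort_as: str                     # "given" or "surname"
    priority: Optional[int] = None   # optional ranking among surname parts


@dataclass
class Name:
    full: str                        # printed form, unaffected by the hints
    sort_parts: List[SortPart]

    def surname_keys(self) -> List[str]:
        """Surname parts in suggested order; parts without priority go last."""
        parts = [p for p in self.sort_parts if p.sort_as == "surname"]
        parts.sort(key=lambda p: (p.priority is None, p.priority or 0))
        return [p.text for p in parts]


berg_olsen = Name("Kari Berg-Olsen", [
    SortPart("Kari", "given"),
    SortPart("Berg", "surname", priority=2),
    SortPart("Olsen", "surname", priority=1),
])
d_hondt = Name("Jan d'Hondt", [
    SortPart("Jan", "given"),
    SortPart("Hondt", "surname"),    # the leading d' is not used for sorting
])

print(berg_olsen.surname_keys())
print(d_hondt.surname_keys())
```

As the requirement says, these hints are only a suggestion to the recipient; nothing here classifies a part as a middle name or patronymic.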
NeilJohnParker 2011-12-08T15:35:11-08:00
I believe that the name entity needs to be contained in multiple fields which would include at least NameType {Legal, Birth, PriorName, MarriedName, NickName, StageName, PenName, Alias ... (all defined unambiguously)}, Salutation, LastName, FirstName, SecondName, ThirdName, FourthName, Postfix, LastNameSortField, FirstNameSortField... If FirstName, SecondName, ThirdName, FourthName is not acceptable, there must be some standard technique to easily separate them, perhaps using a comma and space or just a space, but compound names would be joined by an underscore. These name components should be capable of being referenced to a set of domains and these domains should be naturally exchangeable via BetterGEDCOM.
AdrianB38 2011-02-25T14:37:12-08:00
Syntax09 Define Event vs. Attribute
Initial creation:
Assuming that the BetterGEDCOM project distinguishes events from properties / facts / attributes / characteristics, then BetterGEDCOM must define and publish a clear definition of the difference between the two concepts that does not rely on a list of each. In particular, the definition must be clear enough for competent software suppliers and users to understand whether a new item is an event or a property / fact / attribute / characteristic.

There is no clear definition in the GEDCOM 5.5 specification of the difference between the two, only a list of events and a list of attributes. This means that a software supplier or user does not always know whether to create an event or attribute. As a result, the same concept can appear as both, resulting in difficulty of exchange of information.
ttwetmore 2011-02-26T17:27:53-08:00
Geir says, "I don’t understand why there must be Level One Vital Events, why can’t they all be Level Zero True Events?"

They could be all level zero. But think of it this way. You get birth data from many sources. Think about getting an age on a census record. This is a great example. You subtract the age from the date of the census and you get an estimated birth year for the person. The census might also list the birth place of the person. So from the census record you have an estimated birth year and a possible birth place. BUT, BUT, BUT, you never really got an actual birth event for the person, did you? You didn't find real hard evidence. It's all secondary information extracted from the census event.
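
The census arithmetic can be sketched in a couple of lines. The function name and the two-year range are assumptions for illustration: the post only says to subtract the age from the census date, and the range simply allows for whether the birthday had passed by census day.

```python
from typing import Tuple


def estimated_birth_years(census_year: int, age: int) -> Tuple[int, int]:
    """Earliest and latest birth year consistent with an age given at a census."""
    # Aged N on census day means born in (year - N - 1) if the birthday was
    # still to come that year, or (year - N) if it had already passed.
    return (census_year - age - 1, census_year - age)


# Hypothetical example: someone recorded as 30 years old in an 1881 census.
print(estimated_birth_years(1881, 30))   # (1850, 1851)
```

The result is an inference from the census, not evidence of a birth event, which is the distinction being drawn here.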

So, one of my principles about genealogy is that you create event records from the evidence you find about events, and you create person records for all persons mentioned in the events, and you add to those person records everything you learn about the persons from the event evidence. In my mind it is much better to think about that inferred birth information as something you learned about the person from the event, so it is something that should be kept inside that person. That's my principle again. It's a compromise of course, as all things are.

See, I don't mind having two ways to do something, if it makes sense to me to have those two ways. In the case of the vital event and the multi-role event I definitely see a difference, and definitely feel it's okay to treat them differently. But every vital event could of course be transferred into a level zero event record if there were a rule that said that was the only way events could exist in the file format.

So for me a vital event, a level one PFACT-like thing inside a person, is just the right thing for these secondary, inferred, not quite events, that we often learn about people offhand through records that were really created for an entirely different reason.

Hope this isn't too confusing.

And of course, there is a very practical answer to the question as well. When converting GEDCOM data to Better GEDCOM format, if there were no level 1 vital events, we would probably triple the size of the resulting file in terms of records and maybe size!

Tom W.
AdrianB38 2011-02-27T04:53:11-08:00
Geir - Re "the structure that some people currently use to transfer hair color, caste, eye colour, nationality etc. although some of this info may change over time – something the Trabant can’t transport"
In GEDCOM 5.5 (at least) INDIVIDUAL_ATTRIBUTE_STRUCTURE (which includes physical descriptions, so I hope that's what we're talking about) includes an EVENT_DETAIL and that in turn includes a DATE_VALUE - so the GEDCOM attribute (the Trabant (grin)) should be able to transport dates from 1 person to another.

A person's name, however, does not have a date in GEDCOM 5.5, giving the impression (to me at least) that all names on file for this person, can apply at once. Name (and sex) are not, in GEDCOM 5.5, attributes in the formal GEDCOM sense of the term.

It is pretty clear to me that Name (and Sex) should be included in the attributes, or whatever we end up calling them, in BG, and that all "attributes" in BG should have dates (or date ranges) in BG (a) because they need them and (b) because most of them could have them now.

Now whether people want to use the dates, whether (if they do) they want to stick a NAME-CHANGE event (say) chronologically between 2 NAME attribute values, is entirely up to them. We just have to provide the facility for those who want to do so.
ttwetmore 2011-02-27T06:08:59-08:00
Adrian,

Your latest about regularizing name and sex and every other PFACT type of thing seems to me to be the approach Better GEDCOM should take towards a generalized attribute concept.

I think the vital event, if we decide to keep it, fits in this category also.

Okay then, there is now a NEXT CONCEPT that needs discussion in the attribute vs event context.

Gedcom has the ASSO tag and DeadEnds has the relation structure and I'm sure other formats have other constructs for the same concept. I have even seen the tags FATH and MOTH in some GEDCOM files.

What I am talking about is an "attribute" that is basically a pointer to another record with a label on it to define the type of the relationship. Note that in the flurry to get rid of the GEDCOM FAM record, such relationship pointers, or a concept that allows the recording of the same information, becomes paramount -- without them genealogical databases would hold no relationships.

It boils down to this question -- IN BETTER GEDCOM HOW DO WE WANT TO EXPRESS THE FACT THAT PERSON A IS THE FATHER OF PERSON B?? I'm assuming that the anti-FAM people have their way and we have no FAM record to help us out. Without a FAM record what will our world be like? Please realize something very important -- in today's world the FAM RECORD IN GEDCOM HOLDS 99.99999% of all relationship information between people. With no FAM entity ALL THIS RELATIONSHIP INFO HAS TO MOVE SOMEWHERE ELSE.

If you check out all the models and what people have written you'll find these three answers:

1. Relationship objects -- this is a common answer. Create a new record type (if you love relational databases you'd call it a new table type). A relationship has a type (parent/child, brother/sister, etc) and then pointers to the two persons in the relationship. Lots of similarities to the concept of the multi-role person, but not identical. As a RDBMS table it's three columns, a type and two foreign keys to a person table.

2. Assertion objects -- this is the scary answer -- a la GenTech, where EVERY relationship between EVERY RECORD TYPE must be mediated by a different assertion object. Assertion objects were also created because they have a simple direct implementation as a RDBMS table. In reality an assertion is nothing more than a generalized relationship -- this is required in the GenTech model because EVERY RELATIONSHIP BETWEEN EVERY KIND OF OBJECT MUST BE IMPLEMENTED AS AN ASSERTION (EVEN RELATIONSHIPS BETWEEN ASSERTIONS!!) -- to go off onto the anti-GenTech tirade for a moment, I hope you realize that in the GenTech model, EVERY PFACT (BE SURE YOU UNDERSTAND WHAT THIS MEANS -- EVERY, EVERY PFACT) IS ALSO ITS OWN RECORD AND FOR YOU TO ADD A PFACT TO A PERSON YOU HAVE TO CREATE THAT PFACT AS A SEPARATE RECORD AND THEN CREATE A NEW ASSERTION RECORD TO BIND THAT PFACT TO ITS PERSON -- "Help, help, I've fallen and I can't get up!"

3. Direct references -- person B points to person A with a pointer that implies "You're my daddy", and maybe vice versa.

We're going to have to pick one of these for Better GEDCOM. OpenGen has been sniffing around the relationship approach. My bet is that SoRD is sniffing around the assertion approach. In DeadEnds I've opted for the direct reference approach. My reasoning has been that the relationship and assertion approaches both require the addition of (at least one) new entity type to the model as well as a large increase in the number of records in actual external files. In the direct reference approach no new entity type is required as each person in a relationship simply points to the other (it could even be others).
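
The three options can be put side by side in a rough sketch. None of the class or field names below come from OpenGen, GenTech, or DeadEnds; they are invented here purely to show how many records each approach needs in order to say "A is the father of B".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Person:
    id: str
    name: str
    # Approach 3: labelled pointers held inside the person record itself.
    refs: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class Relationship:
    # Approach 1: a separate record with a type and two person pointers.
    type: str
    person_a: str
    person_b: str


@dataclass
class Assertion:
    # Approach 2: a generalised link between records of any kind.
    subject: Tuple[str, str]   # (record kind, record id)
    predicate: str
    obj: Tuple[str, str]


a = Person("I1", "Person A")
b = Person("I2", "Person B")

# 1. One extra Relationship record.
rel = Relationship("father-of", a.id, b.id)

# 2. One extra Assertion record; the same shape could link any two records.
asr = Assertion(("person", a.id), "father-of", ("person", b.id))

# 3. No extra record: B points at A, and optionally A points back.
b.refs.setdefault("father", []).append(a.id)
a.refs.setdefault("child", []).append(b.id)

print(rel, asr, b.refs, sep="\n")
```

Approaches 1 and 2 each add a record per relationship, while approach 3 adds only a field to records that already exist, which is the trade-off described above.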

So what's the relationship between this discussion and the discussion of what is an attribute? It's pretty simple really. These direct references that could be used to establish relationships between persons also LOOK, ACT, SMELL, and so on, like all other things in our "extended" view of what an attribute is.

Okay, where can this discussion go after we resolve this one?

Well, we could discuss whether a source is an attribute of a person record.

And my favorite, how should we treat the evidence persons that provide the grist for the conclusion mill, as components of a conclusion person? That is, ARE THE EVIDENCE PERSONS THAT PROVIDE THE DATA BEHIND A CONCLUSION PERSON ATTRIBUTES OF THAT CONCLUSION PERSON? If you consult the DeadEnds model you will see that each person record can contain an unbounded number of references to other person records. In the DeadEnds model this is my implementation of the evidence and conclusion objects. So it's very easy to simply view these evidence person references as "just another" type of attribute.

I take a basic, data structure view on the whole thing. A data structure is a tuple of information. Each element of the tuple can be simple or a self-contained data structure of its own, or it can be a pointer outside of the current data structure to another data structure. All our entity types in the Better GEDCOM model and in every other model can be thought of as one of these data structures. It is this view that is the lowest common denominator for all other views.

Tom
louiskessler 2011-02-27T09:20:34-08:00

Excellent analysis about relationships, Tom. That info deserves to be promoted to some place on the site where it will not get lost.

One consideration for deciding what to do with this in BetterGEDCOM. We will want developers to adopt BetterGEDCOM. I expect all 500 plus programs out there now use the FAM relationship object. They will NOT change their internal data structures to accommodate BetterGEDCOM, so they will need an easy way to translate from whatever BG has to their internal structure. If that is going to be too much work on their part, they either won't do it, or they'll do it while raising a big stink about it and that won't be pretty because when asked, they'll tell people that BG sucks.
gthorud 2011-02-27T10:00:40-08:00
It appears that it is difficult to agree what we are trying to find a term for. It appears that Tom is defining an Attribute as something much wider than my Trabant, and than what I understand was meant to be defined in the beginning of this topic. Tom's definition is more like an attribute as used in a data model. Back to square one.



I will come back to other issues, but can someone explain why there can be an event subordinate to what I would call an attribute in INDIVIDUAL_ATTRIBUTE_STRUCTURE in Gedcom 5.5(.1)?
gthorud 2011-02-27T10:04:22-08:00
Sorry about the last question, I should have read the previous entries.
gthorud 2011-02-27T11:53:51-08:00

Tom,

Regarding Level zero and Vital Level one events.

I understand what you are saying, but I still don’t see why there could not be only level zero events. And I don’t understand why that would triple the size of the file.



Adrian,

I have a problem understanding the purpose of the INDIVIDUAL_ATTRIBUTE_STRUCTURE containing an event. I can't find anything about why in the Gedcom spec; have I missed anything? WHY NOT JUST USE AN EVENT – is this a construct that has appeared after someone discovered that hair colour can change over time? Maybe someone can explain why there is an event inside this structure?

Also, I am skeptical about having a date attached to a name. Isn’t a reference to an event where the name is used enough, without having the date within the same structure as the name? Where would the data come from if not from a source that could be identified as an event?
ttwetmore 2011-02-27T12:25:42-08:00
Geir,

I didn't mean to throw a spanner into your works. There may be a few more vehicles out in the parking lot!!

I am really a very one-dimensional person. I see data models in very simple terms, and those terms are nearly 100% simple computer record structures with a little bit of object-oriented frosting thrown in.

I see every entity as a computer record structure, where a computer record structure is a tuple of named fields. There are only a few kinds of fields, and maybe these fields are the things that Geir is equating to automobiles. Here are some of the possibilities:

1. A field whose value is a small number of discrete values from a special set (sex has m and f; event type tags come from an agreed upon set, and so on; other examples left to the reader).

2. A field with an infinite number of values, but whose values still obey strict syntactic rules -- e.g., names, place, dates.

3. A field whose value is a generic string -- maybe the description of something, a note, a free-format description of a source.

4. A field that has its own internal record structure -- vital events are like this -- a BIRT in an INDI is a "sub-record" of the INDI record -- once you look at the sub-record in its own right, it's just another record structure inside the INDI record structure. Of course, this is the beauty of both the GEDCOM level structure and the XML element structure; you can carry the sub-record structure as deeply as you'd like to go. Note that JSON is the same; it's just that JSON, since it comes directly from the need to transport the values of JavaScript objects, is really closely aligned to the idea of record structures (one would expect, once one realizes that the expressive power of GEDCOM, XML, and JSON are all the same, that it would be no surprise to realize that all WE EVER TALK ABOUT IN GENEALOGICAL MODELS is very simple data structuring stuff). It's just clearer in JSON that you really are transporting a computer data structure. And these new fangled things that people are calling "protocol buffers" these days are really nothing more than these record structures in a binary form that makes them more efficient for moving around the internet through various service APIs, even though they suffer from the fact that they are not human readable.

5. A field that refers to another record -- source pointers are like this -- relationship pointers (if adopted) would be like this -- all this means is that the value of the field is a REFERENCE to ANOTHER RECORD STRUCTURE that is outside of this record structure -- note an interesting point that is rarely mentioned in the genealogical context -- pointers of this type are always assumed to point to the top level of another record structure, that is, to a full record object, what I call a first class citizen, but in fact there are times when you might want to think in terms of a pointer in one record as pointing SOMEWHERE INSIDE another record.

Someone else might break these things up using a different hierarchy, but there really isn't any other major way to see the data structuring world.
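
As a small illustration of the five kinds of field, here is one made-up person record written as a plain Python dictionary and passed through JSON. The keys, the sample values, and the set `SEX_VALUES` are all assumptions for the example and do not follow any particular model.

```python
import json

SEX_VALUES = {"m", "f"}   # a small agreed set, as in kind 1

person = {
    "id": "I1",
    "sex": "m",                                        # 1. one of a few discrete values
    "name": "John /Smith/",                            # 2. open-ended but rule-governed
    "note": "Family bible entry, hard to read.",       # 3. generic free text
    "birth": {"date": "abt 1850", "place": "Boston"},  # 4. nested sub-record
    "sources": ["S1"],                                 # 5. pointers to other records
}

sex_is_valid = person["sex"] in SEX_VALUES

# The nested structure survives a trip through JSON unchanged.
round_tripped = json.loads(json.dumps(person))
print(sex_is_valid, round_tripped == person)   # True True
```

The same tree could be written as GEDCOM levels or XML elements; only the surface syntax would differ.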

From my point of view every one of these five kinds of things can be called an attribute because it provides some specific bit of information about the structure that contains it. Should we have five different vehicles for each of these?

In my view a PERSON and a multi-role EVENT are top level record structures, that is they are never found as sub-structures inside other structures. They stand alone; they are their own thing. You can think of them as attributes of the record that points to them even though they are stand-alone objects. This is what pointers are usually used for in computer programs anyway.

However, VITAL EVENTS, as I've defined them, and as they are defined within GEDCOM 5.5[.1], are always internal sub-record structures and become attributes that way.

Sorry, but my posts seem to be slowly turning into "Computer Data Structures, 101"

Tom W.
AdrianB38 2011-02-27T13:55:41-08:00
Geir - re "INDIVIDUAL_ATTRIBUTE_STRUCTURE containing an event". I think I understand your question but please bear with me if I got it wrong.

In summary - it isn't an event inside the INDIVIDUAL_ATTRIBUTE_STRUCTURE, it's just a load of data that looks exactly like the nearly-corresponding bit of the event does, so rather than define it twice, they used the same "label".

Perhaps if I write it out more completely (I'm quoting from GEDCOM 5.5.1 because the copy action keeps failing when I try to copy bits out of 5.5)

INDIVIDUAL_RECORD is defined in GEDCOM 5.5.1 (page 25 in my copy) as
n @XREF:INDI@ INDI {1:1}
...
+1 <<INDIVIDUAL_EVENT_STRUCTURE>> {0:M}
+1 <<INDIVIDUAL_ATTRIBUTE_STRUCTURE>> {0:M}
...
(where ... as usual means omitted stuff)
So - the Individual record contains 0, 1 or more event structures and 0, 1 or more attribute structures.

INDIVIDUAL_EVENT_DETAIL is defined as
n <<EVENT_DETAIL>> {1:1}
n AGE <AGE_AT_EVENT> {0:1}
i.e. it's made up of a single EVENT_DETAIL structure, followed by an optional AGE.

The EVENT_DETAIL is defined as (I'm not going to copy the formal GEDCOM out because it turns out I can't copy to the clipboard from 5.5.1 either!)
- an optional TYPE line
- an optional DATE line
- an optional PLACE structure (itself being multiple lines)
- an optional ADDRESS structure (itself being multiple lines)
...

INDIVIDUAL_ATTRIBUTE_STRUCTURE is defined as a whole list of options, virtually all of which follow the same pattern, viz:
n OCCU <OCCUPATION> {1:1}
+1 <<INDIVIDUAL_EVENT_DETAIL>> {1:1}

Yes - the INDIVIDUAL_ATTRIBUTE_STRUCTURE is defined as a line relevant to the attribute plus one INDIVIDUAL_EVENT_DETAIL. The latter is already defined above as
n <<EVENT_DETAIL>> {1:1}
n AGE <AGE_AT_EVENT> {0:1}
and
EVENT_DETAIL is defined as
- an optional TYPE line
- an optional DATE line
- an optional PLACE structure (itself being multiple lines)
- an optional ADDRESS structure (itself being multiple lines)
...

Or in other words, although the standard says INDIVIDUAL_EVENT_DETAIL, it's short-hand for the group of lines consisting of
- an optional TYPE line
- an optional DATE line
- an optional PLACE structure (itself being multiple lines)
- an optional ADDRESS structure (itself being multiple lines)
...
All of which are what you see against an attribute.

So it's not an event "contained in" the attribute, it's a set of lines having exactly the same format as a set of lines that happened to be defined for the event first.
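
The reuse described above can be pictured as two structures that both embed the same detail block. This is only a rough sketch: the class names are hypothetical and appear neither in GEDCOM nor in any BetterGEDCOM proposal, and the sample values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventDetail:
    """Hypothetical stand-in for the GEDCOM EVENT_DETAIL group of lines."""
    type: Optional[str] = None
    date: Optional[str] = None
    place: Optional[str] = None
    address: Optional[str] = None


@dataclass
class IndividualEventDetail:
    """EVENT_DETAIL plus an optional AGE, mirroring INDIVIDUAL_EVENT_DETAIL."""
    detail: EventDetail
    age: Optional[str] = None


@dataclass
class IndividualEvent:
    """An event line such as BIRT: a tag plus the shared detail block."""
    tag: str
    detail: IndividualEventDetail


@dataclass
class IndividualAttribute:
    """An attribute line such as OCCU: a tag, a value, and the same detail block."""
    tag: str
    value: str
    detail: IndividualEventDetail


# Invented sample data: both records reuse IndividualEventDetail.
birth = IndividualEvent("BIRT", IndividualEventDetail(EventDetail(date="1 JAN 1900")))
job = IndividualAttribute("OCCU", "Weaver", IndividualEventDetail(EventDetail(date="1921")))
```

The attribute does not contain an event object; it simply carries a field of the same type that the event carries.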

Err - sorry if my short-hand misled. Does this make it clearer?
ttwetmore 2011-02-28T05:36:41-08:00
Geir says, "Regarding Level zero and Vital Level one events...I understand what you are saying, but I still don’t see why there could not be only level zero events. And I don’t understand why that would triple the size of the file."

Yes, all level one, "vital events" substructures in person and other records can be converted to level zero "event records." They wouldn't triple the size of the file in pure character count, but would probably at least triple the size of the file in terms of number of records. The files would get larger in character counts simply because of all the extra record "boiler plate" and inter-record references that would have to be added.
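
A toy conversion illustrates the record-count effect. This is a sketch under assumed data shapes: the dictionary layout, the `promote_events` name and the sample people are all hypothetical, and real files would have other mixes of events per person.

```python
def promote_events(persons):
    """Split each embedded event out of its person into a separate record.

    `persons` maps a person id to a list of (tag, date) pairs. Returns
    (person_records, event_records). Each event record carries a
    back-reference to its person, and each person a list of event ids:
    this is the extra boiler plate and inter-record referencing.
    """
    person_records = {}
    event_records = {}
    next_id = 1
    for pid, events in persons.items():
        refs = []
        for tag, date in events:
            eid = f"E{next_id}"
            next_id += 1
            event_records[eid] = {"tag": tag, "date": date, "person": pid}
            refs.append(eid)
        person_records[pid] = {"events": refs}
    return person_records, event_records


# Invented sample: two people, each with a birth and a death.
sample = {
    "I1": [("BIRT", "1850"), ("DEAT", "1920")],
    "I2": [("BIRT", "1852"), ("DEAT", "1931")],
}
people, events = promote_events(sample)
records_before = len(sample)                # 2 person records
records_after = len(people) + len(events)   # 2 person records + 4 event records
```

With two vital events per person, the record count in this sample goes from 2 to 6.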

The vital events as done in GEDCOM now (eg, BIRT, DEAT, MARR, ...) are much more attribute-like than they are event-like. Some might not agree with that. Each vital event describes one vital fact about a person or a family. There are no other role-players. There is obviously an event hidden away behind the vital, but the details of that event were not of concern when the fact was recorded (or if it were, the genealogical application proved inadequate for the task). A vital event is very much like an abstract of a real event, extracting only limited information (generally just date and place) pertinent to the primary role player.

I sense the real concern over dealing with vital events as substructures in other records versus multi-role events in their own right, is that it might seem like we would be sanctioning two ways of doing the same thing. You can see it that way, but I also think that's an incomplete view. Instead of just rejecting the idea of having two ways to do certain things, isn't it better to stop and rationally consider why there are good reasons for having two ways of doing things?

Think about the very practical problem of converting millions of GEDCOM records into Better GEDCOM records. Is there really a compelling reason to convert every BIRT, every DEAT, every RESI, every BURI, every CHR, every MARR substructure in the GEDCOM files into separate single-role event records? There is no technical reason why you can't do it, and if Better GEDCOM does away with the vital concept idea, we would have to do it, but what does it gain?

I've been a software developer for 45 years and a software professional for 40. If there were anybody around who could give advice for or against the idea of whether it's bad to have more than one way to do certain things, it would be me. I find nothing uncomfortable in the notion of having the two kinds of events. My hope is that having the multi-person event record will encourage genealogists to always collect all the info that is available from the evidence. But for the cases where a full-bodied event is not warranted, or there is no real evidence about the event yet, the vital event works well.

Here's another thought. You've just talked to your grandmother to get information about her grandparents. She gave you some birth dates of her grandparents that aren't yet in your database. Are you going to add those birth dates to the records for her grandparents as simply vital events, as would be done in GEDCOM, or are you going to create separate birth records for each of those grandparents? Don't you think it's kind of overkill to create birth events from such distant and secondary data? OK, you can do it.

Also remember that we are discussing getting rid of the family record. If that happens we have to find a new way to indicate relationships between people. Relationships between people carry some of the same implied event information that vital events and regular events do. As soon as we face this we will likely encounter a third way that some information can be implied.

My bottom line is that I PREFER a world with both vital event structures and multi-role event records. We CAN'T live in a world with just vital event sub-structures. We COULD live in a world of just multi-role event records. From that it's up to the Better GEDCOM collective wisdom to choose the official path.

Tom W.
gthorud 2011-03-01T13:37:50-08:00
First, Thanks to Adrian !!!! for making me look much closer at the Gedcom definitions of INDIVIDUAL_ATTRIBUTE_STRUCTURE and INDIVIDUAL_EVENT_STRUCTURE.
It appears that my Trabant has no motor and must always be accompanied by a Rolls Royce. Based on how Attributes are presented in the user interface of some programs, I thought I knew what an attribute is, just a type value pair, and I didn't check the Gedcom. So after checking the Gedcom for these two structures it appears, as has been stated earlier in the discussion, that the I_ATTRIBUTE_S is just an I_EVENT_S plus a value (with at least one minor exception, BIRTH). This changes my thinking a lot, and I am sorry for the unnecessary discussion I have caused – and it must have been very difficult for those trying to understand what my Trabant was.

(When reading the BNF in the Gedcom standard I note that it might not be necessary to define event/attribute tags with BNF in BG, it should rather be done in a list.)

So, an Attribute is an Event plus a value, and since an event in the BG context can have multiple participants (people/groups), it becomes necessary to say that an attribute applies to only one person.
The other differences (time/date) seem to have been agreed to be no reason for a difference. I think a BG event structure should be allowed to contain at least one value (actually a type value pair). If so, the only difference between an event and an attribute will be that the attribute can apply to only one person. So, if we assume that we will have lists of event types with definitions etc, as part of the standard and as a central list that can be updated, it would be possible to define in that list if an event can apply to one or several persons/group. The list could also define roles and possible types for the value.

Summed up: This way we do not need to have separate structures for attribute, we only need event. I guess this is no surprise to the rest of you.
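
One way to picture such a central list is a small registry that records, per event type, how many participants are allowed, which roles exist and what kind of value (if any) is expected. This is a sketch only: the `EventType` fields, the two registry entries and the role names are assumptions made for illustration, not anything agreed for BetterGEDCOM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EventType:
    """Hypothetical entry in a central list of event types."""
    name: str
    max_participants: Optional[int]   # None means no upper limit
    roles: tuple = ()
    value_type: Optional[type] = None  # None means the type carries no value


# Invented registry contents, for illustration only.
REGISTRY = {
    "Marriage": EventType("Marriage", None, roles=("spouse", "witness", "officiant")),
    "Occupation": EventType("Occupation", 1, roles=("principal",), value_type=str),
}


def validate(type_name, participants, value=None):
    """Return a list of problems; an empty list means the event fits its type.

    `participants` is a list of (person_id, role) pairs.
    """
    et = REGISTRY[type_name]
    problems = []
    if et.max_participants is not None and len(participants) > et.max_participants:
        problems.append("too many participants")
    for _, role in participants:
        if role not in et.roles:
            problems.append(f"unknown role: {role}")
    if value is not None and (et.value_type is None or not isinstance(value, et.value_type)):
        problems.append("unexpected value")
    return problems
```

Under this sketch an "attribute" is just an event type whose registry entry limits it to one participant and declares a value type.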

I then started to look at various programs and it seems to me that at least three of the major ones have no distinction between event and attribute in the user interface, so we might be in good company.

As has been pointed out above, Gedcom 5.5.1 proposes an extension called "Event description" (following the EVEN tag), which does not seem to be the same as an Attribute value (which some programs seem to think). Depending on interpretation of 5.5.1, Event description may or may not be used for a person. To me this description seems to be some sort of a summary of the event, it is an event value thing not a type thing, and it seems not to be an attribute in the same sense as eg. a caste name. Since it is following the EVEN tag, it seems to apply only to user defined events. How would this event description appear in a sentence in a report?

5.5.1 talks, in EVENT_OR_FACT_CLASSIFICATION, about subtypes of ANY event type, using TYPE. The question is if this is used by anyone? Is it needed? If so, the standard and the central list can define subtypes. Separate requirement?

Adrian, the name and sex issues should perhaps be discussed separately? A name will most likely, in my mind, be a more complex thing than just a string.

About Tom's vital events. It is clear that you need to separate one Burial event from another, but that can easily be done in a level zero event. I see no problem with creating an event record even if the info comes from my grandmother. Regarding extra overhead, I am not very concerned about that, I guess a few photos transferred together with the BGfile will in many cases be much bigger than the BGfile. But, this is an encoding issue, and it will probably be affected by a solution to the evidence-conclusion issue, so I suggest that we wait with this. I agree that the BIRT event may be a special case.

Geir
AdrianB38 2011-03-01T14:03:34-08:00
Geir - "5.5.1 talks, in EVENT_OR_FACT_CLASSIFICATION, about subtypes of ANY event type, using TYPE. The question is if this is used by anyone?"

It's certainly the way that user-defined events or attributes are supposed to be written - see discussion about Custom tags on Developer's meeting page.

I already added Requirement Syntax08 "It should be possible for user-defined events, properties, characteristics, etc, of individuals, etc, to inherit features from previously defined events, properties, characteristics, etc." and speculated / suggested "If events etc are given a type and sub-type, then it would be possible for the user to create a user-defined subtype of an application defined type, and thus inherit the processing done for that type.
For instance, an event "Marriage - civil" might have a type of "Marriage" and a subtype of "civil", thus automatically doing all processing created for the event-type of "Marriage" "

So a sub-type solution is already there in a fashion.
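
The inheritance idea can be sketched as a handler lookup that tries the (type, subtype) pair first and falls back to the type alone. The handler table, the `render` function and the sample event are hypothetical; they only illustrate the fallback, not any application's actual processing.

```python
# Handlers keyed by (type, subtype); a subtype of None is the handler for the type itself.
HANDLERS = {
    ("Marriage", None): lambda ev: f"Married on {ev['date']}",
}


def render(event):
    """Use a subtype-specific handler if one exists, else inherit the type's handler."""
    key = (event["type"], event.get("subtype"))
    handler = HANDLERS.get(key) or HANDLERS.get((event["type"], None))
    if handler is None:
        # No processing defined for this type at all: generic fallback.
        return f"{event['type']} on {event['date']}"
    return handler(event)


# A user-defined subtype with no handler of its own (invented sample).
civil = {"type": "Marriage", "subtype": "civil", "date": "3 MAR 1901"}
```

Here `render(civil)` uses the handler registered for "Marriage", because nothing is registered for the "civil" subtype.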

And yes, Name will require more complexity in structuring than the usual "attribute".

As for your looking in more detail at the GEDCOM Spec'n and confounding our expectations - yes, I think we're all doing some of that! Old saying: "If you think you understand what's going on here, you obviously don't..."

PS - I shall be sorry to lose your Trabant!
louiskessler 2011-02-25T20:48:16-08:00
... and GEDCOM is just suggesting that if something takes longer than a day, then it is *probably* a fact rather than an event. They do not impose it as a rule.
louiskessler 2011-02-25T21:11:33-08:00



Adrian said Today 2:57 pm:
re: Custom GEDCOM tags

Louis - re your statements "Events also having descriptions, e.g.:
"1 EVEN Appointed Zoning Committee Chairperson
"2 TYPE Civic Appointments"
and
"events can have descriptions (i.e. attributes), the presence or absence of an attribute cannot be used to define the difference between events and facts"

That's, ahem, "interesting". I just double checked GEDCOM 5.5 and the INDIVIDUAL_EVENT_STRUCTURE in that copy seems clear to me that it does not allow a description (i.e. attribute) for an event - not even the EVEN generic event that you quote. TYPE, yes, no problem with that.

Do you know if previous (or post 5.5) versions of GEDCOM relaxed this? Or is it simply a case of software suppliers trampling over the standard again? In a sense, it doesn't really matter either way because if there are files out there with that construction, we need to deal with them. But I'd still like to understand what's going on (my pedantic brain again).



Adrian:

The example:

"1 EVEN Appointed Zoning Committee Chairperson
"2 TYPE Civic Appointments"

was taken right from GEDCOM 5.5.1 page 48 in the definition of EVENT_DESCRIPTOR. It says:

EVENT_DESCRIPTOR:= {Size=1:90}
Text describing a particular event pertaining to the individual or family. This event value is usually
assigned to the EVEN tag. The classification as to the difference between this specific event and other
occurrences of the EVENt tag is indicated by the use of a subordinate TYPE tag selected from the
EVENT_DETAIL structure. For example;
1 EVEN Appointed Zoning Committee Chairperson
2 TYPE Civic Appointments
2 DATE FROM JAN 1952 TO JAN 1956
2 PLAC Cove, Cache, Utah
2 AGNC Cove City Redevelopment

Now go to the FAMILY_EVENT_STRUCTURE on page 32, and you'll see under it is:

n EVEN [<EVENT_DESCRIPTOR> | <NULL>] {1:1} p.48
+1 <<FAMILY_EVENT_DETAIL>> {0:1} p.32

But, if you look at INDIVIDUAL_EVENT_STRUCTURE on page 34-35, you'll see:

n EVEN {1:1}
+1 <<INDIVIDUAL_EVENT_DETAIL>> {0:1}* p.34

which does *not* have the EVENT_DESCRIPTOR. I am sure this is what you looked at.

However, the latter MUST be a mistake in the GEDCOM definition, because their own example they gave of being "Appointed Zoning Committee Chairperson" is not a family event, but it is rather an individual event.

I believe the proper interpretation should be that it was GEDCOM's intention to have the event descriptor on both INDI events and FAM events, because it simply doesn't make sense to have them on only FAM events.
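
A reader that follows this interpretation would simply accept an optional descriptor on any EVEN line. The following is a minimal sketch, not a full GEDCOM parser: the function names are hypothetical, cross-reference ids are not handled, and only directly subordinate lines are collected.

```python
def parse_line(line):
    """Split a GEDCOM line into (level, tag, value); value is None when absent.

    Cross-reference ids (e.g. @I1@) are not handled in this sketch.
    """
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    tag = parts[1]
    value = parts[2] if len(parts) > 2 else None
    return level, tag, value


def read_event(lines):
    """Collect one EVEN structure into a dict, keeping the descriptor if present."""
    level, tag, value = parse_line(lines[0])
    if tag != "EVEN":
        raise ValueError("expected an EVEN line")
    event = {"descriptor": value}
    for line in lines[1:]:
        sub_level, sub_tag, sub_value = parse_line(line)
        if sub_level <= level:
            break  # the next structure has started
        if sub_level == level + 1:
            event[sub_tag] = sub_value
    return event


# The example quoted from GEDCOM 5.5.1 above.
example = [
    "1 EVEN Appointed Zoning Committee Chairperson",
    "2 TYPE Civic Appointments",
    "2 DATE FROM JAN 1952 TO JAN 1956",
    "2 PLAC Cove, Cache, Utah",
    "2 AGNC Cove City Redevelopment",
]
parsed = read_event(example)
```

An EVEN line with no text after the tag yields a descriptor of `None`, so both forms are read the same way.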

Louis
ttwetmore 2011-02-26T07:10:27-08:00
I hesitate to point this out once again, but it seems that I may be the only one stressing the point.

The discussion above covers the LEVEL ONE GEDCOM events, what I call VITAL EVENTS to separate them from the TRUE EVENTS, which stand alone as evidence and may refer to multiple persons as role players.

Since vital events are level one entities within level zero INDI records they are easily viewed simply as a more structured kind of PFACT than the simpler PFACTs like name, age, occupation. I can see that there is interest in defining exactly what is the distinction to be made between vital events AS PFACTs and other simpler PFACTs, but I fear this entire discussion simply takes away from the important question of LEVEL ZERO TRUE EVENTS.

I really don't think it matters whether LEVEL ONE VITAL EVENTS are considered just another kind of PFACT or not. I can't think of any major reason. If you decide you can give dates to all PFACTs (e.g., occupation, name, sex [considering the possibility of sex changes -- joke, joke, joke, please]) then the differences seem to be pretty moot. Would there be any real difference in the model, the file formats, the information content, the application software, if this distinction wasn't considered to be important. Couldn't we just say that persons have lots of different kinds of PFACTs, describe their different natures, and simply fit VITAL EVENTS in as one of those sub-types?

What must distinguish Better GEDCOM is the work to extend beyond the GEDCOM model into areas that make the new model compelling for the next generation of genealogical applications. In my view we have identified the two major areas where the model must be extended, to LEVEL ZERO EVENTS as first class citizens, and to full support for persons and events at the EVIDENCE and CONCLUSION/HYPOTHESIS levels.

Anyway, I guess it's interesting to discuss the differences between LEVEL ONE VITAL EVENTS and PFACTs in general, but please don't lose sight of where the true work lies, in the definition of the LEVEL ZERO TRUE EVENTS.

Tom W.
hrworth 2011-02-26T07:44:35-08:00
Tom,

Might I suggest that some of this information, mostly what is in Caps, be Documented on this Wiki. YOU know what all of these mean, but I am not sure that others know, specifically, what you mean. That may be why you are answering the same questions frequently.

What is: Level One Vital Events
What is: PFACTs
What is: Level Zero True Events.
What is: Evidence Level
What is: Conclusion Level
What is: Hypothesis Level

Seems to me, that these definitions need to be at an Overview of the BetterGEDCOM level on this wiki. I don't know exactly where, but at a high enough level on the wiki, so that we can see what they mean.

I know, I am asking as a simple End User, but don't know how they would show up in a GEDCOM file (today). If I could, then I would be able to find it in the software that I use.

IF, they are not in a current GEDCOM file, that's OK too, but how would I see them in a BetterGEDCOM file?

Thank you,

Russ
AdrianB38 2011-02-26T09:13:30-08:00
Louis
I surrender! (grin) In fact, I surrendered sometime last night when thinking about how I had entered people's inheritances. I did like the idea of an event being a change but couldn't square it with the bit about "values" needing attributes (to use old GEDCOM terms). Coming into an inheritance was, from various viewpoints,
- an event when considering the English definition of the word event;
- an event when viewed as a change of state (in monetary value or possessions);
- an event because it involves at least 2 people (the recipient and the deceased) and therefore needs to be what Tom refers to as a Level zero event (referring to what the GEDCOM _would_ look like if it allowed the concept);
- an attribute because I needed to access the field in my application program to put the description of the inheritance in.

3-1 to events. And then Louis comes up with the text from 5.5.1 that would release that field to an event as well as an attribute. 4-0 to events.

(Louis - thanks for guiding me to the relevant pages. It's all quite different from 5.5 and I'd made the mistake of believing the section on "Modifications in Version 5.5.1", where it mentions nothing of these changes.)

So - where does that leave things?

Pretty much as Tom contends, I think.

We have an English definition of "Event = a change in state" (sorry if that's a bit like a physicist's speak). This implies a definition of Attribute (to use the old GEDCOM term) as an ongoing state-description, but everything gets a bit vague when you consider short term attributes - for instance, you could consider Occupation as an attribute but only be in a job for a day.

Fundamentally, I see no reason to worry about the distinction between an attribute for a person and an event that applies only to one person - there's no serious distinction that I can envisage in the future BG Data Model. Which I think both Louis and Tom said above - but I needed to convince myself.

The most crucial feature is what Tom has stressed above (and sorry, Tom, I was taking it as read so much that I didn't mention it) - the need to accommodate Events that have more than 1 person. How do we identify these? In Data Modelling terms, it's easy - there is a many-many relation between these Events and People (I'm talking people but it applies equally to Families / Groups / Locations etc), whereas between the other sort of events OR attributes there is only a many-to-one relation from them to People etc. But I'm not sure how to explain it in English - or whether we need to if we can communicate with the software suppliers.

I'm half tempted to say that all things known as events should be regarded as the many-to-many type of events. Even the death event, for instance, which might be thought of as an event applying to 1 person only, might include a 2nd person (in the role of a doctor, e.g.).

I'm also now half tempted to revert to a position I held some years ago and say that in BetterGEDCOM events and attributes are the same thing after all, and the only thing we need to worry about is whether they are many-to-many or many-to-one to people etc.
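
In data-model terms, the many-to-many case can be sketched as an event that holds a list of participations, each linking a person to a role. The class names, person ids and roles below are hypothetical illustrations; a single-person "attribute" is then just an event with one participation.

```python
from dataclasses import dataclass, field


@dataclass
class Participation:
    """Links one person to one role within an event."""
    person_id: str
    role: str


@dataclass
class Event:
    type: str
    date: str = ""
    participants: list = field(default_factory=list)


def events_for(person_id, events):
    """Return every event in which the given person plays any role."""
    return [e for e in events if any(p.person_id == person_id for p in e.participants)]


# Invented sample data: a death with a doctor, an inheritance, and an occupation.
death = Event("Death", "1920", [Participation("I1", "deceased"), Participation("I9", "doctor")])
inheritance = Event("Inheritance", "1921", [Participation("I2", "recipient"), Participation("I1", "deceased")])
occupation = Event("Occupation", "1900", [Participation("I1", "principal")])
all_events = [death, inheritance, occupation]
```

One person can appear in many events and one event can involve many people, which is the many-to-many relation; the occupation entry shows the many-to-one case using the same structure.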
louiskessler 2011-02-26T09:49:33-08:00

Adrian,

I've always used GEDCOM 5.5.1 and I also didn't realize the big change that they made here from 5.5

To sum this up so that everyone sees what changed: in GEDCOM 5.5 the EVENT_DESCRIPTOR was on the TYPE tag, with nothing on the EVEN tag, i.e.:

1 EVEN
2 TYPE <EVENT_DESCRIPTOR>

GEDCOM 5.5.1 changed the actual meaning of the EVENT_DESCRIPTOR and moved it to the EVEN tag, and added an EVENT_OR_FACT_CLASSIFICATION to the TYPE tag, i.e:

1 EVEN <EVENT_DESCRIPTOR>
2 TYPE <EVENT_OR_FACT_CLASSIFICATION>

with the mistake I earlier pointed out of GEDCOM 5.5.1 leaving the EVENT_DESCRIPTOR off of INDI events.

But this leads us to another minor issue, which should be agreed on (maybe in another thread):

Should we mostly refer to GEDCOM 5.5 (the standard) or GEDCOM 5.5.1 (the beta) or enumerate both as the starting point of BetterGEDCOM? As we're pointing out, there were many subtle changes between the two.

Personally, I prefer GEDCOM 5.5.1, because that was the intended direction that GEDCOM was going, and many programs did adopt it. For my program I take 5.5.1 and extend it so that dropped features from previous GEDCOM versions can also be handled.

Louis
gthorud 2011-02-26T11:48:43-08:00
Seems like a consensus is emerging.

The following will be repeating what has already been said, but just to sum up my understanding.

We have two types of vehicle that can transfer info
- One small and simple one that has traditionally been called attributes, which are single predefined values or (user defined) type and value pairs. They apply to one record (person, place, group etc) and there may be one or more of the same type.

- One big and complex one that has traditionally been called events, that should be a top level record, that can carry info that refers to zero or more person names/group names with various roles, zero or a few dates or a period, zero or more place names (possibly also identifying an address/contact structure), zero or more values (preferably predefined types), zero or more notes of various types, zero or more citation references (not determined at what levels in the event structure), zero or more multimedia references, and has administrative info (references to research process info, selection flags/marks etc.) The conclusion variant of this vehicle should have several pieces with surety info at various places in the vehicle.

There is no rule for what types of info that can be transported by these vehicles, except that types that may AT SOME (FUTURE) TIME not fit into the small one, must use the big one.

There are no rules about time (except that the small vehicle can not transfer dates– the time/period associated with the types of values in the small one is undefined or implied by the type).

The info types that can be transferred are defined in the standard, or a central registry or can be user defined, where the ONLY vehicle type to be used for an info type is specified, together with the roles, subtypes/classes and value type(s) (and possibly more). User defined types should be described in the file header for both vehicle types.

The problem seems to be what we should call them (perhaps Trabant and Rolls Royce). The names should not imply anything about what type of info they can transport.

The evidence-conclusion issue is orthogonal to this discussion, ie it should be another dimension – I hope.

Many of the lists of info types described above, carried by the big vehicle, are not central to the discussion in this topic.


Re. Gedcom 5.5 or 5.5.1 Agree that there is a problem, but I think, when needed, a reference should specify 5.5 or 5.5.1 – it is difficult to choose. The best thing would be to copy the text so many readers don’t have to do a lookup.
AdrianB38 2011-02-26T12:26:22-08:00
Re GEDCOM 5.5 or 5.5.1 - I wouldn't get hung up on it. We are, after all, trying to define our own data model and standard. If we say "GEDCOM says / doesn't say X", then we should qualify which version of the standard says it. Otherwise, our work should stand alone.

Geir - After getting my head round your vehicle / Trabant / Rolls Royce analogy - which I rather like! - I am slightly worried about what you mean by "There are no rules about time (except that the small vehicle can not transfer dates– the time/period associated with the types of values in the small one is undefined or implied by the type)." Our small 'construction' can transfer genealogical data containing dates. Someone's name is a classic instance of the simple data that fits into that simple 'construction'. For most men in the UK, it will be the same value through their life - an implied time period "GivenName FamilyName From Birth to Death". But there will be lots of women whose name changes during their life, so will be (e.g.)
"GivenName FamilyName From Birth to Marriage1"
"GivenName AnotherFamilyName From Marriage1 to Marriage2" etc
(where Birth, Marriage1 etc represent dates.)
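
The dated-name idea can be sketched as a value with an optional start and end. Representing dates as plain years, the half-open interval convention and the sample names are all simplifying assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatedValue:
    """A simple value plus the period over which it applies."""
    value: str
    start: Optional[int] = None  # year; None means open-ended (e.g. from birth)
    end: Optional[int] = None    # year, exclusive; None means open-ended (e.g. to death)


def value_in(values, year):
    """Return the value that applies in the given year, or None if none does."""
    for v in values:
        if (v.start is None or v.start <= year) and (v.end is None or year < v.end):
            return v.value
    return None


# Invented sample: a name that changes at two marriages.
names = [
    DatedValue("Mary Smith", 1850, 1872),
    DatedValue("Mary Jones", 1872, 1890),
    DatedValue("Mary Brown", 1890, None),
]
```

A name that never changes would be a single `DatedValue` with both `start` and `end` left open, which corresponds to the implied "From Birth to Death" period.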
ttwetmore 2011-02-26T13:52:09-08:00
What is: Level One Vital Events -- level one GEDCOM structures commonly called events today (e.g., BIRT, DEAT, BURI, BAPM, ...).
What is: PFACTs -- level one GEDCOM lines or structures that hold properties, facts, attributes, characteristic or traits.
What is: Level Zero True Events -- level zero GEDCOM records (see Event GEDCOM for examples) that hold event info with references to multiple INDI records.
What is: Evidence Level -- Records that hold evidence information.
What is: Conclusion Level -- Records that hold conclusions/hypotheses.
What is: Hypothesis Level -- Records that hold conclusions/hypotheses.
gthorud 2011-02-26T13:58:14-08:00
Adrian,

I should perhaps change the first sentence in my design of a Trabant into “One small and simple one that has traditionally been called attributes, which are predefined standardized, centrally registered or user defined type and value pairs.”

I am thinking about the structure that some people currently use to transfer hair color, caste, eye colour, nationality etc. although some of this info may change over time – something the Trabant can’t transport.

There are other structures that carry names, and I am not sure if there should be dates in that structure – maybe better handled by events when there is a change.

Tom,

I don’t understand why there must be Level One Vital Events, why can’t they all be Level Zero True Events?
gthorud 2011-02-26T15:04:12-08:00
Why can't we simply keep the words used in Gedcom, Attribute and Event, the Trabant and the Rolls Royce, and define them the way we want by modifying any Gedcom definitions to suit us? Choosing other words will just create confusion. And, I would like to get rid of the term PFACT, it is creating confusion.
ttwetmore 2011-02-26T17:15:03-08:00
If there is one thing I've learned in over 45 years of technical work, no matter what we do there will be confusion over terms, and there will always be new people getting involved who want to start all terminology discussions over again. It's part of the posturing procedure inherent any time human beings come together to get work done.

The term PFACT came about because there were different people on this Better GEDCOM, using all the terms of property, fact, attribute, characteristic and trait, all to mean the same thing, but not realizing that everyone was using their pet terms to mean exactly the same thing. I came up with the term PFACT as an attempt at a preemptive strike at forestalling months of confusion and argument over terminology. It was a cute way to help avoid problems. If we are all mature enough now that we can retire PFACT and replace it with one of those synonyms (I sense that attribute might be the winner), that would be great. But I guarantee as soon as more people join this effort all those other terms will crop back up and confusion will reign once more.

Tom
AdrianB38 2011-02-25T14:50:16-08:00
This arises from the discussion on the page for Custom GEDCOM Tags. (Oops - just checked - that's a discussion of that name on the page for the Developers Meeting). It would be sensible to give this important topic its own discussion.

Firstly note that the Requirement does NOT propose a definition of the difference - just proposes that there should be one.

Secondly note that it is not sufficient just to say Events are (one list of things) and Properties / Attributes etc are (another list of things) since this does not help when creating a user-defined "thing".

Thirdly, using the English language as our definition (e.g. "Oh, we know what we mean by the word 'event'") is not helpful to those of us with a different language.

Finally, the two concepts are not that far apart - if I were creating an object oriented program to update some family history related objects (not that I could) then I suspect that both event and attribute objects could inherit a lot of "stuff" from a common object. So we don't, I suspect, need to worry about the difference immediately.
AdrianB38 2011-02-25T15:02:16-08:00
Some important clips from previous discussions:

louiskessler Tuesday, 12:40 am
...
There is an EVEN (Event tag) which describes a change that happens at some time,
and a FACT tag which describe something that is true over a time period.
Most other tags are simply descriptions of one of these, and will be the data for the TYPE tag under the EVEN or FACT, e.g.
1 EVEN
2 TYPE Graduation




AdrianB38 Tuesday, 10:59 pm
Louis - re your statement
"There is an EVEN (Event tag) which describes a change that happens at some time,
"and a FACT tag which describe something that is true over a time period."
We've probably had this discussion before (grin!) but while your definition _tends_ very much to be true, we can concoct a definition of the difference between event and attribute (to use the GEDCOM terms) that leads to events happening over a long time.
This is even more true if we go for the concept of an event affecting multiple people while an attribute only applies to one person.
Specifically:
- an attribute must have a value (not one of the existing place, date, etc)
- if something has a value then it's an attribute
- an event must not have a value
- if something doesn't have a value then it's an event
Using this definition, "World War One" qualifies as an event and it clearly lasts for several years. It also affects a number of people, so that's another good reason to take it as an event.
Also, Residence qualifies as an event since the so-called value it usually has is PLACE, which is already present, so it doesn't actually need this "value" item. Some GEDCOM type programs get themselves in knots over Residence because they say "Residence is an Attribute - but unlike every other Attribute it doesn't have a value"
I much prefer my definition of the difference between Event and Attribute because it can be precisely described with no exceptions.
But it's also important to realise that many facts can easily be represented in either fashion depending on whether you bring things like Cause-of-event and Responsible-Agency into play.... So it probably shouldn't cause us too much grief too soon.
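
The value-based rule above is simple enough to state as a one-line test. This sketch assumes a record is a plain dictionary in which "value" means a value other than the existing date, place and similar items; the function name and sample records are hypothetical.

```python
def classify(record):
    """Apply the rule: something with a value is an attribute, otherwise an event."""
    return "attribute" if record.get("value") else "event"


# Invented sample records.
occupation = {"tag": "OCCU", "value": "Weaver", "date": "1901"}
residence = {"tag": "RESI", "place": "Crewe, Cheshire", "date": "1911"}
war = {"tag": "EVEN", "type": "World War One", "date": "FROM 1914 TO 1918"}
```

Under this rule Residence and "World War One" both come out as events, since neither carries a value beyond place and date.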



louiskessler Today 6:43 am
...
You left out some things that make the analysis even tougher, such as Events also having descriptions, e.g.:
1 EVEN Appointed Zoning Committee Chairperson
2 TYPE Civic Appointments
and that the TYPE event descriptor can be applied to defined events, e.g.:
1 MARR
2 TYPE Common Law
Basically, the TYPE can be any text the user chooses, and GEDCOM states it should be displayed as given.



louiskessler Today 6:53 am
... and since events can have descriptions (i.e. attributes), the presence or absence of an attribute cannot be used to define the difference between events and facts.
The true difference, is that an event denotes a change of something and when that occurs. A fact indicates a truth that exists and the period of time during which it is true. From GEDCOM:
"As a general rule, events are things that happen on a specific date. Use the date form ‘BET date AND date’ to indicate that an event took place at some time between two dates. Resist the temptation to use a ‘FROM date TO date’ form in an event structure. If the subject of your recording occurred over a period of time, then it is probably not an event, but rather an attribute or fact."



ttwetmore Today 10:44 am
The quote Louis provided about dates in events is a good guideline. However, I believe it is still reasonable to allow events that occur over a range of dates, so don't believe the quote should have been so strongly phrased.
Examples of events that take more than one day would be a trip, eg, an ocean voyage when immigrating. Yes, of course, you can add two events, a departure event followed by an arrival event, which might be the recommended course, but why disallow an event for the voyage as a whole.
...
How about a multi-day ceremony? How about a vacation? GEDCOM should be an important guide for BetterGEDCOM, but all of its assumptions are fair game for re-examination.




ttwetmore 20 minutes ago
Just ole opinionated me again. Adrian's last about the difference between an event and a characteristic brings up an important (IMHO) point that I have tried to cover in the DeadEnds model.

What is an event in GEDCOM? It is a substructure of lines inside a person record (or family record) that describes a date and place and maybe some other information about an event that occurred in the life of the person (let's forget families for awhile). These are SINGLE ROLE events that conveniently forget that a birth event really involves at least three persons! Thinking about events in this trivial way has gotten so commonplace that it has nearly completely hidden from view what events really are. A role player is NOT MENTIONED in these events because the substructure is inside the record of the event's PRIMARY ROLE PLAYER already. These "events" serve PRIMITIVE genealogy well, the simple quest for birth and death dates for direct ancestors, but they are inadequate for serious genealogy. GEDCOM came out of the LDS's goal to perform temple rites on church members' ancestors, not from anyone's goal to have GEDCOM support serious genealogy. We should not be surprised that GEDCOM is only suited for the fairly simple requirements of the church faithful.

In the DeadEnds model I call these events "vital events" and keep them as substructures inside person and other records. Thus converting a file from GEDCOM format to DeadEnds format does not cause an explosion in the number of now independent event records with single role players. In some sense these "vital events" are really like structured PFACTs and I think can be treated as such. You just have to think of a birth date and place as structured but still pretty simple PFACTs about a person.

But then there are "real events." GEDCOM doesn't have them as first class citizens, but some systems now do. These events are full-fledged, top-level, first-class-citizen records, with their own record identifiers, indexed as any record would be. These are true multi-role entities that refer to the person records of the persons who play the roles in the event. The DeadEnds model brings this record type front and center right along with person records, and I certainly believe strongly that the Better GEDCOM model should do the same.

So, the bottom line. In thinking about events, one must be pretty careful to know what one is talking about. I hope I have explained the differences between the two ways that the term event is used most commonly in the genealogical context today. When deciding how to include events in the Better GEDCOM model it is important to know which of these concepts is the one one is discussing.
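
To make the two senses of "event" concrete, here is a minimal Python sketch. It is not the DeadEnds schema: the class names (`VitalEvent`, `EventRecord`, `RoleRef`), the identifiers and the sample people are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VitalEvent:
    """Single-role event kept inside a person record; the role player is implicit."""
    kind: str
    date: str = ""
    place: str = ""


@dataclass
class Person:
    id: str
    name: str
    vital_events: List[VitalEvent] = field(default_factory=list)


@dataclass
class RoleRef:
    """One role player in a top-level event, referring to a person record by id."""
    role: str
    person_id: str


@dataclass
class EventRecord:
    """Top-level, multi-role event with its own record identifier."""
    id: str
    kind: str
    date: str = ""
    place: str = ""
    roles: List[RoleRef] = field(default_factory=list)


child = Person("I1", "John Smith", [VitalEvent("birth", "1 JAN 1950", "Boston")])
mother = Person("I2", "Mary Jones")
father = Person("I3", "Peter Smith")
people: Dict[str, Person] = {p.id: p for p in (child, mother, father)}

birth = EventRecord("E1", "birth", "1 JAN 1950", "Boston", [
    RoleRef("child", "I1"), RoleRef("mother", "I2"), RoleRef("father", "I3"),
])


def players(event: EventRecord) -> Dict[str, str]:
    """Resolve each role in a top-level event to the name on the person record."""
    return {ref.role: people[ref.person_id].name for ref in event.roles}


print(players(birth))
```

The vital event names nobody because it lives inside John's record, while the event record has to name all three role players explicitly.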

Tom W.
AdrianB38 2011-02-25T15:16:29-08:00
Can I start the discussion with one important point?

While a lot of genealogical events do happen on a single day, and a lot of genealogical events do mark a change in something, I think we must allow that a lot of events that affected our relatives took place over a number of days - years even. For instance (assuming we allow only the 2 concepts of event on the one hand and property / characteristic / whatever on the other) then World War 1 is clearly an event. The American War of Independence is clearly an event. Etc. There is no way that these can be described as properties - and I don't think any of us would pretend that they can.

If we are going to go for the concept of multi-person events in BetterGEDCOM, then I think we are going to see an increase in the number of multi-day events, which makes definitions involving single days somewhat deficient.
louiskessler 2011-02-25T20:46:58-08:00
Again, the proper definition of an "Event" is something that is the change between one state and another state. It does not have to be a point in time, and can be a period of time, even years.

A "Fact" is a "truth" that is correct for a period of time.

An event will be at the beginning and the end of a fact. Before the event will be another fact. After the event will be another fact.

Event - Fact - Event - Fact ...

You don't always list them all or care about them all, so only the events or facts of interest are the ones you denote.

e.g. 1:

Fact - John hasn't been born. Before Jan 1, 1950.
Event - John is born. Jan 1, 1950
Fact - John is bald. Jan 1, 1950 to June 30, 1950.
Event - John's hair is growing. July 1, 1950 to Dec 31, 1951.
Fact - John has a full head of black hair. Jan 1, 1952 to Dec 31, 1979.
Event - John's hair is falling out. Jan 1, 1980 to Dec 31, 1999.
Fact - John is bald - From Jan 1, 2000 on.

Now if you want, you could turn each Event into a start event and an end event with a fact in between, e.g.:

Event - John's hair is falling out. Jan 1, 1980 to Dec 31, 1999.

can be:

Event - John's hair starts falling out. Jan 1 1980
Fact - John's hair is falling out. Jan 1, 1980 to Dec 31, 1999.
Event - The last lonely little hair on John's head falls out. Dec 31, 1999

So I don't think it's rigid, and there is even ambiguity as to whether something is an event or a fact. I've in the past thought the two were so similar, that there is no reason to necessarily have two different objects, but maybe just an event/fact. But I don't really care about that.

e.g. 2 (Adrian's other example):

Fact: Period prior to World War I
Event: World War I
Fact: Period after World War I

I see no problem with that. Or even with this:

Fact: Period prior to World War I
Event: Start of World War I
Fact: World War I
Event: End of World War I
Fact: Period after World War I

Why are these both okay? Because an event is a transition. World War I was a transition from the time before the war to the time after the war, so it was an event.

But it was also true that World War I was happening between the start of WWI and the end of WWI. So WWI was also a fact.

I hope no one is getting too hung up on this. Not everything has rigid rules.
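
The rewrite of a long-running event into "start event, fact, end event" described above can be sketched in a few lines of Python. The `Entry` class and the `split_event` helper are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    kind: str   # "Event" or "Fact"
    text: str
    start: str
    end: str


def split_event(entry: Entry) -> List[Entry]:
    """Rewrite an event that spans a period as: start event, fact, end event."""
    if entry.kind != "Event" or entry.start == entry.end:
        return [entry]  # facts and single-date events are left alone
    return [
        Entry("Event", f"Start of {entry.text}", entry.start, entry.start),
        Entry("Fact", entry.text, entry.start, entry.end),
        Entry("Event", f"End of {entry.text}", entry.end, entry.end),
    ]


war = Entry("Event", "World War I", "1914", "1918")
for part in split_event(war):
    print(part.kind, "-", part.text, part.start, "to", part.end)
```

Both the single entry and the three-part form carry the same dates, which is why either reading is acceptable.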
gthorud 2011-02-26T17:11:06-08:00
Base document for changes
Some requirements define things that are already defined by Gedcom. Unless someone has requested a change, it should not be necessary to restate requirements that are already handled by Gedcom, i.e. things we want to keep. Otherwise, we could end up redefining Gedcom, in pieces, without actually achieving any progress.

Could we pick a version of Gedcom as our starting point as far as functionality and the semantics of data are concerned - ie data structures are not automatically adopted. (Difficult to phrase what I mean).

Could we use 5.5 ?
gthorud 2011-02-26T17:22:22-08:00
Louis has already raised a similar problem re. which version do we mean when we refer to Gedcom? The problem is bigger than I realized when he made the suggestion.
AdrianB38 2011-02-27T05:09:51-08:00
Geir - re "Some requirements define things that are already defined by Gedcom. Unless someone has requested a change, it should not be necessary to restate requirements that are already handled by Gedcom"

I think we will find it sensible to at least highlight the major items - for instance, the list of entity types we want to carry forward, otherwise the Requirements Catalogue looks odd. Then the detail requirements Conversion01 and Data01 both mandate (for different reasons, hence why we have 2) forward compatibility with GEDCOM. So we don't need to list all the detail.

Generally, when we say GEDCOM there should be no reason to specify which. Our requirements should stand alone. Except for Conversion01 and Data01?

Not sure even there that it's necessary - Data01 talks about "BG compatible software must be able to import data from existing applications". Note the plural, not from one specific standards compliant version.

Probably we don't need the precise version for the overall statement but for doing a cross-check, maybe we should say that GEDCOM version X.X is the base and everything beyond that needs, at some point, to be explicitly stated.

I'd be tempted to go for GEDCOM 5.5 since having been caught out by previous discussions about type and events, I have no faith that 5.5.1 is consistent - indeed, Louis has demonstrated that it isn't (unless FS had gone completely daft).
gthorud 2011-02-27T08:09:57-08:00
The requirement that triggered me to start this discussion was
Data-Ind01: BetterGEDCOM must support the recording of genealogy / family history data about individual people.

The problem is that the catalog will grow to be very big, so we don't need statements about obvious things.

You are right about backwards compatibility, that is also what I thought, but we should state this explicitly.

I think we can use 5.5 - that is the official version. It should not be a big job to include the new things, or delete old things, as requirements.
AdrianB38 2011-02-27T07:51:08-08:00
Data-Ind03 Non-biological, non-family relationships
Created
AdrianB38 2011-02-27T08:25:36-08:00
BetterGEDCOM must provide a means to document relationships between individuals that are not based on biology or family, e.g. "X is the friend of Y".

The biological relationship requirement is documented on Data-Ind02. Note this says "Biological relationships can exist where there is no family in any meaningful sense."

The family relationship requirement is documented on Data-Fam01. Note this points out "Family units exist where there is no underlying biological relationship and no legal adoptions."

This requirement is intended to cover all other relationships between individuals unless further ones seem worthy of splitting out. Examples quoted in GEDCOM Standard 5.5 include friend and god-parent.

Questions to be resolved about this relationship requirement include:
- If X is related to Y, is Y related to X, and if so, is the name of the relationship the same?

Part 2 is probably not - If X is an apprentice of Y, then Y is NOT an apprentice of X. In this case, there IS an inverse relationship.

If X is an admirer of Y, then there is no reason to assume there is any inverse relationship.

Conclusion - for this generic relationship, a relation from X to Y does NOT always imply a relation from Y to X, and even if an inverse does exist for a specific relationship type, the name of the inverse is NOT always the same.

Question 2 - does this sort of relationship need to be able to link more than 2 people? Apprentice to Master is certainly a many to one but each relationship can be recorded separately. Although groups of friends can be said to exist, my current instinct is "No" to linking 3 or more. If we do find such a need then the entity type of Group (see requirement Data-Group01) should be used.

Looking ahead to the Data Model, the simplest solution would therefore appear to be a pointer on person X's record, pointing to person Y, with a phrase that links the two in a defined order e.g. "master of" should be read as "X is master of Y". The documentation must note that any inverse relationship must be EXPLICITLY created.
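
A minimal Python sketch of such a one-way, labelled pointer follows. The `Person` class and its `relate` and `describe` methods are assumptions made for the example, not a proposed data model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(eq=False)
class Person:
    name: str
    # Each pointer is (phrase, target), read as "<this person> is <phrase> <target>".
    relations: List[Tuple[str, "Person"]] = field(default_factory=list)

    def relate(self, phrase: str, other: "Person") -> None:
        """Record a one-way relationship; nothing is added to the other person."""
        self.relations.append((phrase, other))

    def describe(self) -> List[str]:
        return [f"{self.name} is {phrase} {other.name}" for phrase, other in self.relations]


x, y, z = Person("X"), Person("Y"), Person("Z")
x.relate("master of", y)
y.relate("apprentice of", x)  # the inverse is created explicitly, under a different name
z.relate("admirer of", y)     # no inverse is assumed

print(x.describe())  # ['X is master of Y']
print(y.describe())  # ['Y is apprentice of X']
print(z.describe())  # ['Z is admirer of Y']
```

Y's record knows nothing about Z, which matches the conclusion that a relation from X to Y does not always imply one from Y to X.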
ttwetmore 2011-02-27T08:39:33-08:00
Well put, Adrian. I should have put the post I just wrote on the event/evidence thread over here, as the major topic of my post was how to implement general relationships in a non-FAM world. For those following this thread I'd recommend you take a quick look at what I said over there. Adrian, in choosing the idea of a simple one-way, labelled pointer from one person to another, to represent a relationship, has picked one of the three ways I outlined, and he chose my favorite one. So, woo-hoo, Adrian.

Tom W.
AdrianB38 2011-02-27T08:30:58-08:00
Data-Ind02 Biological rel'ns indep of family
Original Req: "BetterGEDCOM must support the recording of biological relationships independent of any family grouping. Biological relationships must include surrogacy, etc."

Why: "Biological relationships can exist where there is no family in any meaningful sense. Existing GEDCOM files create a family for biological relationships. This is not always appropriate."
ttwetmore 2011-02-27T08:43:05-08:00
Adrian,

Very true.

Leading to the big question -- Is the truth of this statement sufficient justification to throw out the concept of the family as a Better GEDCOM entity type? A big question that has been looming like a dark cloud on the horizon since the creation of the Better GEDCOM effort (and for a long time before that as well).

Tom W.
AdrianB38 2011-02-27T08:46:50-08:00
Current GEDCOM (or 5.5 at least) provides only the Family as a means of recording biological parenthood. If we consider a family as a social grouping then there are 2 issues:
- Recording the biological parents of someone automatically creates a Family that simply may not exist as a social grouping.
- GEDCOM 5.5 seems to explicitly recognise only a biological family and an adopted one. (See CHIL - "The natural, adopted or sealed (LDS) child of a father and a mother." ) Thus, if the family historian wishes to record a _family_ created in part by an informal adoption or a fostering, the only mechanism currently available creates a biological family.

Requirement Data-Fam01 is designed to deal with the latter issue. Data-Ind02 is designed to deal with the former issue.
AdrianB38 2011-02-27T09:38:11-08:00
Suppose we have to record the biological relationships that result from some sort of surrogacy and that we wish to do it while satisfying this requirement.

Possible methods of doing this might include:
- retaining the current family structure in the data model but somehow hidden from normal view, so that it does not appear as a social family
- using the proposed, more general entity type of GROUP to record the people involved and their roles in the birth
- using the birth event in the proposed MULTI-PERSON event mode, recording the roles of the people in the birth
- using pointers from each individual to the others.

Method 1 - "retaining the current family structure ... hidden from normal view" feels like a kludge in that it does not satisfy any part of the requirement in the Model but relies on the application. Further, the existing Family is based on 2 parents, theoretically 1 male, 1 female. This is inadequate to cover cases where 2 women are involved (say) - e.g. egg-donor and birth-mother who carries the child. To record these roles, we would have to alter the Family structure, defeating the point of attempting to reduce our workload. Personally, I would therefore reject this.

Method 2 "using the ... entity type of GROUP to record the people involved and their roles in the birth" This gives full flexibility to record the roles. However, any conventional family needs to have their biology recorded. We could assume that the conventional family gives the biology unless a biological group were created. This seems untidy and results in 2 ways of recording biology, which is not nice. Or we could create a biological group for all biological families, including the conventional ones, which also have their social families.

Method 3 - a MULTI-PERSON birth event, recording the roles of the people in the birth. This would seem to give rise to no new records - the child in a surrogacy case should have a birth event (though currently only a single person event, contained within their own record). There may be a concern that when reporting on the social grouping of the Family, then it may take a longer time to access the person's birth event. However, the explicit answer should be there, easing the answer to the question "Are you my Mummy?" (Reference to Steven Moffat's Doctor Who story "The Empty Child" quite deliberate!)

Method 4 "using pointers from each individual to the others" also provides a swift answer to the question "Are you my Mummy?". However, I am less keen on this as it ignores the new concept of the role-in-a-multi-person-event in favour of a possibly uncontrolled phrase (remember, my view is that the pointer must not assume reverse relationships, resulting in double the number of pointers). It becomes more difficult to establish all parties to a surrogate (e.g.) birth as the data is not all on one event, but requires navigation. Finally, the link between event and pointer does not exist, possibly leading to difficult navigation from a birth-mother? It does have the advantage of not depending on the existence of multi-person-events.

My personal preference, therefore, is that IF BG has multi-person-events, then we should go for method 3, the MULTI-PERSON birth event, recording the roles of the people in the birth. And leave the existing Family record to contain solely social groups.
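
As a sketch of what method 3 might look like, the Python below records a surrogacy birth as one multi-person event. The `BirthEvent` class, the role labels and the participant names are all assumptions for illustration; no role vocabulary has been agreed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BirthEvent:
    """Hypothetical multi-person birth event: every participant is listed with a role."""
    date: str
    roles: Dict[str, List[str]] = field(default_factory=dict)  # role -> person names

    def add(self, role: str, person: str) -> None:
        self.roles.setdefault(role, []).append(person)

    def who(self, role: str) -> List[str]:
        return self.roles.get(role, [])

    def roles_of(self, person: str) -> List[str]:
        return [role for role, people in self.roles.items() if person in people]


birth = BirthEvent("1 JAN 2000")
birth.add("child", "Child")
birth.add("egg donor", "Woman A")
birth.add("birth mother", "Woman B")
birth.add("biological father", "Man C")

print(birth.who("birth mother"))  # ['Woman B']
print(birth.roles_of("Woman A"))  # ['egg donor']
```

All parties to the birth sit on the one event, so no navigation between person records is needed to list them.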
AdrianB38 2011-02-27T09:42:26-08:00
NB - I have not considered relative workloads on current application providers of the 4 methods above. Partly because I can't yet get my head around what the minimalist implementation of a multi-person event might be.
louiskessler 2011-02-27T11:56:53-08:00

Regarding this topic, make sure you read Tamura Jones' article: "Researching Biological Genealogy" http://www.tamurajones.net/ResearchingBiologicalGenealogy.xhtml
AdrianB38 2011-02-27T13:26:42-08:00
Interesting - Tamura might then debate my use of "biological".

Trying to quickly think through some things that spring to mind after reading the link...

If one did disprove that person X was the biological father of Y, then X _could_ be knocked out of the birth event but retained in the family-group (assuming we keep it), as that is intended to be the family-as-a-social-group.
AdrianB38 2011-02-27T08:48:55-08:00
Data-Fam01 Family as a Social Grouping
Req't: BetterGEDCOM must support the recording of genealogy / family history data about the family as a (possibly informal) social grouping, independent of any biological relationship or legal adoptions

Why: Family units exist where there is no underlying biological relationship and no legal adoptions.
Biological relationships exist where there is no family in any meaningful sense.
Existing GEDCOM files may contain data (possibly user-defined tags) recorded about the social grouping of the family, which must be carried forward on conversion to BetterGEDCOM format.
AdrianB38 2011-02-27T08:55:15-08:00
Current GEDCOM (or 5.5 at least) provides the Family _only_ as a means of recording biological or adoptive parenthood. (See CHIL - "The natural, adopted or sealed (LDS) child of a father and a mother." )

If we consider a family as a social grouping then there are 2 issues:
- Recording the biological parents of someone automatically creates a Family that simply may not exist as a social grouping.
- GEDCOM 5.5 seems to explicitly recognise only a biological family and an adopted one. Thus, if the family historian wishes to record a _family_ created in part by an informal adoption or a fostering, the only mechanism currently available creates a biological family.

Requirement Data-Fam01 is designed to deal with the latter issue. Data-Ind02 is designed to deal with the former issue.
AdrianB38 2011-02-27T14:57:07-08:00
Suppose we separate out the social constructs of the Family from the biological, as on this requirement, how might we do it?

Possible ways include:
- dispense with Family as an entity type because it can all be done with events and attributes (in its current GEDCOM sense) - and perhaps relationship pointers inside one individual pointing to another;
- retain the existing GEDCOM family structure in the data model;
- dispense with Family as an entity type but use the new Group entity type instead;
- make Family a sub-type of the new Group entity type instead;

Method 1 - "dispense with Family as an entity type because it can all be done elsewhere". Firstly there is a huge psychological problem with this. Secondly, I know I have lots of events and attributes that apply to the family as a whole. Where would their data go? They could be split over the family members but a lot of perspective will get lost if that happens. Take a family resident at a location while various members go off and live elsewhere for a while. The family, as a single social unit, really does stay in one spot - but since no individual does, where would we get to know this? Also, the multiple methods needed to detect family members could get quite complex - birth events, adoption events, fostering via ?, informal adoption through relationship pointers ... I therefore discard that option.

Method 2 - the status quo. Except it isn't quite the status quo because I assume the biological data is held elsewhere, i.e. Data-Ind02 is successfully implemented. This has many advantages, not least in workload, but IF Group is implemented as an entity type, then there isn't much difference between a generic group and a family: the roles within the family are different from the roles within a regiment, but otherwise, in data modelling terms, they aren't far apart. So why make software developers code up Group as something separate?

Method 3 - dispense with Family as an entity type but use the new Group entity type instead. This has the psychological problem again.

Method 4 - make Family a sub-type of the new Group entity type instead. This would help the developers - they'd just make Group as a more general version of Family, and as Family inherits stuff from Group, there wouldn't (unlike method 2) be any duplication. Anything that the users could do with a group would be do-able with a family - unless the software developers saw good reason to not allow a generic Group thing for the particular.

Thus, Method 4 seems sensible BUT this depends on the Group entity type being there and the "biology" being elsewhere.
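
Method 4 can be sketched as ordinary sub-typing. In the Python below, `Group`, `Family`, the role names and the sample members are assumptions made for the example; it only shows that a Family can inherit everything from Group and differ in the roles it accepts.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Group:
    """Generic group: named, with members in roles and events of its own."""
    name: str
    members: List[Tuple[str, str]] = field(default_factory=list)  # (person, role)
    events: List[str] = field(default_factory=list)

    def add_member(self, person: str, role: str) -> None:
        self.members.append((person, role))

    def with_role(self, role: str) -> List[str]:
        return [person for person, r in self.members if r == role]


@dataclass
class Family(Group):
    """A Family is a Group that only accepts family roles (an illustrative role list)."""
    ROLES = ("parent", "child", "adult")  # unannotated, so not a dataclass field

    def add_member(self, person: str, role: str) -> None:
        if role not in self.ROLES:
            raise ValueError(f"not a family role: {role}")
        super().add_member(person, role)


regiment = Group("A regiment")
regiment.add_member("Tom", "private")

household = Family("Smith household")
household.add_member("Arthur", "parent")
household.add_member("Tom", "child")
household.events.append("Residence of the family as a whole")

print(household.with_role("child"))  # ['Tom']
```

Events that apply to the family as a whole attach to the group record itself, so nothing has to be split over the individual members.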
AdrianB38 2011-02-27T15:04:04-08:00
If "biology" is elsewhere, this frees the family as a social group to contain all sorts of combinations. An unanswered question is what roles the individuals would have.

Parent and child seem reasonable for many uses - they could be used in an INFORMAL sense for birth children, adoptive children, foster children, informally adopted children, step-children who live in the family (as distinct from those who don't).

There are families where an Aunty lives with the family - she could be entered as a "parent" - though, perhaps it might be better to put in the roles as adult and child in that case.

Ideally, I don't want the roles to duplicate anything held elsewhere - I just want them to describe how the person relates to the social group.
GeneJ 2011-03-12T08:49:26-08:00
I want this functionality. It does, potentially, commingle concepts of the family that might conflict with some software genealogical numbering systems and common genealogical reporting (descendant/ancestral).

Where do we work together to sort out the conflicts by identifying related logical requirements? (So that vendors can implement Data-Fam01 without breaking the use of genealogical data.)

On the surface:

(a) Declaring the relationships so the submitter's intent is understood by the receiver. Ala, is it FYI (intended as part of life story reporting) or also family unit/lineage reporting?

(b) When a family unit/lineage intent is declared, the concept of primary seems also needed. How else will software know which relationship drives the full reporting and which relationship takes the form of a cross reference? In the example below, we don't want the many descendant generations from Tom and Cynthia to be reported over and over and over again.*

Example:

(1) What if ... Tom Smith is the biological child of Arthur Smith and Julia Jones.

(2) And separately, what if also ... His mother Julia dies, and Arthur Smith marries her sister, Sister (Jones) Lake. Descendants of Tom Smith should have the option of reporting that Tom Smith (and all his siblings) are now commonly considered the step-children of Sister (Jones) (Lake) Frost.

(3) And separately, what if also ... Arthur and Sister (Jones) Smith die in a tragic fire. Tom and his three siblings survive. (3a) Tom's siblings become the wards of Arthur's parents, Clifford Smith, Sr. and Rose Wood. Users should be able to record that these siblings had a ward-guardian relationship (I'll suggest there might be different terms users might apply to declare the relationship). (3b) Tom goes to live with his stepmother's first husband's brother, George Lake; George has several children of his own. Again, users should be able to describe this relationship between Tom and George, and George's children.

(4) And separately, what if also ... George dies young, and Tom moves in with the neighbors, Lester Moore and Susan Wilson. Lester and Susan Wilson never married, but they have two children.

(5) And separately, what if also ... Tom marries Lester and Susan's daughter, Cynthia Wilson-Moore. Tom and Cynthia have 4 children.


*Some software can have problems with eliminating this duplication even when the relationships are simple (ala, multiple lines of descent).
AdrianB38 2011-03-12T14:00:25-08:00
"genealogical numbering systems" - that's a good point. But genealogy then has to decide - what grouping does the numbering system work off? The social family or the biological unit defined in Data-Ind02?

(This probably means Data-Fam01 and Data-Ind02 are mutually dependent.)

"common genealogical reporting (descendant / ancestral)" - I think a similar response applies here, except that it becomes perfectly possible to concoct reports for each of biological or social mode.

In this last case, we need the concept anyway, even with GEDCOM as it stands, since it's by no means certain which line an adopted child wants to trace. Some will want to trace only the biological, some only their adoptive line (because that's their "real" parents' ancestry), and some both.
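
A small Python sketch of reporting in biological mode, social mode, or both follows. It borrows names from the example earlier in this topic; the two link tables, the `trace` function and the decision to treat the heads of the households Tom lived in as "social parents" are assumptions made purely for illustration.

```python
from typing import Dict, List, Set

# Hypothetical parent links, kept separately for each kind of relationship.
BIOLOGICAL: Dict[str, List[str]] = {
    "Tom": ["Arthur", "Julia"],
    "Arthur": ["Clifford", "Rose"],
}
SOCIAL: Dict[str, List[str]] = {
    "Tom": ["George", "Lester", "Susan"],
}


def ancestors(person: str, links: Dict[str, List[str]]) -> Set[str]:
    """Collect everyone reachable by following parent links upwards."""
    found: Set[str] = set()
    pending = [person]
    while pending:
        for parent in links.get(pending.pop(), []):
            if parent not in found:
                found.add(parent)
                pending.append(parent)
    return found


def trace(person: str, mode: str) -> Set[str]:
    """Trace a line in 'biological', 'social' or 'both' mode."""
    if mode == "biological":
        return ancestors(person, BIOLOGICAL)
    if mode == "social":
        return ancestors(person, SOCIAL)
    if mode == "both":
        merged = {p: BIOLOGICAL.get(p, []) + SOCIAL.get(p, [])
                  for p in set(BIOLOGICAL) | set(SOCIAL)}
        return ancestors(person, merged)
    raise ValueError(f"unknown mode: {mode}")


print(sorted(trace("Tom", "biological")))  # ['Arthur', 'Clifford', 'Julia', 'Rose']
print(sorted(trace("Tom", "social")))      # ['George', 'Lester', 'Susan']
```

Because each person is collected only once, multiple lines of descent do not repeat anyone in the result.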

Re the multiple lines of descent (or ascent?) issue. I'm sympathetic to the idea of marking up a "most important", but I wonder if this isn't trying to fix (potential) application program issues in the data without knowing the exact issue for the exact program. In other words - I'm not wholly sure I can make a best guess solution.
GeneJ 2011-03-12T16:38:48-08:00
:)

"But genealogy then has to decide"

As BetterGEDCOM, hope we will review the considerations, weigh the issues, etc. From what I know on the topics, standards seem ahead of software on these topics.

"In this last case, we need the concept anyway, even with GEDCOM as it stands, since it's by no means certain which line an adopted child wants to trace. Some will want to trace only the biological, some only their adoptive line (because that's their "real" parents' ancestry), and some both."

I could not agree more.

Re: Multiple lines of descent.
It does tie back to the numbering system. It's that concept that a person was "one" person. As with the example I posted above, dear Old Tom is related to a whole bunch of folks, but he's still just one person who had one, albeit large, "family." In a four generation descendancy, you don't want to number him and report his biography 8 times, but within the context of each relationship, you want him reported. Although perhaps US centric, the current standard way of doing this with biological relationships is covered in the _Quarterly_ style guide. See Curran, Joan F., Madilyn C. Crane, and John H. Wray, edited by Elizabeth Shown Mills, _Numbering Your Genealogy: Basic Systems, Complex Families, and International Kin_, rev. ed. Washington: NGS, 2008.
GeneJ 2011-03-12T16:40:51-08:00
Err... I meant to write, "the current standard way of considering adoption, etc. in what is otherwise considered to be biological reporting is covered in the ..."
gthorud 2011-03-12T17:39:49-08:00
I think Adrian's Method 4 is the one to choose.

Since the examples that Gene mention are real world cases, the reporting issues must be handled by programs - it is not impossible and should not affect the data.

Adrian mentions the aunt that lives with the family. Could the term "household" be a "super family" - e.g. as seen in censuses (at least here)? There could also be other, non-related persons living with the family.
GeneJ 2011-03-12T21:17:22-08:00
Hi Geir:

"Since the examples that Gene mention are real world cases, the reporting issues must be handled by programs - it is not impossible and should not affect the data."

I didn't follow.

What does not affect the data?

Separately, there was a lengthy discussion on the TMG-L recently involving concepts and issues of some modern family structures.

See: http://archiver.rootsweb.ancestry.com/th/read/tmg/2011-01/1294503048
gthorud 2011-03-07T11:40:15-08:00
Administration01 - Research Administration Information
Overall requirement: BetterGEDCOM must allow recording of administrative information needed to organise the research work.

Please add your ideas, requirements, pointers to examples from programs etc here. Just a short overall descriptions so we can get a better understanding of the area. Detailed requirements will be added to the Requirements Catalog after that.

The Gentech model covers this type of information.
gthorud 2011-03-08T09:59:45-08:00
I have had a quick look at the Gentech model which contains some entities related to research administration. If you look at the diagram accompanying the main document, you should get a rough understanding of the entities.

See info about the model here http://bettergedcom.wikispaces.com/GenTech+Data+Model

I will try to give a very rough summary of the main entities, but it is very likely that I have not understood everything correctly since I cannot spend a lot of time on it. The whole thing refers to a model of the research process that is also documented in Gentech.

The entities are:

- Objective: A Research Objective can be for example “Find the father of John Smith”. It can have a name, description, priority, sequence and status.

- Project: Research Objectives can be grouped and linked to a Project, for example “Find the ancestors of Peter Smith”. It can have a name, description and client data.

- Activities: Each Research Objective can have several Research Activities, each of which can be a Search or an Administrative Task. An Activity can relate to several Objectives. An Activity record can have scheduling info, status, type, description, priority, comment and a link to a Researcher.

- Source: A Search can be linked to a Source (and a source can have several Searches linked to it). (A search can link to a Repository Source and a Repository and have a status “finished”.)

- Repository: A Repository can store several Repository Sources, e.g. a Book. (There is a hierarchical model for Sources which I will not describe here.)

- Source Group: A Source Group could be used to Administrate (manage) Sources, e.g. sources related to a particular topic e.g. “Sources about Boston”.

- Researcher: A Researcher can be linked to several Projects and several Activities. Name and “contact info” is stored for the Researcher.

- (A project can have several Surety-Schemes used for Assertions.)


In summary, important things to note are the structuring of a To-do-List into Activities (Searches), Objectives and Projects which may be linked to Researchers. And there is administration of sources.
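
The structuring summarised above can be sketched with a few Python classes. This is a rough reading of the summary in this post, not the Gentech model itself: the class and field names are simplified assumptions, and entities such as Repository and Source Group are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Researcher:
    name: str
    contact: str = ""


@dataclass
class Activity:
    description: str
    kind: str = "search"          # "search" or "administrative task"
    status: str = "open"
    researcher: Optional[Researcher] = None
    source: Optional[str] = None  # a Search can be linked to a Source


@dataclass
class Objective:
    name: str
    priority: int = 0
    status: str = "open"
    activities: List[Activity] = field(default_factory=list)


@dataclass
class Project:
    name: str
    objectives: List[Objective] = field(default_factory=list)


def todo_list(project: Project) -> List[Tuple[str, str]]:
    """Flatten a project into (objective, activity) pairs that are not yet finished."""
    return [(o.name, a.description)
            for o in project.objectives
            for a in o.activities
            if a.status != "finished"]


me = Researcher("A. Researcher")
search = Activity("Search the parish register", researcher=me, source="Parish register")
letter = Activity("Write to the record office", kind="administrative task", status="finished")
objective = Objective("Find the father of John Smith", priority=1, activities=[search, letter])
project = Project("Find the ancestors of Peter Smith", [objective])

print(todo_list(project))
```

The same `Activity` object can be placed in the list of more than one `Objective`, which is one way to honour "an activity can relate to several Objectives".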

I note that there are bits and pieces of information in programs that are not mentioned in this model.

It seems to me that the full model tries to cover the needs of a professional researcher. Do we need to cater for the needs of professional researchers, and is this useful to them?

One question is whether the terminology of this model can be used as a common reference terminology. But I would not be surprised if there are other terminology standards in this area.

The model could be implemented in several ways, so I will not go into that.

Assuming that the main use of Administrative info in BG is to exchange info between one user’s programs, a thing to consider is how information in this model can be converted into programs implementing simpler models, and what the consequences are if some of the info is discarded. If a conversion is possible, it should be possible to use programs with little or much functionality/data for administration.
gthorud 2011-03-09T08:12:20-08:00
I have looked at RootsMagic.

There can be a Todo-List of Tasks for every Person or Couple (family), and General tasks not attached to any of these. All Tasks can be listed in ToDo list.

Each Task has a name/description, a Personal file number, priority, status, dates, link to an address or repository and a large field for Description and Results which can be formatted.

Could not find any additional filtering, sorting or any links to events, and I don’t see any way to list only the Tasks for a Repository.

Again, I am not sure I have found all the functionality.

Geir
gthorud 2011-03-09T10:26:28-08:00
I have looked at Legacy (Deluxe). I have used a translated help file, which could lead to errors.

There is a To-Do list with To-Dos (Tasks, I’ll use that term) for Persons and General tasks.

Each task can have a name/short description, category (select from configurable list), locality (where to perform the task, select from configurable list), dates, status, type (research/correspondence/other), priority, citations (which is in itself a general structure that can e.g. be linked to a repository), file ID (for filing cabinet), description (full, formatted text) and result (formatted). Each Task can be linked to a repository/address with contact info.

The list can be filtered on many of the task fields and can be sorted on multiple fields.

I could not find a way to list tasks for a repository, but then there is “locality”.

Again, I am not sure I have found all the functionality.

At least one more prog to go ….
gthorud 2011-03-09T15:35:31-08:00
I have looked at Genbox. It seems to implement most of the Gentech model, and more, related to administration.

Searches (Gentech term) is the term used for Tasks (I will use Tasks). Tasks are scheduled after first defining a Target (Gentech term = Objective). Targets can be linked to (sub) Projects. Projects can be split into a hierarchy of sub projects.

There is a correspondence log (phones, letters, email etc) where each item can be linked to (sub) projects. Researchers (e.g. you or cooperating researchers) can be registered with contact information and are identified in e.g. assertions.

Targets can be defined for a lot of information types, e.g. persons, events, families, person names, parents, sources and places. There can also be general targets.

Content of the various records:

- Tasks (Searches) can have description, dates, priority, location, findings and can be linked to a source and repository.

- Targets have a description and are linked to searches and the information types mentioned above.

- Projects and sub projects can have name, dates, status, priority, number of hours used, completion grade (%) and description. They may link to higher level projects, targets and correspondence log items.

- Items in the correspondence log can have type (call, email etc), in/out, researcher, correspondent, subject, date, ref to filing system and details about the correspondence. There is also contact info (addr, phone etc).

- Researchers can have name, languages, registration number (?), notes, media (photo) and contact info. A researcher can be linked to a person in the database.

Lists and reports can be created for all main records, including Projects, Researchers, Targets and Tasks, and they can all be filtered and sorted on various criteria.
GeneJ 2011-03-09T17:39:12-08:00
This date I reviewed the FamilySearch Wiki for "Keeping a Research Log."
https://wiki.familysearch.org/en/Keeping_a_Research_Log

Note: This is a log that keeps track both of what you plan to search AND what you have searched.


(1) Objectives.
*Keep track of and be able to share what you have searched (helps avoid duplicate effort)
*Log provides a record of what you have done if you have to return to a source for further work.
*Provides a more complete record of your work.

(2) What is recorded in the log BEFORE you search:
*Name of Ancestor
*Research Objective
*Source Title, call number, microfilm number, book number, etc.
*Place where the source is available

(3) What is recorded after [or during] the search:
*Date(s) you searched
*Notes about what you found/learned
*Notes about what you didn't find
*Whether you made a copy
GeneJ 2011-03-09T17:55:24-08:00
Err... there is a related Wiki at FamilySearch, so I'll summarize it, too.

https://wiki.familysearch.org/en/Research_Logs

Part I
Value:
Cite your sources; sort out what has and has not been found; organize and correlate copies of documents; weigh evidence/better conclusions; show strategies and record questions; reduce duplication of effort

Contents (says, "following elements work well for most researchers")
*Ancestor's name/life span
*Researcher's name
*Date of search
*Place of Search
*Purpose (objective) of search (event and person)
*Source Description [they show the call number and Document number separately, I just lumped that in with source]
*Results - a summary of what was found

Part II
What to complete in anticipation of a search:
*Date
*Place of Research
*Purpose
*Source Description

What to complete after a search:
[This wiki has comments, but the comments seemed a little jumbled. Near the bottom of the wiki it says, "Write lots of notes to yourself explaining your strategies, analysis, conclusions, questions, suggestions, and discrepancies."]


GJNote: This form of research log, in software or electronic form, can be used to record snippets or full transcriptions from sources, together with your own comments and notes.
gthorud 2011-03-10T06:58:43-08:00
It is easy to see that, as Adrian has pointed out, there is a big span in the functionality implemented in various programs. It will be difficult to convert all information from the most advanced programs into the programs that provide minimum functionality. However, when a user changes from a complex program to a simpler or similar program, it would be better to get some of the information across, perhaps in a non-optimal structure, rather than the current situation where no administrative information can be transferred. Although one should try to prevent it, it would be acceptable to lose some types of information, and the user may even have to do some tidying up after import (?). Since there is usually only one user involved, preserving the exact meaning and structure of the information is less important than if the information was transferred between different users.

There are differences in the overall structure of the information in programs, from a simple task list to a task-objective/target-project model with many entities. Thus types of information may be attached to different entities in the structure, but in many cases meaning the same thing. Also there are some programs that have a one to one relation between entities, where others have a one to many or many to many.

One observation that might be helpful in a conversion process is that all? programs have one or more large text fields, and all? programs have Tasks. Information from highly specified fields or higher level structures could be converted into these text fields. The structural incompatibilities could be solved by “flattening” the structure (a term used in data conversion). For example consider a program that has the Objective “Find my grandfather’s birth date” and two tasks “Check census x” and “Check the parish records for y”. This could be converted into the text fields of two tasks:

Task 1:
Objective: “Find my grandfather’s birth date”
Task: “Check census x”
Task description: Find xxxx

Task 2:
Objective: “Find my grandfather’s birth date”
Task: “Check the parish records for y”
Task description: Find yyyy


The example can be extended. Assume that the Objective is linked to the record of my grandfather Ole Olsen, and to repository X.

Task 1:
Person: Ole Olsen (ref # 1234)
Objective: “Find my grandfather’s birth date”
Task: “Check census x”
Repository: X

Important info could be placed at the top and less important info at the bottom of the text field. The exact positioning must be determined by the implementer of the program (but could in THEORY be user configurable). The implementer will also have to balance the complexity of the note against discarding information.

The “flattening” may also handle eg. three levels into one, or project and activities into activities only.
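A minimal Python sketch of the flattening described above, assuming the importing program receives the objective, its tasks and any linked person or repository as plain values. The function name, the dictionary keys and the field order are my own assumptions; a real implementer would choose the ordering and decide what to discard.

```python
def flatten_objective(objective, tasks, person=None, repository=None):
    """Return one text block per task, copying the objective's data into each.

    More important lines are placed first, less important ones last.
    """
    notes = []
    for task in tasks:
        lines = []
        if person:
            lines.append(f"Person: {person}")
        lines.append(f"Objective: {objective}")
        lines.append(f"Task: {task['name']}")
        if task.get("description"):
            lines.append(f"Task description: {task['description']}")
        if repository:
            lines.append(f"Repository: {repository}")
        notes.append("\n".join(lines))
    return notes


# The example from the text: one objective, two tasks, linked person and repository.
notes = flatten_objective(
    "Find my grandfather's birth date",
    [
        {"name": "Check census x", "description": "Find xxxx"},
        {"name": "Check the parish records for y", "description": "Find yyyy"},
    ],
    person="Ole Olsen (ref # 1234)",
    repository="X",
)
```

Each entry in `notes` can then be stored in the large text field of one task in the simpler program. Note that the structure is lost in this direction: the objective text is duplicated into every task and is no longer a record of its own.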

Such a conversion must be done by the importing program since it is the one that knows the data file structure and its internal structure. The complex structure can therefore be represented in the BG file.


I will try to look into grouping of the program functionalities into sub-requirements, but it will not be easy, and may take some time.
GeneJ 2011-03-10T09:44:57-08:00
How can I help?
GeneJ 2011-03-10T10:04:58-08:00
Putting a pitch in for Research Administrative support at the source level.

Task Name: FHL Film 634021, "Anywheresville Birth Records, 1650-1910."
Task Description: Access filmed records
Description: Subject film cited as source of the source for Ancestry's database "Index to Anywheresville Births."
Notes/Comments: XXXXXXXXXXX
gthorud 2011-03-13T12:43:00-07:00
I have gone through the functionality in the programs etc. mentioned above and tried to group it into Requirements. The important thing in this step is to capture all the possibilities, so one question is – Have I forgotten anything important? Although I have tried not to prioritize, there are some differences in the wording indicating a priority in some cases. I have followed the Gentech model. The plan is to copy these requirements to the req. cat.

Comments are welcome.

Research Task
BetterGEDCOM shall be able to record a Task (search or other task) that needs to be done or has been done. Information recorded about the task itself could be a Title/Short description and a full description (formattable). Research tasks can be organized in simple lists or grouped into Objectives, see below.

Task information
BetterGEDCOM shall be able to record information about a Task, for example used for Categorisation (keyword, category, type (research/correspondence/other)), Progress management (priority, status, dates, comments about dates), Resource use (Expenses, number of hours used).

Identification of persons, events, places that the task is about
BetterGEDCOM shall be able to link a task to records representing the person(s), event(s), place(s), source(s) etc. that the task is about, existing when the task is defined (started). A possibility is also to record links to persons, events etc. that are created as a result of the task.

What to search
BetterGEDCOM shall be able to record information about, or link to records representing, WHAT to search – e.g. a source. Possibly a URL pointing to the source.

Where to do the task
BetterGEDCOM shall be able to record information about, or link to records representing, WHERE to do the task – Location name (if not linked to), Repository, Place (eg. cemetery), Address

Task results
BetterGEDCOM shall be able to record information about, or link to records representing, the findings and results produced by the task (an overall description of the results, Excerpts, Multimedia, Citations, Filing Cabinet Reference)

Objectives for grouping of tasks
BetterGEDCOM should be able to group several tasks into Objectives (Targets), each Objective representing a question to be answered or a problem to be solved. An objective is usually defined before the tasks needed to achieve the objective. Objectives should have a description and will be the record pointing to users, events, places etc. rather than each task. Some elements of the information recorded for tasks (see above) can be defined for the objective rather than each task.


Projects for grouping of objectives
BetterGEDCOM could be able to group several objectives into projects. Projects could be split into sub-projects. Each (sub-)project should have a name, elements of task progress listed above, completion grade (%) and description.

Correspondence log
BetterGEDCOM could be able to record information about letters, emails, phone calls or other correspondence related to the research. Items in the log can have a type (call, email etc), direction (in/out), researcher, correspondent, subject, date, reference to filing system and details about the correspondence. Contact information (address, phone etc) could also be recorded.

Researchers
BetterGEDCOM could be able to record information about the researchers using the program or other cooperating/corresponding researchers. Researchers can have a name, languages, registration number (?), notes, media (photo) and contact info. A researcher can be linked to a person in the database. The Gentech model also links researchers to assertions, i.e. who made the assertion.
GeneJ 2011-03-13T12:58:28-07:00
It looks great. Thank you for this. --GJ
AdrianB38 2011-04-11T08:52:49-07:00
Please note my new page on
http://bettergedcom.wikispaces.com/Research+Process%2C+Evidence+%26+GPS

This endeavours to concoct a research process and, in the process, defines the data that could be stored in a BG compatible database.
theKiwi 2011-03-07T12:24:45-08:00
The screen shot I mentioned during this afternoon's meeting can be found here

http://lisaandroger.com/MiscImages/ReunionLogs.png

It is from Reunion for Macintosh, in which "Logs" have been a feature for many versions up to the current Reunion 9.0c.

The Logs are stored in the Reunion database, so each datafile can contain its own set of Logs.

I can create as many different Logs as I want to using the "Add Log" button

The other buttons provide access to other features as named, including for example to export a Log to a word processor file, or a text file, and perform a search to find text in any Log.

The text of the Logs can be formatted using the buttons at top right - Plain Text, Bold, Italic, Underlined, and coloured.

Nothing stored here in this part of the Reunion database can be exported to a GEDCOM file.

A very minimum requirement of BetterGEDCOM would be to support these simple text data chunks, presumably as type NOTE.

An extra would be to allow the passage of text formatting as has been mentioned in other places in the Wiki, using for example HTML tags to style the text, although this is a function of the exporting application in reality.

Further enhancements might be to allow for a set of structured fields for these Logs, so that you might record in separate fields more pertinent information such as

URL to use for doing the listed research
Name and Address of Research facility
etc

so that perhaps a structure similar to Sources might be developed/supported?

If the genealogy application supported it, this would let one get a list of all logs that have work to be done at "Library of Michigan", for example, if more than one Log had been created for that. As my screen shot shows, though, in my (admittedly not very rigorous) use of this I've created a different log for each site and listed in it all the items I was thinking of at that time.
louiskessler 2011-03-07T13:44:13-08:00
I believe that ToDo lists should be Repository-based. Every item should be assigned to one or more repositories. That's how I'll eventually implement them in Behold.

Every trip, you'll know which repositories you're going to, the people you'll be visiting, the cemeteries you'll be exploring, etc. So you will want a list of what to do at each of these places organized by place.
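To make the idea concrete, here is a minimal Python sketch of a repository-based to-do list. It is only an illustration, not how Behold or any other program implements it; the function name, the `(task, repositories)` tuple layout, the task and repository names, and the "(unassigned)" bucket for tasks with no repository are all assumptions.

```python
from collections import defaultdict


def tasks_by_repository(tasks):
    """Map each repository name to the tasks assigned to it.

    A task assigned to several repositories appears under each of them;
    a task with no repository is filed under "(unassigned)".
    """
    grouped = defaultdict(list)
    for name, repositories in tasks:
        for repo in repositories or ["(unassigned)"]:
            grouped[repo].append(name)
    return dict(grouped)


# Hypothetical tasks and repositories.
tasks = [
    ("Check census x", ["Repository A"]),
    ("Check the parish records for y", ["Repository A", "Repository B"]),
    ("Sort out surname variants", []),
]
by_repo = tasks_by_repository(tasks)
```

Before a trip you would then print only `by_repo["Repository A"]` rather than the whole list.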
theKiwi 2011-03-07T13:57:37-08:00
  • Every trip, you'll know which repositories you're going to, the people you'll be visiting, the cemeteries you'll be exploring, etc. So you will want a list of what to do at each of these places organized by place.

I don't think this is necessarily true - a To Do list item could be as simple as

"Find William Moffat's parents", the solution to which might be found at the first repository you think to check (if you're lucky), but which might not be found until the 5th place you think to look, and a lot of the time it can be solved without going anywhere outside of "the internet".

Or as Adrian notes, these lists can be used for other things, like a Log of all of the people with the same surname in the same village that you'd like to further research to find out if they're related, or at the least a note to yourself to remind yourself to investigate this at some stage.
louiskessler 2011-03-07T15:10:40-08:00
Roger,

So I've got 200 things to do. I've got them all on my written ToDo list that is 8 pages long. I'll go to the store and take the 8 pages. I'll search through the 200 things for the 20 things I need to buy at the store. I'll do that every aisle I go down. I'll read those 200 things 40 times. When I'm done, I'm still not sure I got everything.

Now I'll go to the Post Office. Find the two things I need to do on the list. But I forget to go to the drug store that is right next door, because I didn't think of it because I was thinking of the other 170 items I needed to do.

The bottom line is, you ABSOLUTELY POSITIVELY NEED to organize your tasks by where you need to do them.

Why not just add your 300 genealogy things to do to the other 8 pages of 200 physical things to do? Then you'll have 500 things to look through every time you do any errands.

I shouldn't give away my trade secrets. I can't believe nobody's ever seriously thought about this before. :-)

Louis
gthorud 2011-03-07T15:55:37-08:00
Although I don't mind discussions, it might save you some work if you wait with the discussions on specific functionality until we have created some more detailed requirements. I try to see the larger picture of possibilities before separating them into more detailed requirements (and possibilities). Keep posting suggested functionality - VERY detailed aspects may not be needed at this stage - but should be presented later.
GeneJ 2011-03-07T21:37:43-08:00
Hiya -

I uploaded screen shots of the _The Master Genealogist_ Research Log input screens.

http://bettergedcom.wikispaces.com/TMG+Research+Log+Images
GeneJ 2011-03-07T21:51:52-08:00
Also uploaded screen shots of plan/tasks input screens for Family Tree Maker for Mac.

http://bettergedcom.wikispaces.com/FTM+%28Mac%29+Task+Input+images

Hope this helps. --GJ
AdrianB38 2011-03-08T02:55:06-08:00
Let me for once make a suggestion that we need to think what we _don't_ require. It's quite clear that people prefer to organise their to-do lists on a very personal basis. The question is - how much does _another_ researcher need to know about my to-do lists?

They don't, I suggest, need to know about my planned visit to Bristol that will include visits to X, Y, Z, and the tasks at each of those places. Hence, why should it be in a BG file?

What might very well be useful to them (and therefore in the BG file) is the precise objective of those tasks - e.g. investigate if X is my 4G grandfather of the same name.

For this reason, I'm kind of underwhelmed by the prospect of to-do lists in any sophisticated form. We can come to an agreement (I hope) about the data model for genealogy - I'm far from clear we can agree more than some very basics for to-do lists (and a lot more besides).

Our criteria should be - I suggest - what might _other_ people find of use?
theKiwi 2011-03-08T04:09:23-08:00
@Adrian - the ability to move EVERYTHING is very important to individuals wanting to transport their data to themselves.

Over the years I have looked at other (non Reunion) Macintosh software as it comes along, but have immediately lost interest when I find that the error log of items not imported runs to (many) thousands of lines.

So at least sometimes, we are our own "other" people.
theKiwi 2011-03-08T04:13:18-08:00
I think I've figured out how to include an image here now...

ReunionLogs.png
gthorud 2011-03-08T06:46:20-08:00
I have looked at TMG. I have never used these features so what follows could be inaccurate.

There is a Research log which is an advanced ToDo-list. Each Task in the list has a Task name, various dates related to the progress status of the task, for each date there can be a remark, keywords, a value for expenses and a larger description field which can have various types of formatting.

Each task can be linked to zero or one of the following: Person, Event, Source and Repository. General tasks are not linked to any of these. Links to the log are available on each of the mentioned entities, so you can enter a task for eg. the current event, and see if there are any uncompleted tasks registered for the event. You can find the tasks that need lookup in a source or repository.

You can filter the log on a specific entity, so you can see the tasks related to a person, a repository, source, event, surname, keyword, task name or location (I am not sure how location works), and a list of tasks can be sorted on various criteria. Depending on how you organize the use of keywords, they could be used to group tasks, for example those related to a specific problem or area of work, a fellow researcher or whatever.

There are also reports for Tasks and the other related entities; I have not checked them out.

Hope I have not missed too much …

Geir
AdrianB38 2011-03-08T08:42:12-08:00
Kiwi - moving your own data between different apps is clearly important and a good answer to my question about how much does "someone else" need to know.

However, I fear (rightly or perhaps wrongly) that the likelihood of 2 apps sharing the same data model for non-trivial stuff outside the pure documentation of genealogy (i.e. scope of GEDCOM now) is slim.

Your Log doesn't look particularly sophisticated (sorry!) - a name and some formatted text, so that might very well export and import OK. But Geir's TMG Research task is an altogether different beast. It's perfectly possible to envisage TMG exporting all that into a BG file, either into tags that BG define or into custom tags. In fact, I'd say that exporting it into custom tags would be easy in XML or JSON. (Yes, I am making a wild statement but if I can do a generalised export from a 1970s technology mainframe, surely you PC guys... <grin>) But when it came to importing the TMG Research log into Reunion... Is Reunion going to implement TMG data structures? I think not. The non-genealogy stuff is how they differentiate their products, so I'm not optimistic we'll even find out what's behind the scenes.

Let's not get too pessimistic here - we can collect info about what people's software does, as Geir requests, and see if there's a simple basis that's common across a number of programs. But I think we need to remain a bit grounded in our expectations in this area.
AdrianB38 2011-03-07T12:20:12-08:00
FamilyHistorian enables me to create annotated lists where the lists can consist of any entities of any entity type. These can be used as hit-lists for to-dos; lists of canal boatmen (i.e. people with some common theme); lists of people who I know aren't relatives but who are in my Database because they were associated with my relatives (just saves me trying to find a family link when I've forgotten why they're there); lists of people to construct reports and diagrams on.
GeneJ 2011-03-12T06:35:47-08:00
Source 02-Certainty Assessment (QUAY)
Certainty Assessment (QUAY)

BetterGEDCOM should record the qualitative degree of likelihood that a source is true for a given event or characteristic.

Importance: Very Desirable

Why? GEDCOM has QUAY (quality) for this but the GEDCOM Standard is not clear what QUAY value should be assigned to a Primary source of Questionable accuracy

Source: Discussion page on Shortcomings of GEDCOM

Please use this discussion thread for comments.
GeneJ 2011-03-12T06:39:58-08:00
I'm not a QUAY fan; however, GEDCOM's description (exclusive of comments relative to the specific references 0, 1, 2 and 3) isn't bad. It's those specific references that are not mutually exclusive and out-of-date.

Here's the information I've referenced from GEDCOM 5.5 (quoting):

CERTAINTY_ASSESSMENT (QUAY)
The QUAY tag's value ( 0 | 1 | 2 | 3 ) conveys the submitter's quantitative evaluation of the credibility of a piece of information, based upon its supporting evidence. Some systems use this feature to rank multiple conflicting opinions for display of most likely information first. It is not intended to eliminate the receiver's need to evaluate the evidence for themselves.

0 =Unreliable evidence or estimated data
1 =Questionable reliability of evidence (interviews, census, oral genealogies, or potential for bias for example, an autobiography)
2 =Secondary evidence, data officially recorded sometime after event
3 =Direct and primary evidence used, or by dominance of the evidence
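
As a small illustration of the "rank multiple conflicting opinions" use mentioned in the quoted text, here is a Python sketch. The four values and their meanings come from the GEDCOM 5.5 text quoted above; the member names, the `rank_citations` function and the example citations are my own assumptions, not part of GEDCOM.

```python
from enum import IntEnum


class Quay(IntEnum):
    UNRELIABLE = 0    # unreliable evidence or estimated data
    QUESTIONABLE = 1  # questionable reliability of evidence
    SECONDARY = 2     # secondary evidence, officially recorded sometime after event
    PRIMARY = 3       # direct and primary evidence, or dominance of the evidence


def rank_citations(citations):
    """Sort (label, quay) pairs so the highest QUAY value comes first.

    sorted() is stable, so citations with equal QUAY keep their original order.
    """
    return sorted(citations, key=lambda c: c[1], reverse=True)


# Hypothetical citations for one birth event.
citations = [
    ("census", Quay.QUESTIONABLE),
    ("birth certificate", Quay.PRIMARY),
    ("family story", Quay.UNRELIABLE),
]
ranked = rank_citations(citations)
```

This only orders what the submitter entered; as the quoted text says, it does not replace the receiver's own evaluation of the evidence.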
GeneJ 2011-04-04T06:24:26-07:00
I recommend BetterGEDCOM's description read:

The QUAY tag's value ( 0 | 1 | 2 | 3) conveys the submitter's quantitative evaluation of particular evidence. Different applications and even different users may take different approaches to QUAY. Generally a lower numeric rating means the submitter places less reliance on the evidence. The originator's QUAY record does not eliminate the receiver's need to independently evaluate the evidence.
AdrianB38 2011-04-04T08:34:13-07:00
Provided Source01 is implemented to give the requisite detail, then Gene's suggestion above would seem sensible - it's a number, end of story.

NB - I still don't like the term "quantitative evaluation" - no quantities have been measured during the evaluation of this value! It's qualitative, even if it is written as a number. But then, I'm a mathematician.
GeneJ 2011-04-04T09:36:17-07:00
Let's see if we can take a bite out of this one:

QUAY is a value ( 0 | 1 | 2 | 3) representing an originator's qualitative assessment of how well particular evidence supports a particular assessment. Different applications, and even different users of the same application, may take different approaches to QUAY. Generally a lower numeric rating means the submitter places less reliance on the evidence. The originator's QUAY record does not eliminate the receiver's need to independently evaluate the evidence.
AdrianB38 2011-04-04T12:39:33-07:00
Looks OK to me
GeneJ 2011-04-04T13:11:02-07:00
I made one little change, taking out the reference to "support" and substituting "relative is ... to", which might be more inclusive of indirect and even negative evidence. --GJ


QUAY is a value ( 0 | 1 | 2 | 3) representing an originator's qualitative assessment of relative particular evidence is to a particular assessment. Different applications, and even different users of the same application, may take different approaches to QUAY. Generally a lower numeric rating means the submitter places less reliance on the evidence. The originator's QUAY record does not eliminate the receiver's need to independently evaluate the evidence.
AdrianB38 2011-04-04T13:28:14-07:00
Err - I think I know what you meant but would it be better as "of HOW RELEVANT particular evidence is to a particular assessment."?
GeneJ 2011-04-04T13:38:19-07:00
Yes!!

QUAY is a value ( 0 | 1 | 2 | 3) representing an originator's qualitative assessment of how relative particular evidence is to a particular assessment. Different applications, and even different users of the same application, may take different approaches to QUAY. Generally a lower numeric rating means the submitter places less reliance on the evidence. The originator's QUAY record does not eliminate the receiver's need to independently evaluate the evidence.
GeneJ 2011-04-04T16:45:19-07:00
QUAY is a value ( 0 | 1 | 2 | 3) representing an originator's qualitative assessment of how relative particular evidence is to a particular assertion. Different applications, and even different users of the same application, may take different approaches to QUAY. Generally a lower numeric rating means the submitter places less reliance on the evidence. The originator's QUAY record does not eliminate the receiver's need to independently evaluate the evidence.
GeneJ 2011-04-08T20:05:49-07:00
See the Application Data wiki page for FTMM. I observe FTMM's QUAY (in the application, "Rate Source Citation") works off a star rating from 0 (zero) to 4, or five (5) points on a scale.

There are really two scales--blue stars and gold stars. Blue stars are controlled directly by the user. The gold stars are calculated by the program based on the user's answers to questions about Source [Form], Clarity, Information [Class] or Evidence [Type]. If the user overrides the program-generated gold stars, they turn blue.
gthorud 2011-04-13T15:19:31-07:00
I posted the following in the discussion of Source01, but since some of it applies also to this topic, I post a copy here:

I personally do not really understand what some of these classifications are used for. Are they output in reports? In most cases I would know these classifications from knowing the source. But leave that aside, as it seems that there is some interest in having classification schemes.

If we are going to have this in the standard, we should try to arrive at agreed values - leaving this to implementors or allowing several schemes will create chaos.

There have been some proposals above to break things down into several concepts that each classify a single simple (atomic) aspect; I support that approach. The simple facts are less likely to change over time. How vendors map (combine) these in the user interface, if at all, is their problem.

We should NOT create schemes (à la QUAY) where an increasing value 0-1-2-3 indicates a better quality. We should allow extensibility, e.g. by allowing a code value 5 to mean something between 1 and 2. The user should relate to a definition, not the code value.

But we may need to keep QUAY for backwards compatibility - if its values cannot be mapped to a more detailed multi-value scheme - but then deprecate its use when recording new data.
EssyGreen 2011-11-13T23:19:09-08:00
I do use the existing QUAY for all my citations and I do it so that at a quick glance at any fact/event I can see whether or not I have 'good' evidence for it (by 'good' I usually mean primary but even primary can be 'bad' and sometimes secondary is 'good'). So for example, I can look at a Birth event and scan the list of associated citations to see if they are all weak or if there is a strong one. If they are all weak then I need to look further.

Re the exact wording of the QUAY ... I think GeneJ's version is great. Nuff said, move on?
gthorud 2011-03-24T19:07:38-07:00
Data-Event02 - Multiple places per event
Description: BetterGEDCOM should support the recording of multiple places for a single event.

Why: Current GEDCOM allows the recording of one place for events. There are application extensions to record more than one - e.g. FamilyHistorian records two places for emigration - a "from" and a "to" place. Users may also define "Journey" events, where a "from" and a "to" location would seem natural.

Way forward:
•Analyse whether there is a need for more than two places per event - e.g. "from", "to", "via";
•Analyse whether location-roles are mandatory, optional or forbidden. (Location-roles refers to the role that a location plays in an event. Examples of roles are "from" and "to". Locations without roles would be just listed, e.g. "The 1906 earthquake happened at X and Y")
•If roles are needed - what are the roles?
gthorud 2011-03-24T19:19:32-07:00
There are some questions that must be answered regarding multiple places for events.

What are the event types where several places could be used?

So far emigration, immigration and earthquakes have been mentioned, but it also applies to all sorts of migration (i.e. “moved”), journeys or military expeditions. All of these may have one or more “via” places, in addition to starting point and destination.

Then there may be events used for places only. One such event would be when a place is created by splitting one place and creating a new one, or when (part of) one place is merged with another. The most frequent variant would involve, for example, farms/properties at the same level in a place hierarchy, but it would also apply when a place is “moved” from e.g. one country to another (higher level involved). There may be many such events for a place.

A document could contain a place name whose location you are not sure of: it could be this place or that place, so the event would reference both.

It might also be that multiple places could be used to reduce the amount of robot language, e.g. when a person owns three properties: instead of having three sentences for this, one could say that “Peter owned X, Y and Z.” But this is just a vague idea that I guess a lot of people might argue against.

There is a question if there are other event types where multiple places could be useful? Are there documents that mention several places?

Each place should have a role, e.g. starting point, destination, via-place, separated from etc., or the role could be implied by the event type if only one role type is possible. The available roles for an event type must be defined for each type.

There is also a need to have dates for each place, e.g. arrival and departure day. And also sources? The order in which the places occur is obviously important.

It is also important to consider the implications for programs. All programs have only one place field per event, but they could have some sort of indicator that tells the user that there are more places - popups to see/edit all. Also the scripting language used to specify sentence templates must be enhanced. Relational databases might need an extra record type. How difficult will it be?

Since all? "person events” (but not all place events) with multiple places can be represented by single place events, although with more robot language in some cases, the question is if there is enough support for multiple places. I am sure this is an old Gedcom issue so there may be a lot of people with strong opinions.

Geir
theKiwi 2011-03-24T20:01:27-07:00
I'm not convinced that events need more than one place - to me an event is something that can be represented by a date and a place.

"Emigration" that takes 6 months can be broken down to be

Emigration from place1 on date 1
Immigration to place 2 on date2

What happened between date1 and date2 happened on a path, not a place (unless you want to somehow describe the 12,000 mile ocean voyage from England to New Zealand as a "place").

And the Emigration and Immigration will likely have 2 quite different sources - almost certainly from 2 different countries if it's a migration from one country to another.

Roger
AdrianB38 2011-03-25T05:19:51-07:00
Roger - I guess you're one of those to whom an event would only have a single date, whereas I'm one of those where an event could have a FROM-TO date range, e.g. "The First World War happened From 1914 To 1918" - OK not phrased in a genealogically relevant fashion, but it illustrates what most people think of as an Event. (Or at least - most English speakers would - I'm unsure whether the concept would translate).

(NB - this is NOT the discussion thread to debate this Event-From-To topic. Not sure if there is one in the Reqts Catalogue, but we've had such discussions elsewhere).

Anyway - at root, those of us who subscribe to the Event-lasting-more-than-one-day view are I suggest more likely to want a From and To location at the very least, as we would find it more natural. Certainly, you could describe your 6m emigration journey as you suggest. (By the way, I would describe "at sea between England and Australia" as a place - I've got someone allegedly born on the journey).

For me though, I'd like to describe "emigration" as an event from date1 to date2, from-location place1 to-location place2, and I'd quite happily have 2 (or more) sources for the journey, simply with a note against the "citation" that says this source is for the origin and this for the destination.

To me, this construct is far more natural, and comes out in reports far better - I did fight against using the 2nd (custom) location for emigration in my software but it just read better all in one. Hence my proposal for multiple locations for an event.

(I also have a prejudice against the "immigration" event as defined - what the heck does it mean? No-one in English-English uses the verb "to immigrate" and when I pass through "Immigration" at an air-port it's not because I'm going to live there permanently, so there's a danger the event will get misunderstood in future. However, that's not strictly relevant.)
AdrianB38 2011-03-25T05:35:37-07:00
Geir,
My personal view is that I don't see the need for "via" locations. Or more accurately, the benefit that they bring seems to be outweighed by the extra complexity of a possibly endless array of via-locations. (I can live with one via-location). But I've written it into the Requirements to have this discussion.

Location1 "or" Location2. Interesting. However, this could start a list - if or-places, why not or-dates? I think I'd prefer it if we stuck to the convention that alternatives are done by putting in a 2nd event. (Should this be explicitly recorded in the Requirements Catalogue?)

Dates for each place? Again, interesting but I think if we confine ourselves to just 2 locations per event then the necessity for dates for each place goes away as it's just the from-date and to-date that apply respectively to the from-place and to-place.

As for the programs, well, the software that I use does have space for 2 locations for emigration as it has a custom defined extra location. This is just a minor tweak to the form to show the extra location and would be an extra column in the database. An array of many locations (e.g. "from", "to" and multiple "via") would indeed be a different ball-game.
AdrianB38 2011-03-26T09:18:38-07:00
Having thought slightly more deeply about the data model for the Event and how it could be implemented in GEDCOM, an RDBMS, XML, whatever, I now think my distaste above for an "array" of many locations (e.g. "from", "to" and multiple "via") was mistaken and that the extra complexity falls out of (a) the requirement to keep details about location history (specifically splits and joins) and (b) also out of an Event's need to record multiple persons. Once we have (a) and (b) designed in, then it becomes trivial to allow many more than 2 locations per event in data storage terms.

To make this clearer (I hope) - we would envisage event XML that looks something like this:
<Event id="EV1234" Type="some-type">
  <Date>...</Date>
  <People>
    <Person id="IND4472">...</Person>
    <Person id="IND4498">...</Person>
    <Person id="IND41212">...</Person>
  </People>
</Event>

Apologies for any mistakes and naiveties in XML. Note I haven't put anything in for location. Value of "id" is meant to be a cross-reference. The initial thought (1 location per event) gives me this:

<Event id="EV1234" Type="some-type">
  <Date>...</Date>
  <Location id="L4084"> ... </Location>
  <People>
    <Person id="IND4472">...</Person>
    <Person id="IND4498">...</Person>
    <Person id="IND41212">...</Person>
  </People>
</Event>

We need multiple locations to cover the history of a location. It is possible to imagine a location being split into 3 or more at 1 event (e.g. dissolution of the USSR?), giving us:
<Event id="EV1234" Type="location-split">
  <Date>...</Date>
  <LocationList>
    <Location id="L4030"> ... </Location>
    <Location id="L4080"> ... </Location>
    <Location id="L4084"> ... </Location>
  </LocationList>
</Event>

Seems easy enough to me... And the <LocationList> element(?) can easily be used for the ordinary event to give multiple locations:
<Event id="EV1234" Type="journey">
  <Date>...</Date>
  <LocationList>
    <Location id="L4030" Role="from"> ... </Location>
    <Location id="L4080" Role="to"> ... </Location>
    <Location id="L4084" Role="via"> ... </Location>
  </LocationList>
  <People>
    <Person id="IND4472">...</Person>
    <Person id="IND4498">...</Person>
    <Person id="IND41212">...</Person>
  </People>
</Event>

So - in this XML illustration, a list of locations is (a) necessary to record a location-split event, (b) easy because it's just like a list of people and (c) can therefore be used for any event.
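As a rough check that such a list is easy to consume, here is a minimal Python sketch that reads the journey illustration and groups the location references by role. The element and attribute names come from the illustration above; the date and place text inside the elements are made-up placeholders, and nothing here is a proposed BetterGEDCOM syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical event XML modelled on the illustration above.
# The date and the text inside each <Location> are invented sample values.
EVENT_XML = """
<Event id="EV1234" Type="journey">
  <Date>1900</Date>
  <LocationList>
    <Location id="L4030" Role="from">place 1</Location>
    <Location id="L4080" Role="to">place 2</Location>
    <Location id="L4084" Role="via">place 3</Location>
  </LocationList>
  <People>
    <Person id="IND4472"/>
  </People>
</Event>
"""

def locations_by_role(event_xml):
    """Return {role: [location ids]}; a location without a Role is keyed by None."""
    root = ET.fromstring(event_xml)
    result = {}
    for loc in root.findall("./LocationList/Location"):
        result.setdefault(loc.get("Role"), []).append(loc.get("id"))
    return result

roles = locations_by_role(EVENT_XML)
print(roles)
```

Because a missing Role simply ends up under the key None, the same reader would also cope with the location-split event, where the locations are just listed.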
gthorud 2011-03-27T16:21:29-07:00
I have always found the language produced by genealogy programs boring and unnatural. There is sentence after sentence starting with “Peter was …”, “He was….”, “He emigrated …” and so on. If I find it boring, I expect other readers to find it even more boring since they don’t know why. One reason for this situation is the simplicity of the current event structure. It seems that data structures are more important than the end product of our work.

It might be useful to look at some real world examples.

Considering that a lot of people emigrated from a port outside their country, and even transited through England (there was a small transit industry there), I think multiple places would make the language more natural. I would rather see “She emigrated to America from Oslo 12 May 1900 via England.” than “She emigrated from Oslo 12 May 1900. She went via England. She immigrated in America.” The single sentence could often be followed by “She immigrated at Ellis Island 29 May 1900 where it was recorded that she was destined for Coon Valley.” rather than two more sentences.

Emigration/Immigration is one thing, but there are also a lot of records of migration within a country. Currently there is really no way to say that someone moved from a to b.

Also, there are things related to property, for example “Hans bought a part of the Olson farm in 1875 and called it My Farm.” Sentences that rename a farm would be impossible without references to two farm names. And what about inheritance (probates) back in the 1600s, when people owned bits and pieces of several farms, and it was often the case that someone inherited pieces in two or three farms? Or the simple fact that someone owned several properties: you don’t want a sentence for each of those.

So multiple places is an opportunity to get rid of some robot language. The limiting factor in this game is not the user interface or databases, but the construction of sentence templates since it must be approximately the same in both the exporting and importing application. The key will be the selection of appropriate roles for places, and the fact that information in sources is often “standardized”.

I have already mentioned events for places above; some of them cannot be described without reference to more than one place, and I am sure there are cases where you can get rid of some robot language in such events also.



Regarding the maximum number of places, I think that 3 (maybe 4) places should cover most cases. But I think the limit will have to be investigated later.

I am not sure I understand why the case with the ambiguous place name must result in two different events. That would just mean more work and more robot language.

Also, there are currently many date and place pairs in user interfaces, so I don’t see why dates could not be grouped with places in a structure; you could just define a structure with repeatable pairs where both the date and place are optional.
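To make the idea of repeatable date/place pairs a little more concrete, here is a minimal Python sketch. All names (PlaceDate, Event, describe) are hypothetical, the role values "from", "to" and "via" are only the ones mentioned in this thread, and the sentence template is hard-coded purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PlaceDate:
    """One repeatable pair; both the place and the date are optional."""
    place: Optional[str] = None
    date: Optional[str] = None
    role: Optional[str] = None  # e.g. "from", "to", "via" (assumed role names)

@dataclass
class Event:
    type: str
    stops: List[PlaceDate] = field(default_factory=list)

def describe(subject, verb, event):
    """Build one sentence from whichever roles are present."""
    by_role = {s.role: s for s in event.stops if s.place}
    parts = [f"{subject} {verb}"]
    if "to" in by_role:
        parts.append(f"to {by_role['to'].place}")
    if "from" in by_role:
        origin = by_role["from"]
        parts.append(f"from {origin.place}" + (f" {origin.date}" if origin.date else ""))
    if "via" in by_role:
        parts.append(f"via {by_role['via'].place}")
    return " ".join(parts) + "."

emigration = Event("emigration", [
    PlaceDate(place="Oslo", date="12 May 1900", role="from"),
    PlaceDate(place="England", role="via"),
    PlaceDate(place="America", role="to"),
])
print(describe("She", "emigrated", emigration))
```

If the destination is unknown, the "to" pair is simply left out and the sentence shortens accordingly, which is the kind of coding round missing details that an application would need to do.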
AdrianB38 2011-03-28T07:49:51-07:00
The robotic nature of computer generated reports will probably never be defeated without transforming the method of input and data storage totally. Such a change is probably beyond the scope of creating BG as a successor to GEDCOM in today's programs.

However, we can surely help the applications along by making available a data structure closer to a sensible sentence. And Geir shows this well with the examples on Emigration etc and Moving. Nobody (at least in English) ever separates the start of the event from the end when creating a sentence. The events amenable to this sort of treatment can be recognised, I suggest, by the fact that they always come in pairs. Nobody ever emigrates without also arriving in the target country (what GEDCOM refers to as the Immigration event). So why separate the events in the database? You may not know the details at the "other" end, but any intelligent application can code round that when creating the narrative sentence.

So there's the challenge - what pairs (or trios?) exist?
theKiwi 2011-03-28T07:53:28-07:00
I'm still thinking about this in general, but one point in response to

  • Nobody ever emigrates without also arriving in the target country (what GEDCOM refers to as the Immigration event).

1 - People who emigrate and die at sea
2 - people born at sea during the voyage have never really emigrated from anywhere?

In both cases these are clearly 2 quite separate events

1- an emigration and a death
2 - a birth and then an immigration
AdrianB38 2011-03-28T11:54:58-07:00
"these are clearly 2 quite separate events"

That's true and a good reason to continue to allow separate events - but in that case one could simply omit the "missing" bit. For 99.9% of people then, the 2 events are described verbally as one, so would be more useful entered as one. If you wanted to.

(Of course, we could argue pointlessly for days about whether (2) really is an immigration or an arrival!)
AdrianB38 2011-03-25T08:20:31-07:00
Data-Event04 Events over a time-period
This has been raised to act as a home for any further discussions about whether or not events should be able to last over more than 1 day.

Note earlier discussions on this topic in discussion "Syntax09 Define Event vs. Attribute"
AdrianB38 2011-11-26T12:01:29-08:00
"a good example yet that required more than one location for an event that doesn't have a duration"

What about some natural disaster? I know from letters loosely connected to my family that the San Francisco Earthquake of 1906 affected both SF and Oakland (and no doubt many other places as well...). As the family were in both places, I'd like to at least consider the possibility of associating them with the 'quake and describing it as occurring in at least those 2 spots seems desirable. (Plus that well known place "et cetera"?)
AdrianB38 2011-11-26T12:14:51-08:00
"I represent the two end-points as independent events ... but link one to the other. That effectively defines a "protracted event" that has a duration"

We had an earlier discussion about "What IS an event?" and my favourite definition at the end was something like "It's a change of state compared to what went before." It's then quite possible to imagine something like WW1 as one event lasting from 1914 to 1918 (a temporary change from the state of peace to one of war) OR as one event in 1914 to cover the outbreak (a change from the state of peace to one of war), a different state after that that is NOT explicitly referenced (well, I wouldn't reference it) and then a second event to cover the end in 1918 (a change from the state of war to one of peace). Again, I'd not thought of linking the two but it does make sense.

As for whether the best plan is one long event or two end events, I think I'd like to reserve my decision on a case-by-case basis and see how the other data tends to play out in a particular application. Mathematically, the two approaches seem equivalent at first glance.

"It also allows the two end-points to have independent locations as in the case of a long journey." That's true but I'd be slightly concerned you might be putting the cart before the horse there - if you have a need for multiple locations, then why not satisfy it, rather than succumb to a restriction from the start?
ACProctor 2011-11-26T12:36:37-08:00
Re: San Francisco earth-quake having multiple locations...

My format solves this through inheritance. A generic event represents the overall quake and derivations of it can specify a more precise location. Other information such as the actual date are inherited from the generic event.
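A minimal Python sketch of this kind of inheritance, for illustration only: the class and field names are hypothetical and are not taken from the actual format, and the date is reduced to the year mentioned in this thread.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    name: str
    date: Optional[str] = None
    place: Optional[str] = None
    parent: Optional["Event"] = None  # the generic event this one derives from

    def resolved(self, attr):
        """Return this event's own value, or the nearest ancestor's value."""
        node = self
        while node is not None:
            value = getattr(node, attr)
            if value is not None:
                return value
            node = node.parent
        return None

quake = Event("San Francisco earthquake", date="1906")
quake_oakland = Event("Earthquake as felt in Oakland", place="Oakland", parent=quake)

print(quake_oakland.resolved("date"), quake_oakland.resolved("place"))
```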

I use the same mechanism for census events. A generic one represents the complete census on a particular night in the UK while derivations specify a particular household etc.
ttwetmore 2011-11-26T23:45:49-08:00
"That sort of flexibility sounds like it would make subsequent processing more difficult. Having a simpler 'event' concept that is well-defined and atomic allows you build concepts such as a 'protracted event' or a 'structured event' without losing the simplicity of using it as a marker in time. I haven't come across a good example yet that required more than one location for an event that doesn't have a duration."

Well, the flexibility allows the simple approach you suggest, for which processing is trivial, so the DeadEnds model is fine. Plus, as a developer, I don't see any difficulty in processing more complex events. The only deficiency in the DeadEnds approach vis a vis what we are talking about here, is the ability to link events. That's got me thinking. But since this has never come up before in any discussion of events I've had for the past twenty years, it deserves a little thinking. I don't see any real need to connect a divorce to the marriage it ends, but I can see why one might think it's a good idea. If an event has a start and an end, instead of two linked events, I don't see any real problem for a single event with a from ... to kind of date.
ttwetmore 2011-11-27T00:06:40-08:00
"My format solves this through inheritance. A generic event represents the overall quake and derivations of it can specify a more precise location. Other information such as the actual date are inherited from the generic event."

This sounds useful for a historical discipline driven by the need to analyze complex trends, large events, and so forth. One wonders how useful this is for genealogical applications, where the most complex events are usually those described on certificates.

"I use the same mechanism for census events. A generic one represents the complete census on a particular night in the UK while derivations specify a particular household etc."

In my approach each "household" is an event, and anything higher up in the evidentiary chain is a source. In the genealogical sense I don't see any value in an event which is "all census data enumerated on such and such a day," but I do see "all census data collected in such and such a parish" as a possible source, that would be a sub-source of, say, the 1871 Isle of Man census, though, for genealogical purposes, I simply use the entire 1871 Isle of Man census as a single source, with each household I extract from the census as a new event. Of course there are a lot of differences in how people approach describing the source of census data. I guess one could consider the entire 1871 Isle of Man census as one very large event, from a historical perspective, but I prefer thinking of it as a source of evidence, not as an event.

At the genealogical level, what is important about an event is that it name persons, give some important attribute/s of the persons (e.g., a name, a vital date), and if there is more than one person mentioned in the event's evidence, then any information possible about either the roles the persons play with respect to the event, or the relationships the persons have with respect to each other.

But this does lead to some interesting questions. Say you had an ancestor who was living in San Francisco at the time of the great earthquake. How would you like that information to be handled by your genealogical data model? Well, first you do need the evidence that your ancestor was living there at the right date, so that will be handled by some normal "event" like a city directory entry, or a census, or a letter. But then you want to say, by the way, this was the date and the place of the San Francisco earthquake. This has nothing to do with your ancestor, really, he was just living there then. Some genealogical programs come with historical databases that can just tie these facts to your ancestors automatically after examining their details.

I guess the question is whether a genealogical data model needs to include large historical events as a new data class, and, if not, how such information as "my grandfather Charles fought in World War II" should be encoded in the model.
ACProctor 2011-11-27T02:31:34-08:00
I've strongly distinguished genealogy from family-history in my model (which I really must finish off and offer up for people to comment on). Hence, the historical applicability was part of the design to accommodate a more general class of data.

Narrative plays a large part in my model and the narrative content can embed arbitrary references to top-level entities like Person, Place, or Event.

I wanted to get a good balance between a strong, flexible, and normalised approach to those top-level entities while allowing for completely ad hoc and free-form connections that you may want to record.

My big hold-up at the moment is whether Events group Persons, or Persons link to Events. It sounds trivial on the face of it and I can see arguments that work well in both directions. That usually means some middle-ground is where the best answer is :-)

(thanks for all this feedback folks. I'm getting more constructive comments here than I ever got on sgc)
ttwetmore 2011-11-27T02:40:53-08:00
"My big hold-up at the moment is whether Events group Persons, or Persons link to Events. It sounds trivial on the face of it and I can see arguments that work well in both directions. That usually means some middle-ground is where the best answer is :-)"

This is very interesting because I am also in a quandary about this. In the DeadEnds model I have the events refer to the persons via role references, and don't have the persons refer to their events. Of course this is at the data model level and therefore possibly the database level. Once the subsets of persons and events that the user is currently interested in are loaded into the computer's memory, most implementations would just add the redundant link for efficiency in processing. But this begs the question of what happens when a user wants to load up a bunch of persons with their events, based on knowing the persons only. If the database were a relational one, there would be an event-person table that could be queried. That is, the relational database table has "normalized" the problem away. But in a network database, where nothing is normalized, there is a problem. Even though it adds a little redundancy, I think it's good to have the persons refer to their events also. They can do this simply by just storing the events' ids in a list.

Note that in the GEDCOM model the pointers go both ways so that persons can get to their "events" directly (which in GEDCOM simply means their families) and vice versa. All other, non-family, events are simply bound into the bodies of the persons under the vastly simplifying assumption that all events have one role only.
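As a small illustration of that redundant link, here is a Python sketch that derives the person-to-events list from event-side role references at load time. The ids and role names are invented for the example, and the dictionary layout is an assumption, not the DeadEnds storage format.

```python
from collections import defaultdict

# Event-side storage: each event lists (person id, role) references.
# Ids and roles are made-up sample data.
events = {
    "EV1": [("IND1", "groom"), ("IND2", "bride")],
    "EV2": [("IND1", "emigrant")],
}

def events_by_person(events):
    """Build the reverse index: person id -> list of event ids."""
    index = defaultdict(list)
    for event_id, role_refs in events.items():
        for person_id, _role in role_refs:
            index[person_id].append(event_id)
    return dict(index)

person_index = events_by_person(events)
print(person_index)
```

Whether this index is rebuilt on load or stored alongside the persons is exactly the redundancy trade-off discussed above.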
ACProctor 2011-11-27T03:18:46-08:00
I don't like the GEDCOM way of doing it since it is confusing a storage model with a run-time model. As you say yourself, when data is loaded-up then extra links can be added for efficiency. In fact, I believe a source-format (which includes usage for backups, data exchange, etc), a run-time data model, and a database model are all different and have different requirements.

I'm currently focused on a generalised source-format which I believe should be normalised (i.e. minimal redundancy and duplication).

A run-time data model would be a natural successor project but the requirements of it would be different, e.g. efficiency of lookups or correlation.

As for indexed database storage, I believe that's a decision for the designer of any commercial software product. Whether it uses a relational database, object-orientated one, OLAP one, key-value pairs one, or some proprietary one is their choice. [I'm pretty sure I've seen the same sentiments in another of your threads in these pages so that's very reassuring]
ACProctor 2011-11-27T03:25:32-08:00
In the interests of keeping this thread focused (I apologise for diverting it already), is there a separate one on the relationship between Persons and Events?
ttwetmore 2011-11-27T10:09:12-08:00
"I believe a source-format (which includes usage for backups, data exchange, etc), a run-time data model, and a database model are all different and have different requirements."

I agree. First there is the model. The source format (which I call the external format) is a text-based archival format that must be deterministic so it can be parsed, and I think it is desirable that it also be human readable. Then the database format is wholly up to the development team. And of course leaving it up to the development team often introduces problems (artificial limitations, non-standard extensions, misinterpretations of the model), but with published test data and requirements to pass a reflexive import to export test that leaves data unchanged in order to be certified compliant, these issues are controllable.
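A minimal Python sketch of what such a reflexive test could look like. The line-based "TAG value" format and the toy importer/exporter are stand-ins invented for the example; a real certification test would run a vendor's own import and export against the published test data.

```python
def round_trips(text, import_fn, export_fn):
    """True if importing and then exporting reproduces the input exactly."""
    return export_fn(import_fn(text)) == text

def toy_import(text):
    """Parse 'TAG value' lines into (tag, value) records (toy format)."""
    records = []
    for line in text.splitlines():
        tag, _, value = line.partition(" ")
        records.append((tag, value))
    return records

def toy_export(records):
    """Write records back out in the same toy format."""
    return "\n".join(f"{tag} {value}".rstrip() for tag, value in records)

def lossy_export(records):
    """A deliberately broken exporter that upper-cases every value."""
    return "\n".join(f"{tag} {value.upper()}".rstrip() for tag, value in records)

sample = "NAME Peter\nDATE 12 May 1900"
print(round_trips(sample, toy_import, toy_export))
print(round_trips(sample, toy_import, lossy_export))
```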

"A run-time data model would be a natural successor project but the requirements of it would be different, e.g. efficiency of lookups or correlation."

Shouldn't the run-time data model be left to the development organizations? I may not understand your point here. By the run-time data model I think of the actual Java or C++ (in my case Objective-C) classes used to implement the software.

"As for indexed database storage, I believe that's a decision for the designer of any commercial software product. Whether it uses a relational database, object-orientated one, OLAP one, key-value pairs one, or some proprietary one is their choice. [I'm pretty sure I've seen the same sentiments in another of your threads in these pages so that's very reassuring]"

I agree. I like to play devil's advocate against relational implementations, since I think it is too ingrained in our default way of thinking, and I think it contributes to many of the artificial limitations found in commercial software, but frankly those limitations are not inherent in the relational model, but in the implementations. I prefer a network database approach because it feels so nice and object-oriented to me, but I am in the minority here, and am not trying to change minds, just trying to suggest the value of thinking before doing.
ACProctor 2011-11-27T10:19:04-08:00
By run-time data model, I wasn't so much thinking of the internals of a particular product as the public object/method interface that would be offered up for run-time interoperability.

This is something I feel would be a huge step forwards but fear it may be still far away - the possibility of products interoperating in real-time.

It would allow a specialising of the market (ideally) so that database products are separate from analysis products and separate from reporting products. It would also be necessary for any type of cloud computing where online trees could be published as opposed to simple pedigrees on a Web page (the latter doesn't support any type of analysis or correlation with, say, your own tree)
ttwetmore 2011-11-27T12:15:26-08:00
"By run-time data model, I wasn't so much thinking of the internals of a particular product as the public object/method interface that would be offered up for run-time interoperability."

I understand. I would call that a service API. New Family Search has one that they publicize, Ancestry.com has one that they keep secret. It may be a pipe dream to expect more than one service provider to agree on the same API, but we could hope for it.
eleanordew 2011-11-22T13:10:10-08:00
Slavery would certainly qualify as an event over a time-period, in fact, it could be seen as a state of being.
ACProctor 2011-11-26T06:56:38-08:00
I only came upon these pages when I saw a thread about "multiple places per event". This also connects with "events over a time period" here since emigration/immigration was cited as a possible example - the place of origin and the place of destination being at opposite ends of a protracted event. Someone pointed out in the other thread that this case could also be handled as two independent events but a less contentious case might be WWI which still has both a start date and an end date but which would mostly be kept together.

I happened to be working independently on designing a "source format" for family history which I intended to publish on the Web soon but I'd like to share my thoughts and ideas.

In that source format, I represent the two end-points as independent events (e.g. emigration versus immigration, or outbreak of WWI versus Armistice Day) but link one to the other. That effectively defines a "protracted event" that has a duration. It also allows the two end-points to have independent locations as in the case of a long journey. Note that it still allows the individual end-points to be referenced separately if necessary. For instance, if something in a person's family history was relevant specifically to the outbreak of WWI or the ceasing of hostilities then they can still be referenced explicitly.

I've generalised this approach so that multiple mid-point events can be associated with the overall group, thus defining a "structured event".

Do you think there might be some useful ideas in this approach?
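To show the shape of the idea, here is a minimal Python sketch of end-point events grouped into one "structured event". The class names and fields are hypothetical and are not the actual source format; dates are reduced to years so that plain string sorting works.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Event:
    id: str
    type: str
    date: str                    # year only, so string order is date order
    place: Optional[str] = None  # each end-point may have its own location

@dataclass
class StructuredEvent:
    """A group of linked atomic events; the members remain referenceable."""
    name: str
    members: List[Event] = field(default_factory=list)

    def span(self) -> Tuple[str, str]:
        """Earliest and latest member dates, i.e. the protracted duration."""
        dates = sorted(e.date for e in self.members)
        return dates[0], dates[-1]

outbreak = Event("EV1", "outbreak of WWI", "1914")
armistice = Event("EV2", "Armistice Day", "1918")
ww1 = StructuredEvent("WWI", [outbreak, armistice])

print(ww1.span())
```

A person's record can still point at EV1 or EV2 alone, while the group supplies the overall duration.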
NeilJohnParker 2011-11-26T07:30:59-08:00
I believe there is a similar situation with other events, e.g. marriage and divorce. The divorce needs a separate event with its own date and possibly a place if it's relevant, but most importantly it needs to be linked to a specific marriage, although usually that can be inferred. Also both the marriage and divorce event may need to contain an attribute of the authority that granted the marriage or divorce, e.g. Smith Fall's Presbyterian Church or Family Court, Wichita, Kansas respectively.
ACProctor 2011-11-26T07:46:11-08:00
The idea of extending Event-groups that far is a big step. It's true that divorce could be treated as another end-point to the overall marriage but it feels more hazy, or maybe I'm just more scared of going that far ;-) Courtship usually pre-dates the marriage but is often left out of the family history of our culture. The signing of different forms of marriage agreement could be before or after the marriage. My own marriage was in two parts (civil and religious) which happened over 5000 miles apart. What about separations, both formal and informal, and reunions for that matter? That approach could be taken to include someone's whole life from birth to death.

I think I didn't go that route because I'd defined an event as simply something connecting one-or-more persons with a place at a given time. I'm not trying to deconstruct a whole marriage or a whole life and represent it all as a 'structured event'.

I'd be interested to see how others feel about that.
NeilJohnParker 2011-11-26T09:04:50-08:00
Unfortunately a divorce is associated with one specific marriage. Although which one can be implicitly inferred (if and only if you have the dates of each marriage and each divorce), would it not be better to explicitly state the link between the two events (if one knows what it is), especially when the dates are unknown or uncertain?
ttwetmore 2011-11-26T09:17:46-08:00
In the DeadEnds data model the event is allowed to have any number of dates, and those individual dates can be date ranges themselves, so theoretically you can have an event for something that occurred just on a series of weekends and you could handle that. Not, of course, that such a feature would ever be widely (or ever?) used. The point is that it is so easy to allow the flexibility that one just does. Likewise, the DeadEnds event can occur over any number of places, though there is no notion of the starting place and the finishing place. DeadEnds does not have an obvious way of linking events to one another, however. I've never imagined the need for such a thing, though the marriage-followed-by-divorce example does cause one to think about it.
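
For illustration only, the flexibility described here might be sketched in Python as below. This is not the actual DeadEnds model; the names `DateRange`, `Event` and `occurred_on`, and the sample data, are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DateRange:
    """A single date (end is None) or an inclusive range of dates."""
    start: date
    end: Optional[date] = None

    def contains(self, day: date) -> bool:
        last = self.end if self.end is not None else self.start
        return self.start <= day <= last


@dataclass
class Event:
    """Hypothetical event that may have several dates and several places."""
    kind: str
    dates: List[DateRange] = field(default_factory=list)
    places: List[str] = field(default_factory=list)  # unordered: no start/finish place

    def occurred_on(self, day: date) -> bool:
        return any(r.contains(day) for r in self.dates)


# An event that took place only on two successive weekends.
fair = Event(
    kind="fair",
    dates=[
        DateRange(date(1911, 6, 3), date(1911, 6, 4)),
        DateRange(date(1911, 6, 10), date(1911, 6, 11)),
    ],
    places=["Place A", "Place B"],
)
```

Note that nothing in this sketch links one event to another, which is the gap the marriage and divorce example exposes.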
ACProctor 2011-11-26T10:18:15-08:00
That sort of flexibility sounds like it would make subsequent processing more difficult. Having a simpler 'event' concept that is well-defined and atomic allows you to build concepts such as a 'protracted event' or a 'structured event' without losing the simplicity of using it as a marker in time. I haven't come across a good example yet that required more than one location for an event that doesn't have a duration.
AdrianB38 2011-03-28T08:31:24-07:00
Data-Fam02 - Cohabitants
"BetterGEDCOM must support the recording of information about cohabitants, with or without, common children. Cohabitants should be treated in the same way as married couples, and there should be events for the establishment and dissolution of "cohabitants". Some couples may start out as cohabitants and then marry."

Why: "The percentage of couples that are cohabitants is increasing in the western world, in some countries it is as high as 25-30%. BetterGEDCOM should not discriminate people in such relations."
GeneJ 2011-04-05T09:21:06-07:00
@Adrian,

Cool.

PS. "it's up to the application coders to come up with the desired reports."

BetterGEDCOM needs to distinguish between new or custom tags that create essential "genealogical" associations (whether traditional, scientific or other) and all other tags, right?
AdrianB38 2011-04-05T13:14:07-07:00
"BetterGEDCOM needs to distinguish..."

Yes. I think. I also think that USER defined tags will not be able to create any "genealogical" associations for the simple reason that the software will not understand them. UNLESS they can inherit properties of "higher" tags - which I think is a requirement.

Custom tags defined by an application's developers will be able to create "genealogical" associations in that application but will not be understood outside that app.
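
A minimal sketch of the inheritance idea, assuming each custom tag can declare the "higher" tag it derives from. The tag names and the `nearest_standard_tag` helper are hypothetical and are not part of GEDCOM or any BetterGEDCOM draft.

```python
from typing import Dict, Optional

# Tags assumed to be defined by the standard (illustrative names only).
STANDARD_TAGS = {"EVENT", "UNION_START", "MARRIAGE"}

# Parent declarations: tag -> the "higher" tag it inherits from, if any.
PARENT: Dict[str, Optional[str]] = {
    "EVENT": None,
    "UNION_START": "EVENT",
    "MARRIAGE": "UNION_START",
    "HANDFASTING": "MARRIAGE",  # user-defined, declares a parent
    "_MYAPP_NOTE": None,        # user-defined, no declared parent
}


def nearest_standard_tag(tag: str) -> Optional[str]:
    """Walk up the declared parents until a tag the software understands is found."""
    seen = set()
    current: Optional[str] = tag
    while current is not None and current not in seen:
        if current in STANDARD_TAGS:
            return current
        seen.add(current)  # guards against accidental cycles in the declarations
        current = PARENT.get(current)
    return None
```

With these declarations a program that has never seen HANDFASTING could still treat it as a MARRIAGE, while _MYAPP_NOTE resolves to nothing and so cannot create a "genealogical" association outside the application that defined it.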
GeneJ 2011-04-05T14:56:15-07:00
Cool.

P.S. My earlier comment about "reports" was just an attempt at clarifying that "genealogical association" concept.
GeneJ 2011-04-05T17:04:30-07:00
Wait ...

You wrote, "'genealogical' associations .. will not be understood outside that app."

Where do I enter a requirement about "genealogical associations?"

Or otherwise, what am I missing?
Don't we break content if generic application data for BMDB, and information about the role of "children" (in its various forms), is not able to be understood from program to program?
gthorud 2011-04-06T05:48:59-07:00
I have to cover several issues since a lot has happened since my last post.

First, one issue from my first posting above. I think there is a need for a DEFAULT “unknown” status. I could call it marriage status, but it could as well be defined as cohabitation status – i.e. you know they had children, but don’t know if they ever lived together or were married. The point is that when there is no marriage event (of any type) and no cohabitation event – no nothing – the status should not be assumed to be “not married” or “not cohabitating”. This is just something that needs to be stated somewhere, and there is no need for data reflecting this. But there could in addition to this be a need to explicitly record that e.g. “It is not known if X and Y were married.” Is there such a need?
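
As a rough sketch of that default, a program could derive the status from whatever events are recorded and fall back to "unknown" rather than to "not married". The `UnionStatus` enum and the event-kind strings are hypothetical, and dissolution events are ignored for brevity.

```python
from enum import Enum
from typing import Iterable


class UnionStatus(Enum):
    UNKNOWN = "unknown"
    MARRIED = "married"
    COHABITING = "cohabiting"


def union_status(event_kinds: Iterable[str]) -> UnionStatus:
    """Derive a couple's status from the kinds of events recorded for them."""
    kinds = set(event_kinds)
    if "marriage" in kinds:
        return UnionStatus.MARRIED
    if "cohabitation" in kinds:
        return UnionStatus.COHABITING
    # No marriage and no cohabitation event: absence of data is not a negative statement.
    return UnionStatus.UNKNOWN
```

No data has to be stored for the default case; only an explicit statement such as "it is not known if X and Y were married" would need a record of its own.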

Although we have probably sorted out the official/legal status issue, it really should not matter. It should not be a requirement that all events in BG must have a legal status, or be recorded in a document.

I agree that the term “family” could be a very wide concept as Adrian has described. (Do we also include mafia families?). According to Webster’s, one definition of a family is a household. One definition also includes servants and their family.

Marriage and “moving in as cohabitants” can be seen as two events (of many) that “initiate” (or change the status of) a family, but what they really define is the initiation of a relation between 2 persons. Since there are several types of marriages etc, I think it would be useful to have a super event type “initiating a family type of relation between two people”. Similarly there will be a “super divorce”. (Since the term event has an established meaning, it might be better to use the term super rather than sub – and there is also “class”?) I am in no position to choose the term to be used in English for the “Domestic union” or super family, but the definition should not be mixed with the definition of a super term for the relation between the two persons – there is a need for two super terms.

An event such as marriage would have the effect of dissolving the cohabitant relation, but there is no need to state that anywhere in the data or in a report. Adoption is, at least in my head, a family initiating event, but it says nothing about the relation between the parents (or does it? Varies from country to country?). There are cases where persons of different religions have two ceremonies performed, one for each religion, but that should not be a problem. (I don’t see a need for 3 levels in the super/sub hierarchy, i.e. subtypes of marriage.)
There are several types of “baptism” that would be sub types of a “super baptism” event type, but are there other types of events where we need super/sub?? The purpose is to define a common way that programs should treat these sub-types, and would also be useful for the understanding of a foreign unknown event type, but the implication for programs is most important. (A possibility is perhaps “life story”, but that would only? control the placement of info in reports. See below.)

We may need a special event type for cohabitants living in a college dormitory, but I will leave to others to decide if that would be useful – i.e. would it ever be used. It might be necessary to include some qualifying text in the event sentence for the family initiating/dissolving “cohabitation” event in the US and some other countries – or use a different term?

I think it is important to create an example of how an extended family would appear in reports; if you don’t, it will not be implemented. Vendors must understand what we are talking about. If we have invented a “life story” type of report, it will have to differ from the normal biological reports in other ways than just extended families – extended families are not important enough on their own to be a reason to have a separate type of report – but I would rather see this info in a “normal” report. Some narrative biological reports have, by default, paragraphs for persons only (with lists of children), others have paragraphs for the “parents” followed by person paragraphs and then children. Would the “aunt” fit in a paragraph after the children? It may not fit in the paragraphs for the heads of family, but if there are events for the family group saying that they lived in X and Y, it could fit there. A variant, for a biography of the aunt, would be to mention that she lived with the family of X and Y, rather than listing all members of the family. (Am I correct in assuming that in the US there are style guides for reports, where this type of info does not fit?) I don’t think you would duplicate information about biological families and social families in a report by listing all members of both families. I don’t want to be required to print different report types, I want it all in one – otherwise the social/extended family info will not be used. The aunt might fit in both a person and family life story paragraph, as a life story class of events (does any program have such a class?) – but I thought we were talking about representing families as Groups; how does that fit with using events for the same purpose (other than birth and marriage/cohab)? Further work on this is needed!!
GeneJ 2011-04-06T10:50:52-07:00
Hi Geir:

You wrote, "If we have invented a “life story” type of report"

We need terms that distinguish between tags (or tags and roles) that have key genealogical significance and those that do not.

I borrowed the term "life story" from the BCG Genealogical Standards Manual. In the simplest terms, it's the "those that do not" group--tags that don't have that key genealogical significance.

Those applications that support Register or Quarterly styled reports, segregate key genealogical data into a paragraph called the "genealogical summary paragraph." The "life story" paragraph or paragraphs follow. These biographies close with a "list of children."

Here's an example of a generically named but stylized narrative:

http://www.genbox.com/reports/webs/Descnarr103.htm

It's really the "key genealogical data" tags/roles that need to be identifiable.

If programs don't understand each others' "key genealogical tag" then content such as descendant charts and trees, as well as stylized narratives, will be broken.
GeneJ 2011-04-06T10:54:38-07:00
P.S.
See BCG Genealogical Standards Manual, p. 66 for definitions and descriptions of "Genealogical Summary," "Life Story" and "Child List." Numerous examples follow.
AdrianB38 2011-04-06T14:07:31-07:00
Am I missing something here? I just feel we seem to be making heavy weather of things with our calls for examples in reports, etc, and defining these tags that have...

Gene - re "distinguish between tags (or tags and roles) that have key genealogical significance". Yes - but can you define "key genealogical significance" first? If you do, then I suspect it will be obvious which tags (whatever) contribute...

I wouldn't try to generate any type / sub-type arrangements until you've got the detail - then it should be easier to do that.

And Geir, re "an example of how an extended family would appear in reports" - I really do not see it as BG's role to define the format of reports - that's totally the role of the application. And if the BCG GSM has a defined format that needs to be altered because we are altering the definition of family, then I suspect the BCG should do it.

As I said - am I missing something about the data and how it is to be stored?
GeneJ 2011-04-06T14:41:09-07:00
Hi Adrian:

Yes, I can!!

From a data standpoint, "how" you accomplish some of this depends on that family wrapper.

Independent of that wrapper, however, for any one person:

PARENTS (or "parents," as the case may be)[1]
BIRTH (or primary birth/best evidence of birth, as the case may be)[2]
DEATH (or primary death/best evidence of death, as the case may be)[3]
MARRIAGE (or "Union"; each, and each spouse has their own set of "key genealogical tags"; marriage/union dissolution)[4]
CHILDREN (or "children," as the case may be)[5]

[1] Parents might be adoptive, etc. BetterGEDCOM has discussions about this.
[2] Best evidence of birth, for example, might be baptism.
[3] Best evidence of death, for example, might be a burial record, an obituary publication date, or will date and probate date.
[4] Where marriage might be union, etc. Sometimes the best evidence of marriage or union is birth of a child. Best evidence of a marriage dissolution is remarriage of one spouse. I've observed a dissolution remarked parenthetically within a marriage tag, such as XXX married XXX... (divorce) ....
[5] Where a child might be a natural child, and ?optionally, adopted children, the children of a spouse (and I presume the extended definitions we have been providing.) There are other discussions in BetterGEDCOM for these.
gthorud 2011-04-06T15:30:12-07:00
I have entered a new requirement into the catalog - Event Classes, Data-Event05. The first issue is whether the term "class" is ok. My thinking is that "event" already has a defined meaning, so using sub-type does not fit with existing events. Also, Class is a computer term that fits exactly with what has been discussed. If there are no comments, I will use that term - and create a discussion topic.
AdrianB38 2011-04-07T08:39:32-07:00
Gene
My first thought is that all those events can be - indeed, should be - recorded as multi-person events and none should be recorded as "Family Events".

PARENTS - birth parents are identified by the birth event of the person concerned. The 2 (or 1 if the other is unknown) parents will be recorded as persons linked to the birth event - the 2 parents have roles of birth-mother and birth-father and the individual has the role of child. (Birth-child??)

To navigate from the child to their parents, go to the child's birth event and find the 2 people with the correct roles.

If you want to navigate to adoptive parents, go to the child's adoption event and find the 2 people with the correct roles of adoption-parent.

Possible issue 1 - how would we present a child adopted by their step-father (say)? The adoption event would obviously contain the step-father with a role of adoption-parent. Would the mother appear for a 2nd time? And if so, with what role? I suspect my answer would be - what does it say on the document?

Possible issue 2 - if you want to trace family history via adoptive parents, you need to create an adoption event, you can't just put the child into a family with its adoptive parents. Seems fair enough to me - I'm just asking people to distinguish between an adoption and a case of just living with someone for a while.

BIRTH - see above.

DEATH - an event with, I presume, just one participant, having a role of "deceased".

MARRIAGE - in my view of the world this (or any of its variants) is NOT to be recorded against the family. Instead, there is a multi-person event of "Marriage ceremony / Civil Partnership ceremony / Cohabitation start / whatever ", which describes the EVENT. The 2 spouses would be persons linked to the event, with a role each of "spouse" (alongside, say, 2 others linked with a role of "witness").

Dissolution should be obvious - another multi-person event linked to the 2 spouses, each with a role of spouse.

Children - someone's children are identified by looking for multi-person birth-events (or adoption-events) where that someone is a participant in the event with a role of birth-father, birth-mother, adoption parent, whatever. Then the children are the persons in the event with a role of birth-child / whatever.

Possible issue: Because you can't navigate biology for this sort of thing from the family, you must have a birth event for this person - you can't just add them into a family. This is not unreasonable - everyone is born, I suggest. If someone objects that they can't tell who the parents were, fine - leave those roles unfilled in the event. Clearly they won't appear as anyone's descendants, which is fine.

Children of a spouse (i.e. step-children) - unless these are adopted, they should appear only as descendants of their birth parents. They can appear in a family-history report about the social nature of the parents' lives, but they should not be appearing in a blood-line descent report. Time to get rigorous about the purpose of these reports...

So, I think for all these, the family group has NO influence on the bloodline / adoptive-line reports (up or down). But it DOES influence the social history reports of individuals.
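
The navigation described above could be sketched roughly as follows. This is only an illustration of the idea, not an agreed BetterGEDCOM structure: the `Event` class, the helper functions and the "adopted-child" role name are assumptions, while the other role names follow the post.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    kind: str  # e.g. "birth" or "adoption"
    roles: Dict[str, List[str]] = field(default_factory=dict)  # role -> person ids


PARENT_ROLES = {
    "birth": ("birth-mother", "birth-father"),
    "adoption": ("adoption-parent",),
}
CHILD_ROLE = {"birth": "birth-child", "adoption": "adopted-child"}


def parents_of(person: str, events: List[Event], kind: str = "birth") -> List[str]:
    """Find the person's birth (or adoption) event and return the parent-role holders."""
    found: List[str] = []
    for ev in events:
        if ev.kind == kind and person in ev.roles.get(CHILD_ROLE[kind], []):
            for role in PARENT_ROLES[kind]:
                found.extend(ev.roles.get(role, []))
    return found


def children_of(person: str, events: List[Event], kind: str = "birth") -> List[str]:
    """Find events of this kind where the person holds a parent role; return the children."""
    found: List[str] = []
    for ev in events:
        if ev.kind != kind:
            continue
        if any(person in ev.roles.get(role, []) for role in PARENT_ROLES[kind]):
            found.extend(ev.roles.get(CHILD_ROLE[kind], []))
    return found


events = [
    # Birth father unknown: that role is simply left unfilled.
    Event("birth", {"birth-child": ["child1"], "birth-mother": ["mary"]}),
    # Later adoption by the step-father.
    Event("adoption", {"adopted-child": ["child1"], "adoption-parent": ["stepdad"]}),
]
```

Here `parents_of("child1", events)` yields only the birth mother, and the step-father appears only when the adoption line is asked for with `kind="adoption"`. No family group is consulted at any point, which matches the separation of bloodline reports from social history reports.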
gthorud 2011-06-19T17:32:11-07:00
Just a link to a related GRAMPS discussion

http://gramps-project.org/wiki/index.php?title=GEPS_001:_Relationship_type_event_link

(which has previously been pointed to by the Shortcomings of Gedcom page)
GeneJ 2011-04-04T13:12:20-07:00
If you aren't there ... you are verrrrry close.
GeneJ 2011-04-04T13:15:02-07:00
P.S. Thinking out loud again.

If a "union" exists that creates "heads of families," then from the standpoint of standardized biographies or narratives, the children of either/both would be relevant, right?
AdrianB38 2011-04-04T13:42:33-07:00
"the children of either/both would be relevant, right?" Yes - but I think that (a) it's down to the application writers and (b) there are a couple of ways they could go.

In my view - which may not be shared by everyone and may not match the final BG model - a family a.k.a. domestic union is a social construction. The (believed) bloodlines are separately derived from the birth events of the children because these document the (believed) parents of the child.

A standard biography of X might include details of the (social) families that X was a "parent" in, or a "child" in, or a "dependent adult" in (guessing at roles here). Any or all of those concepts could be supported by just looking at which family a.k.a. domestic union group the person is in in those roles.

One could - additionally and separately - describe said person's (believed) biological (or indeed adoptive) parents by navigating to the birth (or adoption) event for said person.

And once you'd done that, then you could repeat ad infinitum.

(Or your report could go backwards from the heads of the social family...)
AdrianB38 2011-04-04T13:49:38-07:00
Re my Domestic Union / whatever definition - is this not, on reflection, simply a family??

Or have we got too close to the conventional married ma and pa with that term?

E.g. "A FAMILY is an arrangement whereby two or more people decide to live together on a long-term or permanent basis in an emotionally and/or sexually intimate relationship which may be formally recognised or not. THE FAMILY MAY SUPPORT other ADULT OR CHILD dependants."

Does this cover the full range of possibilities from
co-habiting couples
to same-sex partnerships with adoptive children
to Egyptian Pharaohs with harems
to married ma and pa and the kids - with their aged maiden aunt???? Etc??
ttwetmore 2011-04-04T15:14:17-07:00
Adrian,

I think the answer is yes, yes, yes, yes, yes, and yes.

The "family hating" camp stresses that all kinds of weird relationships can show up in a household, and since genealogy is supposed to be biological, it's a useless concept. That is a very strict and short-sighted definition of genealogy.

The "family loving" camp stresses that genealogy can cover many aspects of family history, and how people lived together matters.

Personally I'm in the "family loving" camp as long as we can extend it to cover all the cases you're worrying about.

Tom W.
gthorud 2011-04-04T18:07:15-07:00
It seems like Cohabitants is a publicly recognised status in the UK.

See
http://www.statistics.gov.uk/hub/population/families/marriages--cohabitations--civil-partnerships-and-divorces/index.html

http://www.statistics.gov.uk/StatBase/Product.asp?vlnk=14491

Or have I misunderstood something?

Civil partnership seems to be a gay thing.

From one of the reports (for England and Wales):
The number of cohabiting couples is projected to rise from 2.3 million in 2008 to 3.8 million in 2033 (Table 2). The proportion of those cohabiting who have never previously married is projected to rise from 74 per cent to 87 per cent.
Ref: http://www.statistics.gov.uk/pdfdir/marr0610.pdf

The percentage of cohabitants today seems to be above 20% of the number of married people, about 1 of 6 couples. I am surprised that this has left no traces in laws or public regulations.
gthorud 2011-04-04T18:48:23-07:00
I think it would be useful to create a concrete example of how an "extended family" with the old aunt could be recorded in a data structure, for example using the group concept, and showing how this could be shown as an extension to how families (parents with children) are presented in reports now. Any supporting events could also be shown in the data.

Maybe that should go in the discussion of Data-Family01.
gthorud 2011-04-04T19:25:41-07:00
Continuing from my last posting.

Considering that the relation between the father, mother and children is recorded by e.g. marriage and birth events, in what structure do we place the old aunt without repeating the relation information carried by the events? How would a program know what to put in a report (based on which data structures, what triggers the output of the aunt), and where in the report, and what would the whole extended family look like?
GeneJ 2011-04-05T08:00:21-07:00
Genealogically speaking, I see a difference between "heads of families" and the ways you might define the children (ala, the "genealogy" linking) and individuals or "family associations" (like Aunt Nellie)--those who influence the lives of the family. The latter has a life story feel to it.
In that context, Auntie is a "life story" assertion (with a separately defined "genealogical" relationship). Ala, an Aunt Nellie who lived with the family for 15 years may well be associated and assigned roles relative to many events.

I can see not just Aunt Nellie, but perhaps close family friends similarly associated with assigned roles.
GeneJ 2011-04-05T08:24:54-07:00
P.S. I favor reserving "Cohabitant" for the "heads of families" concept. I'd prefer to see a different word or means by which otherwise unassociated children are linked to the family unit.

That separate "Aunt Nellie" role has a "kinship theory" feel to it--I'd prefer to see those concepts in life story tags.

Separate from references to her in the other family biographies, wouldn't we want Aunt Nellie to have her own biography and story?


My thoughts only -- possibly my perspective falls short of Adrian's concept.
AdrianB38 2011-04-05T08:43:04-07:00
"Cohabitant is a publicly recognised status in the UK" - oh, it's a status alright, in the sense that it exists at a statistical level, but it's not something that can be formally entered into. Most legislation recognises its existence - for instance, paternity leave is not dependent on marriage, but there are all sorts of question that could be raised about when cohabitation starts... But, like I said - I'm a mathematician who likes stuff to be ordered!!
AdrianB38 2011-04-05T09:02:16-07:00
I wouldn't want to get too illustrative about reports because I think different people want different things. There's a "social history" angle to them and there's a "(presumed) biology" angle to them, and old-fashioned genealogists (none here surely!) might deride the social history ones.

The point is that if we can show the concepts cover everything somewhere, then it's up to the application coders to come up with the desired reports.

I think Geir is right - it's Fam01 ("Families independent of biological relations") that should have the illustrations in... I shall try to concoct some over the next couple of days.

I shall try also to illustrate there how I think stuff need not be repeated. I hope...
gthorud 2011-03-28T17:07:01-07:00
I have to consider this in the context of how things are done here. 30-40 or more years ago the only way to describe cohabitants was that they were not married, and being not married was something out of order, so stating that someone was not married was a negative statement. Since then the term cohabitant has been accepted (half of the children born in Sweden have unmarried parents, half of all Norwegians have been a cohabitant, 30% of new families in Norway are cohabitants and do not marry, the percentage in the rest of Europe is above 10%), but in some circles “being unmarried” is considered a negative thing, and the term is used to discriminate against people. So, although I could use the “status” unmarried/never married about someone before say 1950, I would not use that term about people living today.

Cohabitation has in the last 10 years gained more and more acceptance as a legal “institute” here and is in many (most?) situations considered to be the same as marriage. About 20% of cohabitants sign a contract that regulates what should happen in various situations, if the relation breaks up or one of them dies.

So to me the solution is simple, consider cohabitation equivalent to marriage, with events for “moving in together” or establishment of a cohabitant relation and “moving out” or dissolution of the relation – similar to marriage and divorce. You can have dates and refer to sources (e.g. a contract). And, these events are events of real life, formal or not formal does not matter.

There are many types of marriage, e.g. Common law marriage, Gay marriage, and whatever in various religions. You could define these and Cohabitation as user defined events, but the problem is that programs have to treat these events specially, so they should be defined in a standard in the same way as marriage.

TMG has groups of events, one being Marriage. If you create a Cohabitation event type, and assign it to that group, TMG will treat it in the same way as marriage in reports etc.

There is one situation, which I assume is common in many countries, and that is when cohabitants marry. That must be handled without creating a new family – but this may not be a problem in some programs.

(A special problem arises when cohabitation is as common as 30%, you often do not know if people living today are married or are cohabitants, because marriage records are not public. But I guess most programs are able to handle unmarried parents, so there may be no need to define a “married (or cohabitant)” event.)
AdrianB38 2011-04-03T13:29:25-07:00
Geir,
I do prefer the tactic of defining something by what it is, rather than what it's not, i.e. "cohabiting" rather than "unmarried". We might want to review the English words - in the UK we'd simply say "living together" and the actual word "cohabiting" would seem a bit of a mouthful. That's a detail, however.

But I also think we want to firm up on what the "cohabiting / whatever" term means. If there's a legal basis to the partnership then I'd exclude that from the "cohabiting / whatever" term as the suggestion there is, to me at least, that there is no legal basis. So we've probably got several varieties of thing to consider.

In order to cope with unmarried / uncontracted / cohabiting / whatever couples changing their status, it would probably make sense to have events for the creation and dissolution of such relationships.

Except I'm still wholly dubious about a couple where there is no formal legal basis for the partnership being recorded with anything other than a common "residence" event if there are no children to prompt a family's creation. If we start creating a family to record 2 co-habiting, unmarried, un-contracted adults living together, then we just recreate all the anomalies of the GEDCOM family with no justification outside the recording of an event that could be recorded more simply elsewhere (e.g. "residence" with 2 people).
gthorud 2011-04-03T15:34:07-07:00
Adrian,

It is difficult for me to discuss the English term, so I'll stay out of that discussion. (But I seem to smell a cultural difference since you are not using one word.) Rather than switching to the Norwegian term, I will continue to use cohabitant in the same way we have used PFACT.

Also, I am not too concerned with the exact term since that will have to be handled in translation. The important thing for me is that programs handle this similar to marriage, as I have described above.

Even if there is no official ceremony, and in many cases no contract, cohabitation? is a legal status here. For example, authorities dealing with social security and other benefits keep track of this, because single and cohabitants are entitled to different benefits.

I think you will run into trouble if you try to come up with a common term that has the same definition in all countries. And the legal status is different and is likely to change over time.

I don't see why cohabitants, with or without children, can not be considered a family. I don't see any reason to treat these differently from how BG will record families.

I just think we have to accept that there are cultural differences, and since I see no big problems in implementing what I want, I don't see why not.

If you want to record a residence event, that is ok for me.
AdrianB38 2011-04-04T02:13:45-07:00
"cohabitation? is a legal status here" - OK, that's an important point, fully justifying its appearance as an event in a file describing events in Norway etc. In the UK it simply doesn't have that legal status, though there have been high profile court actions when cohabiting film stars have split and one has claimed the equivalent of alimony - giving rise to the term "palimony".

"I think you will run into trouble if you try to come up with a common term that has the same definition in all countries" - yes, this is becoming clear. And I wholly subscribe to your view that "programs handle this similar to marriage". In Object Oriented terms, there needs to be some over-arching concept of "domestic partnership" (for want of a better term) that is something more than co-residence. This "domestic partnership" should trigger all (most of?) the special reporting and handling that we see with marriage. Marriage would then inherit the special reporting and possibly add some of its own. Civil partnerships in a formal legal sense would also inherit the special reporting and possibly add some of its own. Informal partnerships in whatever senses would also inherit the special reporting and possibly add some of their own. This seems analogous to the TMG handling of marriage group events that you mention above.

And somehow we have to allow the creation of these variations in each country. If we are too specific in the BG standard, we will exclude some of the variations. If we are too loose, we will be accused of being just as ambiguous as GEDCOM. So we probably have to create some specific things plus the ability to add inheriting user-defined variations.
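
In Object Oriented terms, the arrangement above might look something like the following sketch. The class names and the `report_sections` method are purely illustrative assumptions, not a proposed BG schema.

```python
from typing import List


class DomesticPartnership:
    """Over-arching concept: triggers the special handling we see with marriage."""
    label = "domestic partnership"

    def report_sections(self) -> List[str]:
        return ["partners", "children"]


class Marriage(DomesticPartnership):
    label = "marriage"

    def report_sections(self) -> List[str]:
        # Inherits the special reporting and adds some of its own.
        return super().report_sections() + ["ceremony"]


class CivilPartnership(DomesticPartnership):
    label = "civil partnership"


class InformalPartnership(DomesticPartnership):
    """Stands in for a user-defined or country-specific variation."""
    label = "living together"


class Residence:
    """Mere co-residence: deliberately NOT a domestic partnership."""
    label = "residence"


def gets_partnership_handling(record: object) -> bool:
    """An application only needs to understand the over-arching concept."""
    return isinstance(record, DomesticPartnership)
```

The standard would define the top of such a tree plus a few specific variations; a variation added later in one country still receives the common handling because it inherits from the over-arching concept, while a plain residence does not.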
GeneJ 2011-04-04T05:24:01-07:00
Are we able to identify Wikipedia, FS Wiki or other articles that provide an understanding of what we are describing by "Cohabitants"?

I located two entries in Wikipedia.

For example, there is a Wikipedia entry for "Common-law marriage," purporting, "Cohabitation alone does not create a common-law marriage; the couple must hold themselves out to the world as spouse."
http://en.wikipedia.org/wiki/Common-law_marriage

There is also a Wikipedia entry for "Cohabitation," opening with, "Cohabitation is an arrangement whereby two people decide to live together on a longterm or permanent basis in an emotionally and/or sexually intimate relationship. The term is most frequently applied to couples who are not married."

The section, "Cohabitation by region," is pretty interesting.
AdrianB38 2011-04-04T07:58:47-07:00
The Wikipedia entry for "Common-law marriage" shows some of the complexities - I knew England and Scotland differed in their treatment of what can be loosely termed "Common-law marriage", but I hadn't realised one could identify 4 varieties in Scotland (assuming the article to be accurate).

(NB for anyone outside the UK - England and Scotland actually have separate legal systems and a law created in one is not necessarily applicable in the other. Indeed, I'm not sure if it ever can be...)

As I said, somehow we need to define an over-arching concept, with some more specific variations but space for further user-defined ones.
AdrianB38 2011-04-04T08:24:06-07:00
The thought of defining further ones leads me to pick up on previous words where I said effectively that I was dubious about creating a family record where there were no children and no legal status to the partnership.

Geir responded "I don't see why cohabitants, with or without children, can not be considered a family".

Thinking more about this, I am moving towards Geir's view. If the family represents a social structure (and I have previously suggested that a maiden aunt living with a family could be recorded in the family unit) then there is no reason why the social structure shouldn't consist of just two adults.

My desire to equate family with a social structure comes from a distaste for GEDCOM requiring a family when there is a (presumed) biological relationship between 2 people but no social unit. Two adults in a social relationship surely don't create the same anomalies, on reflection.

Somewhere in here we also need to distinguish between the social unit and the multi-person event known as a marriage. I think my mind can hear Louis telling me that the marriage _ceremony_ event is what creates the change of state between no-family and family. It is therefore not, now I think about it, the same thing at all. But of course, we habitually (in English at least) refer to people being "in a marriage" when we actually mean "in a social unit that was created by a marriage-ceremony".

Thus we have
- the marriage-FAMILY describing a social group and inheriting the characteristics of a "domestic partnership" group
- the marriage-ceremony-EVENT probably inheriting the characteristics of a "domestic partnership" creation event and probably triggering the creation of a group though this is up to the application;
- the civil-marriage-ceremony-EVENT probably inheriting the characteristics of a marriage-ceremony-event;
- the church-marriage-ceremony-EVENT probably inheriting the characteristics of a marriage-ceremony-event;
- the civil-partnership-EVENT probably inheriting the characteristics of a "domestic partnership" creation event and probably triggering the creation of a group though this is up to the application;

And so on...
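
A minimal Python sketch of the inheritance chain listed above, offered only as an illustration. The class names and the `creates_group` hint are invented for this example and are not part of any BetterGEDCOM draft; whether a creation event really triggers a group stays up to the application.

```python
class Event:
    """Root of the (hypothetical) event hierarchy."""


class DomesticPartnershipCreationEvent(Event):
    """Any event that starts a domestic partnership."""
    creates_group = True  # only a hint; an application may ignore it


class MarriageCeremonyEvent(DomesticPartnershipCreationEvent):
    """A marriage ceremony of any kind."""


class CivilMarriageCeremonyEvent(MarriageCeremonyEvent):
    """A civil marriage ceremony."""


class ChurchMarriageCeremonyEvent(MarriageCeremonyEvent):
    """A church marriage ceremony."""


class CivilPartnershipEvent(DomesticPartnershipCreationEvent):
    """A civil partnership registration."""


def levels(cls):
    """Count the inheritance steps between cls and the root Event class."""
    return cls.__mro__.index(Event)
```

Here `levels(CivilMarriageCeremonyEvent)` counts three steps down from the root, while `levels(CivilPartnershipEvent)` counts two, which is one way to picture the extra level of event being discussed.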

And perhaps unfortunately, this now requires 3 levels of event - bother. Not sure if that's an issue or not.
GeneJ 2011-04-04T08:30:40-07:00
Thinking out loud

Assuming we are talking about identifying the heads of families in the context of genealogical biographies (regardless of whether either or both have children and not limited to traditional genealogy), we need some way to identify "cohabitants" as something other than those living in my college dormitory.

While those roommates might be a part of my life story, or even as a group be associated with some role, they would not have the genealogical significance of those in "heads of families" roles.
GeneJ 2011-04-04T08:59:20-07:00
@Adrian,

"...dubious about creating a family record where there were no children and no legal status to the partnership."

Albeit a quite special one, Marriage seems a "class" of union, with roles "husband/wife" or "groom/bride" or "spouse/spouse," no doubt there are more roles.

I don't really see what children have to do with that. Either spouse or partner might bring children to a marriage, some children are the product of a marriage, some children are adopted by one, the other or both spouses.
GeneJ 2011-04-04T09:25:01-07:00
From the two Wiki articles, how about the concept of "other union - by habit and repute."
AdrianB38 2011-04-04T12:36:20-07:00
I think your Wikipedia entry for "Cohabitation" is getting there for the over-arching union.

Maybe the concept I'm grasping towards is "DOMESTIC UNION(???) is an arrangement whereby two OR MORE people decide to live together on a long-term or permanent basis in an emotionally and/or sexually intimate relationship WHICH MAY BE FORMALLY RECOGNISED OR NOT, AND MAY INCLUDE OTHER DEPENDANTS."

(Change in CAPS).

This could include children and others such as elderly parents, maiden aunts, etc, in the household.
AdrianB38 2011-04-04T12:38:04-07:00
NB - I do NOT include the dependants in the "emotionally and/or sexually intimate relationship" - that simply provides the basis for the support mechanism for the dependants. Feel free to help with that phrasing!!!!
AdrianB38 2011-03-28T08:42:34-07:00
Can we establish how cohabiting couples would / could be recorded differently from a normal family in GEDCOM that simply omits the marriage event?

Is it simply that we need a "status" of unmarried? i.e. we need to confirm that the marriage has been omitted for a reason and not simply because it's not known?

Having said that.... While a simple status may suffice where the couple live together and _never_ marry, if the couple live together and _then_ marry after some years, the situation is more complex as I'm not sure how to describe the pre-marriage era.

The suggestion for events describing the establishment and dissolution of cohabitation seems one way out of this, but I'm not a fan of GEDCOM / BG events that don't match events in real-life - and the whole point of cohabitation is that it happens without a formal event as such.

A split of such a couple does seem to match a real event. Two people starting to live together may not have any detectable start...

Maybe a dated-status is needed?
testuser42 2011-03-31T14:58:18-07:00
Data08 - Importing Data (Proposal)
I think this has been mentioned in the past -- but it's not covered yet. Or is it?

BetterGEDCOM should be able to import files in GEDCOM and BetterGEDCOM format.

My suggestion is that the imported files are saved in the BG Container without any changes. In the "master" BG file, links are added to the imported files, or specific parts of these. That way, it's clear where the imported data comes from. The Source of an imported PFACT might then be given as the imported (Better)Gedcom, and the Source of the Source will have been in that file already.
gthorud 2011-03-31T19:50:14-07:00
Just a few initial thoughts.

"BetterGEDCOM files should be able to contain a BetterGEDCOM file" - sounds like some sort of bootstrapping :-)

Two views have been presented in the Multimedia discussions; it is a terminology issue - is the BG file the container with all its content, or is it the (or those, considering your proposal) file(s) in the container that contain the genealogical data? My view has been that a BG file is the one (or more) inside the container, since we should not require a container for long term storage - and now also to avoid the bootstrapping (possibly also containers inside containers) - so I will use that definition, and call the outer envelope the container.

The main issue here seems to be the referencing mechanism, possibly involving UUIDs when both files are BG. Maybe there is something relevant in the previous UUID discussion?

These references could in principle also be to other G/BG files outside the container.

(An interesting case might be if one BG file references another, and the other a third, or back to the first ... maybe not?)

And, there are other possible uses for such references, not only sourcing ... I have to think more about this ... will be back.

The requirement should perhaps be placed in the multimedia group - just to keep things out of Data, but we will see.
gthorud 2011-04-03T16:26:15-07:00
testuser,

If what you want is simply a way to enclose a Gedcom or BG file, and reference the whole file, it is not any different from referencing a photo or other multimedia, assuming that the container mechanism will identify the "primary" BG file that references the "attached" one. The only thing that must be done is to mention these file types as supporting files where we mention the alternative formats in requirement Multimedia01.

Referencing specific parts of a Gedcom/BG file is a new thing. If there is a requirement to reference a part of other file types, a general mechanism could be implemented, probably based on an identifier. The type of identifier must be specified. A new requirement.



But, beyond your requirement, an interesting case is what should happen if you load the supporting G/BG file into the receiving program. Should it be possible to establish a direct link between entities in "data sets" imported from different files?

I once worked on a project that covered all persons on all farms in a parish, a parish where all relevant parish/church records have been transcribed. I then wished I could have had a supporting Gedcom file loaded into my program holding e.g. all christenings in a parish record (the whole book). I would then be able to keep track of the individual records where I had been able to identify the person in my primary file, so I knew which ones were outstanding. It would also prevent me from assigning the same christening record to two persons, and when in the supporting "file" I would have a link in the other direction so I could tell which person I had assigned the record to.

I could also imagine that such cross referencing could be used between my private data and a separate set of data (exported/imported to/from a BG file) that could contain the data worked on in a collaborative project; unique identifiers would allow me to re-establish the cross referencing links when I receive a new version of the "collaborative file".
ttwetmore 2011-04-04T06:19:59-07:00
To say that BetterGEDCOM must be able to import GEDCOM doesn't make sense. To say that an application that is Better GEDCOM compliant should be able to import all the data from GEDCOM files into its databases does make sense. When that data is later exported by the program in Better GEDCOM format, the data would appear in Better GEDCOM format.

The issue that I am trying to understand here is why a Better GEDCOM compliant program would want to "keep around" a GEDCOM file after it has imported the data it contains into its own internal database. Since Better GEDCOM will be a proper superset of GEDCOM, there is no "data" reason to keep it, because it can't contain any information that hasn't already been fully imported. It sounds like the argument here is that a user might want to keep it as a kind of "evidence," just like every other kind of evidence. I guess some application designer might go along with that idea and allow his users to keep the original GEDCOM files as just another kind of URL referenced entity like an image file that can be put in transport containers. Personally I think using the existence of a GEDCOM file as a source is dangerous, that the sources of the data should be embedded in the GEDCOM file. But regardless, this shouldn't be a requirement of Better GEDCOM itself. This should be treated strictly as an application issue. The only requirement on Better GEDCOM should be the multimedia one, that is, that BG should allow users to include references to external URLs (including local files) in their databases, with the ability to place them in BG containers for transport. If the application designer decides that the GEDCOM files that a user uses to import data are among these external files, that's fine.

Another concept appears in this thread, the notion of whether an object in one database or transport container can refer to an object that exists in another database or transport container. In general, GEDCOM files must be "closed," meaning that every object referred to by an object in the file must also be in the file. Because GEDCOM id's are, in general, subject to no formatting rules other than being strings, this is obviously the case. However, the LDS has some special id's that have fixed parts, including a prefix to identify a particular LDS database, so that the id can refer to a particular record in a particular LDS database. And then there is the idea that we've discussed for Better GEDCOM, that every record should have a UUID that gives it an absolutely unique identifier for all time and place. With this idea it is very easy to conceive of Better GEDCOM transport containers that are not closed, that hold objects that refer to persons in, say, well-known sets. Imagine being a third party provider who publishes a Better GEDCOM container of the first five generations of descendants of the Mayflower Pilgrims, or the ancestors and descendants of the U.S. presidents, or the ancestors and descendants of the European royal families. A Better GEDCOM transport file could easily be able to hold references to objects from these well-known sets without having to contain the objects themselves. There are lots of ramifications of having UUIDs for our record ids that we are just now beginning to realize the power of.
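
To make the "closed" versus "not closed" distinction concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a container can be reduced to a mapping from each record's UUID to the UUIDs it refers to; the function names and the sample records are hypothetical and do not come from any BetterGEDCOM draft.

```python
import uuid


def external_references(records):
    """Return the ids that are referred to but not defined in this container.

    records maps a record id (str) to the list of ids that record refers to.
    """
    defined = set(records)
    missing = set()
    for refs in records.values():
        missing.update(ref for ref in refs if ref not in defined)
    return missing


def is_closed(records):
    """A container is closed when every referenced record is also inside it."""
    return not external_references(records)


# Hypothetical sample data: two local persons and one person who lives in
# some published "well-known set" outside this container.
john = str(uuid.uuid4())
mary = str(uuid.uuid4())
pilgrim = str(uuid.uuid4())
container = {john: [mary, pilgrim], mary: [john]}
```

With this data `external_references(container)` yields only the pilgrim's id, so the container is not closed, yet the reference is still unambiguous because the id is globally unique.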

Tom Wetmore
gthorud 2011-04-05T06:33:14-07:00
Tom,

Re. attached Gedcom file.

Seems like you see the bigger picture. I agree with what you say about Gedcom files. It seems to me that the Gedcom file should in principle be stopped as early as possible in the user-user-user pipeline, and be converted to BG format. Given the current problems with incompatible Gedcom files, you really don't know if the receiver/importer will be able to extract all info from the file, so it is better that the sender/exporter does the conversion to BG to see if there are any problems. BUT, it might be that the receiver would actually be able to do a better conversion, so we should not prohibit the enclosure of a Gedcom file.

Re. being able to reference parts of a Gedcom file. Creating a special mechanism for this would probably be seen as soliciting the enclosure of Gedcom files, rather than converting to BG, so such a mechanism should not be created. But, I assume the flexibility in sourcing/citation info that might be developed for BG would make it possible to include such a reference, but an importing program is probably not going to do anything special with that - other than showing it to the user or printing it.

What do you think, testuser?



One result of this discussion is the possibility of having two or more BG files in the same container. Should one of them be the primary one? Should the importing program be expected to import all the BG files, or leave all but one as "evidence" or whatever? I would like to understand if there are any complications.

We should create a new requirement for UUIDs. Since the exact use must be investigated, it could perhaps be a syntax requirement for the time being?
testuser42 2011-04-06T16:24:08-07:00
Hi all,
thanks for lots of good thoughts...

Yes, you are right that GEDCOM should be converted / assimilated and not imported into the BG container. So scratch that part.

BG should be able to hold a "secondary" BG in its container. The primary BG would be identified by name: e.g. the container file is "Royals.BGZ" and in it is a "Royals.BG" and any imported BG gets a prefix like "I_" ("I_Princes.BG") or somesuch...
If the system works for one imported BG file, then it should work also with nested BGs and BG-containers -- unless there's a software problem that comes up when unpacking a packed file inside a packed file inside a packed.... :)
(BTW, I'd like to nominate 7z for the container format)
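
A minimal Python sketch of the naming convention suggested above, under stated assumptions: the "I_" prefix and the ".BG"/".BGZ" extensions are taken from the example in this post, the `classify` helper is hypothetical, and a ZIP archive stands in for the container only because Python's standard library has no 7z support.

```python
import io
import zipfile
from pathlib import PurePosixPath

IMPORT_PREFIX = "I_"  # marker for imported BG files, as suggested in the post


def classify(container_name, member_names):
    """Return (primary, imported) BG file names found in a container.

    The primary file shares the container's base name; imported files
    carry the IMPORT_PREFIX.
    """
    primary_name = PurePosixPath(container_name).stem + ".BG"
    primary = primary_name if primary_name in member_names else None
    imported = [
        name for name in member_names
        if name.startswith(IMPORT_PREFIX) and name.endswith(".BG")
    ]
    return primary, imported


# Build a tiny in-memory container for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Royals.BG", "primary data")
    archive.writestr("I_Princes.BG", "imported data")

with zipfile.ZipFile(buffer) as archive:
    primary, imported = classify("Royals.BGZ", archive.namelist())
```

One weakness of identifying the primary file by name is that renaming the container silently breaks the link, which may be an argument for a more explicit marker.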

I guess the cross-referencing would really only work with UUIDs, and requiring info about the "creator" of the BG would be helpful, too. This creator or author could be id'd by a UUID, too. Actually, every BG file could have one, and maybe a PGP signature?

Have we really forgotten the UUID requirement up to now? That definitely should be in BG!


PS I found a previous discussion that touches some of this:
http://bettergedcom.wikispaces.com/message/view/home/30051923
gthorud 2011-04-07T09:44:14-07:00
I have created a new requirement for Unique IDs, Syntax11, so if anyone wants to discuss that, copy the req text into a new topic for Syntax11. I have found some previous discussions, but there may be others that are important. Focus should be on summarizing the possible applications.
gthorud 2011-04-10T13:35:08-07:00
I am starting to see a need to allow several BG files to be included at the outermost level in the container, all of which should be imported by the receiving program.

There are at least two more solutions that would distinguish between files to be imported and enclosures.

1. Assuming that the container can handle file directories, you will have a top level directory. Within that directory you could have the BG files to be imported and a directory with a special name containing enclosed files and/or directories to be saved somewhere.

2. You could have a special top level file (or structure depending on container solution) containing info about the content of the container - including what to process. But this is a complex solution that should only be used if it can satisfy other requirements.

In order to keep things simple, if a container is enclosed, there should not be a requirement to import BG files in it into the program - although the user could select to do so later - if he finds it necessary. That will be an easy way to distinguish between files to be imported right away and those that should wait.
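
Solution 1, together with the rule that enclosed containers are not imported automatically, could be sketched in Python roughly as follows. The directory name "enclosures" and the ".bg" extension are assumptions made up for this example; nothing here has been agreed for BetterGEDCOM.

```python
from pathlib import PurePosixPath

ENCLOSURE_DIR = "enclosures"  # hypothetical special directory name


def plan_import(member_paths):
    """Split container members into files to import now and files to just save.

    Only BG files at the top level are imported; anything under the
    enclosure directory (including enclosed containers) is only saved.
    """
    to_import, to_save = [], []
    for path in member_paths:
        parts = PurePosixPath(path).parts
        if len(parts) == 1 and path.lower().endswith(".bg"):
            to_import.append(path)
        else:
            to_save.append(path)
    return to_import, to_save


members = [
    "family.bg",
    "project.bg",
    ENCLOSURE_DIR + "/old-research.bgz",
    ENCLOSURE_DIR + "/scans/census.jpg",
]
to_import, to_save = plan_import(members)
```

The user could still choose to open "old-research.bgz" later; the sketch only decides what happens at import time.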

I am not prepared to discuss container file solutions now, but why do you think we should use 7z?
ttwetmore 2011-04-07T12:47:25-07:00
Major Concern with Evidence01
This quote from the Evidence01 requirement has me VERY concerned:

"It is therefore suggested that handling of evidence data and not just conclusions, is postponed to a later release of BetterGEDCOM and the current work should simply not do anything that might make separate handling worse."

I read this as meaning that Better GEDCOM is chickening out on adding evidence, record-based support to its data model. It certainly means it's being postponed to the future, which to me means postponement to oblivion. My opinion has always been that adding support for record/evidence-based genealogy should be the most important goal of Better GEDCOM, a goal that cannot be postponed. If Better GEDCOM decides not to cross the chasm from person-based methodology to support for record/evidence-based methodology I believe it changes from a worthy enterprise to a near trivial tweak of GEDCOM.

Tom Wetmore
gthorud 2011-04-07T14:34:50-07:00
I will copy Tom's posting to a new Topic with the subject Evidence01 ....

Please continue the discussion there.
gthorud 2011-04-07T14:33:30-07:00
Evidence 01 - Evidence & Conclusion Model
Description: BetterGEDCOM could handle evidence and not just conclusions

Why:
Current GEDCOM is structured so that data about an individual or family is always the "latest working hypothesis". It is therefore difficult to identify the actual evidence, particularly when the "latest working hypothesis" is a composite of various bits of evidence.
Also, in the event of discovery of an error, it can be difficult to (a) identify subsequent issues and (b) revert to an acceptable set of "working hypotheses".
To overcome this, it appears as a minimum to be necessary to record evidence and conclusions separately.
See Evidence and Conclusion Process
Note this requirement is effectively the same as (possibly part) adopting the "Evidence and Conclusion Model", which is linked to, but not the same as, the "Evidence and Conclusion Process". See Glossary

Way forward:
It is far from clear to the author that a comprehensive set of genealogical processes exist to handle evidence and conclusions at a detailed, data element, level. In particular, it is far from clear how it is possible to "roll-back" to an acceptable state after discovery of an error.
Interesting processes do exist to derive genealogical conclusions from evidence, but these are quite different from analyses undertaken by most genealogists.

It is therefore suggested that handling of evidence data and not just conclusions, is postponed to a later release of BetterGEDCOM and the current work should simply not do anything that might make separate handling worse.

See also the previous discussion linked to in the Requirements Catalog.
gthorud 2011-04-07T14:37:35-07:00

On 7 April 2011 ttwetmore posted this:

This quote from the Evidence01 requirement has me VERY concerned:

"It is therefore suggested that handling of evidence data and not just conclusions, is postponed to a later release of BetterGEDCOM and the current work should simply not do anything that might make separate handling worse."

I read this as meaning that Better GEDCOM is chickening out on adding evidence, record-based support to its data model. It certainly means it's being postponed to the future, which to me means postponement to oblivion. My opinion has always been that adding support for record/evidence-based genealogy should be the most important goal of Better GEDCOM, a goal that cannot be postponed. If Better GEDCOM decides not to cross the chasm from person-based methodology to support for record/evidence-based methodology I believe it changes from a worthy enterprise to a near trivial tweak of GEDCOM.

Tom Wetmore
gthorud 2011-04-07T15:25:37-07:00
Well, I had written a reply, but the system decided that I was not logged on, so I lost it all. Be aware ...

I'll start again.
AdrianB38 2011-04-07T15:33:14-07:00
Tom - I'm guilty of writing that caveat. I had my reasons...

1) Creating the data model for the real life side of things is easy. I imagine ditto for the current ESM citation style though I'm not wholly convinced that the multi reference stuff has been analysed yet (e.g. digitisation of a microfilm of an original). Creating the data model for evidence handling is not easy since in my head it needs more than just the creation of personas / evidence people / whatever.

2. Since there was a feeling that BG needed to get something out fast, the idea of phasing the model to produce the easy stuff first and the hard stuff later, seemed attractive.

3. I am far from convinced, as I said, that we have understood what evidence handling needs - my own idea of rolling back in case of an error - how do I support that? No, how do I DO that? Then there's the objective / research / input / output / conclusion stuff - all that stuff that you convinced me should go into the log - that is an integral part of evidence handling in my view and I simply don't see how it should be modelled yet. I just know it needs more entities than we've mentioned. (And more processes...)

4. If BG is to mean anything, we need to get the software developers on board. Again, getting them on board in 2 stages seemed more attractive, particularly if the initial steps are obvious and simple - hell, they're NOT simple - the multi-person event, groups, places, all those are going to be non-trivial jobs. If this chasm exists (and I believe it does) then the developers won't even recognise any benefit to come from evidence handling and so will ignore BG if it comes as one indigestible lump.

5. One last thing though - this is a Wiki - it's trying to gain consensus - I put a starter proposal there but if the members think we should progress in a bigger leap, then let's agree it! (But I'm also convinced that I was NOT the first person to make this suggestion).
gthorud 2011-04-07T15:56:04-07:00
I am not sure if E&C is the most important issue in BG, and I am sure the rest is not trivial.

However, I am not aware of any decision to postpone this requirement so I suggest that the paragraph should be removed from the requirement.

I apologize that I have not followed this topic lately, but someone has to try to do some organizing. Also, I want to spend some time on sources and citations since I have started on that.

I am writing the following on thin ice, but a concern is whether there are solutions to the problem other than what Tom is proposing. If FS comes with a standard (or whatever) proposal, we should be prepared for that situation. What is the difference between the DeadEnds model and the data model in NFS? What is wrong with it and Gentech? Could a "two-level" NFS model be a subset of a multilevel solution? What are the rules that would collapse a 2 or 2+ level model into a conclusion only model on import?

Also, how does the DeadEnds model fit into a model for recording the research work, citations and excerpts, and the evaluation of the information found in a source - that may result in many events in the E&C model? Do we need a description of a complete process (and the data recorded by it) that leads up to the evidence and conclusions in the E&C model?

My apologies if this has been sorted out already.
ttwetmore 2011-04-07T19:44:01-07:00
I apologize for my testiness.

Geir, Trying to answer your (excellent!) questions:

The NFS model is two-tiered, with persona records and person records. Personas are great for holding "evidence or record-based" data, and persons are great for holding "conclusion or person-based" data. But these aren't rules that can be enforced in the NFS application. A persona record simply holds whatever a user of the NFS application chooses it to hold; scary thought. An NFS person is a grouping together of persona records. No justification has to be made when adding or removing personas from a person group. The person record itself has no attributes of its own other than its global identifier. The users who put personas into the persons have the option of specifying what the overall person should look like when it is displayed. That is, users can say what the preferred birth event is, what the preferred name is, and so on. This can be changed by any other user. In the NFS application millions of the personas are just plain junk, so many persons are cluttered up with large numbers of junk personas. Some of the junk is really worse than you can imagine.

So an NFS persona record has all the available attributes that you would normally think of as being found in a generic person record. But on the other hand an NFS person record is little more than a "bag" that holds persona records, with some added info that specifies what are the currently preferred facts.

In the DeadEnds model there is a single person record that does duty for both "evidence/record-based" persons and "conclusion/person-based" persons. The DeadEnds person records can be arranged into a tree of any number of tiers. So a person record can be BOTH a "bag of closer-to-evidence persons" AND can have its own attributes, at the same time. Having its own attributes solves the problems that NFS has to use that user choosing approach for. So in the DeadEnds higher level persons, you can choose to add attributes if you need to resolve issues between the attributes in the lower level persons. If you don't need to resolve any attribute issues, the higher level person records will simply inherit their attributes from the lower level ones.

To collapse data from a multi-level model into a conclusion-only model on import is actually quite simple. You create a single person record for each "tree" by bringing together into that person record all the birth events, all the names, all the other attributes of all the original personas "all the way down" recursively. As you do this you keep all the source references to the original source records so that every attribute inside the collapsed person record still refers to the proper source. The only problem is deciding which birth event or name among the many possible birth events or names that might be in the collapsed person record should be given priority in the final flat person record, that is, the one to display on screens or to print in reports. This would be handled by conventions. Certainly if the higher level person has its own attributes, those would take priority, but if it doesn't I would suggest simply using the order inherent in the lower level persons. Of course, if there are quality flags they could be used also.
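
The collapse described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Person` and `Attribute` classes, their field names and the sample records are invented here and are not the actual DeadEnds structures. The convention used is the one suggested in the text: a person's own attributes come first, then those of the lower level persons in their stored order.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    kind: str    # e.g. "name" or "birth"
    value: str
    source: str  # reference to the source record, kept through the collapse


@dataclass
class Person:
    attributes: list = field(default_factory=list)  # this level's own attributes
    lower: list = field(default_factory=list)       # lower level persons


def collect(person):
    """Gather every attribute in the tree: own attributes first, then lower levels."""
    found = list(person.attributes)
    for child in person.lower:
        found.extend(collect(child))
    return found


def collapse(person):
    """Flatten a tree of persons into a single conclusion-only person."""
    return Person(attributes=collect(person))


def preferred(flat, kind):
    """By convention, the first attribute of a kind is the one to display."""
    for attr in flat.attributes:
        if attr.kind == kind:
            return attr
    return None


# Hypothetical example: two personas and one higher level person.
census = Person([Attribute("name", "Jon Smith", "census-1851"),
                 Attribute("birth", "abt 1850", "census-1851")])
baptism = Person([Attribute("name", "John Smith", "parish-register"),
                  Attribute("birth", "1849", "parish-register")])
top = Person([Attribute("name", "John Smith", "my-analysis")], [census, baptism])
flat = collapse(top)
```

In this example the flat record keeps all five attributes with their sources; the preferred name comes from the higher level person, and the preferred birth falls back to the first lower level persona. Quality flags, if present, would need a different ordering rule.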

(In order to experiment with my software, I am actually doing the REVERSE PROCESS -- I am taking large, rich GEDCOM records from my LifeLines database, and by using the source references found within those records, I am BREAKING THEM APART into the persona records that I should have started with!!)

The DeadEnds model is conventional as regards sources and repositories. It has those two record types. All "evidence/person-based" records should refer to a source. For me a citation is nothing more than a formatted string that is generated by templates that use field values found in two places: 1) the references between the evidence-person records and the source records, which will state, for example, the page of that source from which the evidence was taken (or the URL of the page, or the on-line database); and 2) the source record itself where info like title, author, publication year, and so on are found.
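
As a rough illustration of a citation being "a formatted string generated by templates", here is a minimal Python sketch using the standard `string.Template` class. The field names, the template text and the sample source are all made up for the example; a real application would need one template per citation style and source type.

```python
from string import Template

# Hypothetical source record (title, author, publication year, ...).
source = {
    "author": "Jane Doe",
    "title": "Parish Registers of Exampleton",
    "year": "1902",
}

# Hypothetical reference from an evidence person record to that source.
reference = {"page": "47"}

CITATION = Template("$author, $title ($year), p. $page")


def cite(source, reference, template=CITATION):
    """Fill the template from the source record plus the reference details."""
    return template.substitute({**source, **reference})
```

Calling `cite(source, reference)` gives "Jane Doe, Parish Registers of Exampleton (1902), p. 47". Because nothing but field values is stored, the same data can be re-rendered in another style by swapping the template.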

I haven't added any records to the DeadEnds model for research logs and todo lists, etc. I was hoping to piggyback off some other model that has worried about that. I know that those things are very important, but I've never been very interested in them, so I hope I can simply take the ideas from some model (Better GEDCOM?) that does worry about them.

I don't like the pure GenTech model because of the extreme use of the assertion entity, and the fact that the GenTech model is a fully normalized model which makes it almost impossible to visualize. Being fully normalized is something that used to be needed in ancient times when databases were automatically assumed to be relational. This is no longer the case. Normalized models completely obfuscate the simplicity of data models by adding table after table of difficult to grasp relationships.

Tom Wetmore
ttwetmore 2011-04-07T20:12:07-07:00
Responding to Adrian:

"Tom - I'm guilty of writing that caveat. I had my reasons..."

I apologize for any awkwardness I have caused you to feel by my intemperate comments.

"1) Creating the data model for the real life side of things is easy. I imagine ditto for the current ESM citation style though I'm not wholly convinced that the multi reference stuff has been analysed yet (e.g. digitisation of a microfilm of an original). Creating the data model for evidence handling is not easy since in my head it needs more than just the creation of personas / evidence people / whatever."

Ah, a very interesting statement. I believe creating the data model for evidence handling really is that easy! So easy in fact it doesn't even require a new record type! All it needs is extending the current person and event record types to be able to recursively refer to "lower level" persons and events. The fact that you think it's harder than this is something to explore. I think the way to do that is to imagine "use cases" that you would go through with a genealogical application in following some text book research processes, and analyze the data needs from a model to support them. When I go through those use cases in my head I always come up with my very simple extension. It would be great if others tried out that experiment.

"2. Since there was a feeling that BG needed to get something out fast, the idea of phasing the model to produce the easy stuff first and the hard stuff later, seemed attractive."

Understandable if the evidence extension is hard.

"3. I am far from convinced, as I said, that we have understood what evidence handling needs - my own idea of rolling back in case of an error - how do I support that? No, how do I DO that? Then there's the objective / research / input / output / conclusion stuff - all that stuff that you convinced me should go into the log - that is an integral part of evidence handling in my view and I simply don't see how it should be modelled yet. I just know it needs more entities than we've mentioned. (And more processes...)"

I agree with your concerns that there must be a proper intersection between the ideas I always talk about and the world of research logs and objectives, but I don't see fundamental problems. Say a research objective is an entity in our model. Our evidence records can simply refer to them, with the reference meaning "I am an evidence record that was researched, discovered and extracted because of that objective." Then our applications can provide us with lists of all the evidence that we discover while carrying out different objectives. Ditto for todo list items. If a todo list item is an entity in our database, it will refer to the objective record that the todo list is designed to help, and any evidence record discovered while carrying out that todo list would refer to the todo list record. Then our application could show us our todo lists and what we have done so far in carrying them out. I think anything that is simple conceptually, as objectives and todo lists seem to be, should always be represented by things that are just as simple in a data model. I do see undoing conclusions as a problem with my model if we want to remember that we have made the conclusion so we can warn the "future us" not to do it again. I guess that frankly I am not worried about the problem of formally remembering my mistakes. I don't see much utility in it. Probably sounds like a cop out!
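
The references described in this paragraph might look like the following Python sketch. It is an assumption-laden illustration: the record classes, field names and sample data are hypothetical and belong to no existing model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Objective:
    id: str
    text: str


@dataclass
class TodoItem:
    id: str
    text: str
    objective_id: str  # the objective this item is meant to help


@dataclass
class Evidence:
    id: str
    summary: str
    objective_id: Optional[str] = None  # found directly because of an objective
    todo_id: Optional[str] = None       # found while carrying out a todo item


def evidence_for_objective(evidence, todos, objective_id):
    """List evidence tied to an objective, directly or through its todo items."""
    todo_ids = {t.id for t in todos if t.objective_id == objective_id}
    return [
        e for e in evidence
        if e.objective_id == objective_id or e.todo_id in todo_ids
    ]


objective = Objective("O1", "Find the parents of John Smith")
todos = [TodoItem("T1", "Search the 1851 census", "O1")]
evidence = [
    Evidence("E1", "Census entry", todo_id="T1"),
    Evidence("E2", "Baptism record", objective_id="O1"),
    Evidence("E3", "Unrelated will"),
]
found = evidence_for_objective(evidence, todos, "O1")
```

Here `found` contains E1 and E2 but not E3. Remembering rejected conclusions, the open problem mentioned at the end of the paragraph, is deliberately left out of the sketch.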

"4. If BG is to mean anything, we need to get the software developers on board. Again, getting them on board in 2 stages seemed more attractive, particularly if the initial steps are obvious and simple - hell, they're NOT simple - the multi-person event, groups, places, all those are going to non-trivial jobs. If this chasm exists (and I believe it does) then the developers won't even recognise any benefit to come from evidence handling and so will ignore BG if it comes as one indigestible lump."

Good practical issues. No disagreement from me. I just want it all, now, fast.

"5. One last thing though - this is a Wiki - it's trying to gain consensus - I put a starter proposal there but if the members think we should progress in a bigger leap, then let's agree it! (But I'm also convinced that I was NOT the first person to make this suggestion)."

Point taken.

Tom W.
AdrianB38 2011-04-08T03:47:38-07:00
Tom said "Ah, a very interesting statement." Good! I'm glad about that - reminds me rather of the time when I was a junior programmer and my programs failed with a dump. If I couldn't read the dump I'd take it to the technical guru of last resort. If he said, "Leave it there, I'll look at it later," you came out knowing he'd never look at it. If he said, "Hmm, interesting..." you knew you were in with a fighting chance of some help...

His name was Tom too...(And he was a great guy)
AdrianB38 2011-04-08T04:06:35-07:00
More seriously - I'm quite happy to move my doubts and suggestion out of there and replace it with a more neutral comment about establishing scope and priority, now that we've got a discussion going. And there's a good case for saying that the Requirements Catalogue should cover it _all_, but any release of the Standard might be in phases to allow for priority / digestibility / whatever....

I think the point is we need to establish scope of just what this Evidence 01 requirement is. In my mind, when I was getting pessimistic / pragmatic about it, I saw the thing as a whole from setting of objectives through task definition / research log / input data / output conclusions / combined persons.

Clearly, modelling _all_ that is more than just recording that the person entity can be either an evidence or a conclusion person or both.

Perhaps it would be useful if I went through a sample case in my head - then wrote it out to show what is needed - and then we can decide scope etc.
AdrianB38 2011-04-11T08:48:48-07:00
OK - I have gone through a scenario, scribbled things down and from that, concocted the basis of a (tactical) research process. Why? Because I wanted to see what really needed to go into the BG Data Model if we went for fully rigorous methods where the logic arguments are documented. This is the stuff I alluded to in my post of Apr 7, 2011 11:33 pm. (At least - that's the time I'm reading - it may be translated to CET)

And in fact, the Evidence & Conclusion Model only comes in at the very end.

See http://bettergedcom.wikispaces.com/Research+Process%2C+Evidence+%26+GPS i.e. page "Research Process, Evidence & GPS"

So.... >>If<< we confine the scope of Evidence01 to the recording of the conclusions, then Tom's post of the previous Friday, 4:12 am where he says "I believe creating the data model for evidence handling really is that easy!" does make sense. We simply have to make sure that the other stuff is recorded elsewhere.
AdrianB38 2011-04-17T07:52:19-07:00
Source04 - Length of citation
Created:
Description: There must be no limit in BetterGEDCOM on the length of a citation, whether that citation applies to a source (often expressed as part of a bibliography entry) or an event, attribute, person, relationship, etc, etc (often expressed as a footnote or end-note).

Importance: Mandatory

Why?: The majority of citations will be short. However, some users may wish to record a Proof Argument inside the citation. Any limit on the length of such a citation would be arbitrary and could be exceeded, so should not be permitted.

See also requirement Syntax10 "No restrictions on item length or value", which is a generalised version of this requirement.

Source: See discussion of "The Missing Link - a new entity type or a new type of source?" and specifically the discussion of the options for citations in there.

Way forward?: While many users would never wish to use lengthy citations, there seems no good reason to forbid their use.
AdrianB38 2011-04-17T12:41:23-07:00
Data-Event01 - Events with multiple people, with roles
Description: BetterGEDCOM must support the recording of events that affect multiple people. In particular, it must be possible to record the role of each person in the event.

Importance: Mandatory

Why? Events do affect multiple people. Current GEDCOM has almost no ability to record multi-person events, excepting perhaps births and adoptions. However, the parents of a birth in GEDCOM are usually implied by the parents of the appropriate family, creating potential issues when that family is an adoptive one. It would be better to have a birth event involving three people (e.g. child and two biological parents typically), with this data separate from the family.
GeneJ 2011-11-28T17:37:08-08:00
P.S. Do these examples help?

marriage:
... Individual / role
... John Smith / p1 or groom
... Sarah Thomas / p2 or bride
... Samantha (Jones) Smith / MotherOfGroom
... Saul Smith / FatherOfGroom
... Sally (Franks) Thomas / MotherOfBride
... Joseph Thomas / FatherOfBride

For a death tag, I have only one principal, but various associates
death:
...
... Joe Peterson / principal (or deceased)
... John Peterson / LossOfFather
... Thomas Peterson / LossOfBrother
... Sally (Smith) Peterson / LossOfSpouse
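
A minimal Python sketch of how the marriage example above might be held as a single event with one role per participant. The class and field names (Event, Participant, principal) are assumptions for illustration, not part of any BetterGEDCOM proposal or of any existing program.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    person: str
    role: str
    principal: bool = False  # True for p1/p2, False for associates


@dataclass
class Event:
    kind: str
    participants: list = field(default_factory=list)

    def principals(self):
        """Names of the principals, in the order they were entered."""
        return [p.person for p in self.participants if p.principal]

    def role_of(self, person):
        """Role played by `person` in this event, or None if not a participant."""
        for p in self.participants:
            if p.person == person:
                return p.role
        return None


marriage = Event("marriage", [
    Participant("John Smith", "groom", principal=True),
    Participant("Sarah Thomas", "bride", principal=True),
    Participant("Saul Smith", "FatherOfGroom"),
    Participant("Joseph Thomas", "FatherOfBride"),
])
```

A death event would look the same with one principal and several associates.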
GeneJ 2011-11-28T18:12:34-08:00
@Andy wrote (with his can-OOoOpener), "... are not genealogical relationships."

Check it out ....

See Curran, Crane and Wray, "Numbering Your Genealogy: Basic Systems, Complex Families and International Kin" (Arlington, Virginia: National Genealogical Society, 2008).

In part, from Madilyn Coen Crane's contribution, "Complex Families," beginning on p. 17, "Traditional numbering systems were designed to present a group of people, all blood kin, who descend from a single immigrant ancestor. When genealogists treat families of the past, their narratives acknowledge multiple marriages and stepchildren; but the numbering schemes, as originally planned, omit step children and adopted children; and they make no provisions for carrying down such lines. Surname changes that result from variations of the nuclear family also remained in limbo [*]. ... Because of the serious genetic issues at stake, as science continues to explore and treat inheritable medical conditions, this paper recommends that adoptions of past eras be treated as frankly as all other aspects of genealogical research."

Crane goes on to explain in some detail how the NGSQ system (aka the "Quarterly" standard) has been "expanded" to report about complex family circumstance. The material covers--adoptions, stepchildren, multiple marriages of direct descendants, etc. Crane writes, "In order to maintain a clear identification of biological ancestry, while including adoptions and stepchildren in the family structure, the phrases adopted by and stepchild of are added to the parenthetical summaries of descent."

:-) --GJ

*Crane references an endnote, "The legal status of adopted children during past centuries is rarely documentable. Not until the 1850s did America begin to see the emergence of adoption laws, generated primarily by society’s need to define legal heirs in the settlement of estates .... ," citing Lawrence M. Friedman, _A History of American Law_, 2d. ed. (New York: Simon and Schuster, 1985); and Carole Shammas, _Inheritance in America_ (New Brunswick, N.J.: Rutgers University Press, 1987).
WesleyJohnston 2011-11-29T02:18:50-08:00
Regarding GeneJ's post ... I am wondering: does "multiple marriages of direct descendants" include descendants who married each other?

That's something just about every family tree will have to deal with at some point, once you are back to small villages in the 1600's or 1700's.
ttwetmore 2011-11-29T04:02:43-08:00
I read somewhere that before colonial times, on average, marriages were between third cousins, many closer, many further, but this was the average. Thus the issue of direct descendants marrying each other not only occurs, but is the norm not the exception. I have many direct ancestors who were married first cousins, second cousins, third cousins with various levels of removal as well. This leads to what is sometimes called "ancestor collapse." All genealogical programs that I am aware of handle it with no problems. The main problem for software is recognizing, when generating reports, people and families that have already been output, and inserting the appropriate "see over there" tags instead of the redundant information. When iterating to find lists of descendants or lists of ancestors, software has to use "set" structures rather than "list" structures to build those lists, but this is just basic programming. I even have cases where two sisters married two brothers, all direct ancestors of mine, and eventually some of their offspring intermarried, still my direct ancestors. Thus I have people that show up in EIGHT places in my ahnentafels. I use my own genealogical software, and I can fortunately leave it to the software to keep everything ship shape.
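
The set-versus-list point can be sketched in Python as follows. The pedigree, the identifiers and the function name are hypothetical; this is not a description of how LifeLines or any other program implements it.

```python
def ancestors(person, parents):
    """Return the set of distinct ancestors of `person`.

    `parents` maps a person id to a tuple of that person's known parent ids.
    Using a set means a shared ancestor is recorded once, however many
    lines of descent lead to them.
    """
    seen = set()
    stack = list(parents.get(person, ()))
    while stack:
        p = stack.pop()
        if p in seen:
            continue  # already reached through another line of descent
        seen.add(p)
        stack.extend(parents.get(p, ()))
    return seen


# Hypothetical pedigree: the child's parents are first cousins,
# because pgf and mgm are siblings (both children of ggf and ggm).
parents = {
    "child": ("father", "mother"),
    "father": ("pgf", "pgm"),
    "mother": ("mgf", "mgm"),
    "pgf": ("ggf", "ggm"),
    "mgm": ("ggf", "ggm"),
}
result = ancestors("child", parents)
```

A naive list-based walk of the same pedigree would visit ggf and ggm twice; the set keeps one entry for each.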
GeneJ 2011-11-29T04:50:52-08:00
Hi Wesley,

(1) In the context of Crane's "Complex Families" the multiple marriages to which I referred have more to do with recognizing all of the family members (all of the "children" from all of the marriages of the heads of family). Joe marries first Susan, and they have children. Joe marries second to Margery, who was previously married with children, and Joe and Margery adopt several children. Crane (NGS/Quarterly) recognizes all the children--whether biological, adopted or step children.

Crane's summary would, I hope, make our Adrian smile, "Modern genealogies are, increasingly, family histories rather than mere recitals of begats within a bloodline. It is important to include all the individuals who shape the nature and personality of a family ... equally important ... that a clear identification of the biological line be maintained ... the guidelines offered in this paper will enable modern Americans to compile genealogies that accurately portray family units in the context of their existence and to present authentic family structures through which the history of our nation can be correctly understood and chronicled."

(2) Intermarriages within the larger family are also addressed in _Numbering Your Genealogy..._, but this is not considered a "complex" situation. (As Wesley says, "just about every family tree will have to deal" with this.)

See _Numbering Your Genealogy_, 10-11, for work of Joan Ferris Curran, CG on "_Multiple lines of descent_ from a single forebear, a relatively frequent occurrence ...."
ACProctor 2011-11-29T06:09:19-08:00
Interesting replies. I think some clarification of my post might be needed though:-

Re: "New guardians, foster parents, adoptive parents, etc. are not genealogical relationships; thus no need for a genealogical program to deal with them. They are, however, social relationships and thus a Family History program *must* deal with them"

I'm not actually writing a program - at the moment. My goal though is to define a "source format" for generalised Family History data. Genealogical (aka biological) relationships are far easier to handle but often virtually irrelevant to the lives of the individuals. I believe this is an area where GEDCOM gets hung up badly. It implicitly extrapolates from pure genealogical relationships to infer "family units".

Re: "There are much larger events (wars, earthquakes) that affect masses of people but one wouldn't normally say they have genealogical significance".

Again, Family History is a much more generalised goal. Something like the outbreak of WWII could be a hugely significant time marker (aka Event) that people's lives are related to, irrespective of whether they enlisted or not.

Re: "Using the example of a traditional union, my marriage "pfact" has two principals--call them p1 and p2 (or "bride" and "groom")"

The idea of 'principals' is something I experimented with, although it started to feel like there should be more levels, e.g. the Person(s) being born, their parents, informants/etc. Similarly with a marriage. However, grouping all Persons in an Event (whether by PFACTs or otherwise) doesn't make it easy for the recording of other historical facts, e.g. "X met Y at so-and-so's wedding". Ideally, X and Y should have EventRefs to the associated wedding, even though they may not have had a direct role in it.

I'd like to generalise Marriage to a generic Event-category of 'union' - something that has been discussed elsewhere in these threads. This should include civil & religious marriages, same-sex partnerships, cohabitation, and even multi-party marriages in those cultures that still permit them. This obviously puts more of a strain on the role definitions and things like FatherOfBride may be too specific. I believe the same could be possible for a "change of responsible control" (...can't think of another term off-hand) to include guardianship, fostering, & adoption. This is why I was interested in the slavery form of ownership.

Q: Is BetterGEDCOM focused purely on genealogical relationships, on generalised Family History, or something in between? It felt a little like there was some difference of opinion in the replies to my post so I just wanted to check before going off on a tangent :-)
GeneJ 2011-11-29T08:18:35-08:00
Hi ACProctor,

I don't know what a generalized family history is, but users come in all flavors. BetterGEDCOM is focused on user requirements. I organize materials--sources and "tags"/pfacts--in my software that sometimes reports just the key genealogical facts (BDM), and other times supports a full range of genealogically significant data that would include a host of other life events. Others are more interested in recording record data--which might represent BDM, or it might represent a host of other life events.

(1) I'm saying genealogical relationships are not limited to "biological" relationships. There are certain key genealogical relationships that "link" together a family structure so that a genealogy can be created. These key genealogical relationships identify the family unit--the heads of the "family" (which includes the unions), who the children are (biological, adoptive, foster, step, etc.), the unions of those children and their "children" (repeating the noted concepts).

Beyond these key genealogical relationships by which a genealogy is structured, there are other genealogically significant relationships (some of which are the basis of much inferential genealogy).

(2) I realize there are different limitations in your desire to have a "source format," but what you describe as "levels" are to me just different events or facts or different roles that principals or associates play. As below (see 3), believe I'd have less use for generalized roles.

(3) "...generalize marriage." Why? Not all the roles in marriage/union events will be the same.

You wrote, "...puts a strain on ... FatherofBride." I assign the roles in the event. If I don't have a bride, I don't have a "FatherOfBride." Since I make frequent use of the role "Bride" in my family file, I have pre-established a role, "FatherOfBride." I also have a role FOBride (which wouldn't translate well); believe it could just as easily be "FatherOfP1," or FatherToPrincipal. The associate role will point back to the event.

In software I use, key genealogical relationship tags (the stuff required to structure the family tree, see 1 above) are categorized as birth group, death group, etc.

I don't have examples of marriage events (ie, union, civil union, etc.) involving more than two principals. While I think I know how I'd enter such an event, I'll leave that challenge to be discussed in better context by others.

Hope this helps.--GJ
ttwetmore 2011-11-29T09:12:50-08:00
“Re: "New guardians, foster parents, adoptive parents, etc. are not genealogical relationships; thus no need for a genealogical program to deal with them. They are, however, social relationships and thus a Family History program *must* deal with them"”

There are many relationships between people, genealogical and not. I want my “genealogical” program to deal with all of them. The only distinction to be made, in my opinion, is that biological relationships are the only ones that allow the construction of pedigrees.

“I'm not actually writing a program - at the moment. My goal though is to define a "source format" for generalised Family History data. Genealogical (aka biological) relationships are far easier to handle but often virtually irrelevant to the lives of the individuals. I believe this is an area where GEDCOM gets hung up badly. It implicitly extrapolates from pure genealogical relationships to infer "family units".”

When you have your source format in a written form I’d like to see it.

I don’t know what you mean when you say that it is easier to handle biological relationships.

Actually GEDCOM does not infer the family units. In GEDCOM families are represented by FAM records that the user must create somehow. You might mean that the parent/child relationships are only possible within GEDCOM within the context of a FAM record. If this is what you mean then I agree with you. It is wrong to require that all biological relationships be mediated by FAM records. On the other hand, though, it doesn’t really hurt all that much. If you know that A is the father of B, and that’s all, with GEDCOM you have to create two INDI records and one FAM record with the father pointing to the FAM with a FAMS, the child pointing to the FAM with a FAMC and the FAM pointing to the persons with a HUSB and CHIL. No need for a marriage event or for a mother. You might not think this is the optimal solution, but honestly it ain’t all that bad.
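
As a rough illustration of the lookup that arrangement implies, here is a Python sketch that reads the two INDI records and one FAM record just described and follows FAMC to HUSB to find the father. The names, the tiny parser and the function names are assumptions made for the example; the parser handles only level 0 and level 1 lines and is nowhere near a full GEDCOM reader.

```python
# Two INDI records and one FAM record: A is the father of B, nothing else known.
GEDCOM = """\
0 @I1@ INDI
1 NAME A /Example/
1 FAMS @F1@
0 @I2@ INDI
1 NAME B /Example/
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 CHIL @I2@
"""


def parse(text):
    """Map each record's xref id to its record tag and its level-1 lines."""
    records = {}
    current = None
    for line in text.splitlines():
        parts = line.split(" ", 2)
        level = parts[0]
        if level == "0":
            xref, tag = parts[1], parts[2]
            current = records[xref] = {"tag": tag, "lines": []}
        elif level == "1" and current is not None:
            value = parts[2] if len(parts) > 2 else ""
            current["lines"].append((parts[1], value))
    return records


def father_of(xref, records):
    """Follow FAMC from the child to the FAM record, then HUSB to the father."""
    for tag, value in records[xref]["lines"]:
        if tag == "FAMC":
            for ftag, fvalue in records[value]["lines"]:
                if ftag == "HUSB":
                    return fvalue
    return None


records = parse(GEDCOM)
```

The FAM record here carries no marriage event and no WIFE line; it exists only to mediate the link.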

“...Family History is a much more generalised goal. Something like the outbreak of WWII could be a hugely significant time marker (aka Event) that people's lives are related to, irrespective of whether they enlisted or not.”

I agree, but I don’t have a good feeling for how these large scale events would be handled by a genealogical program.

“Re: "Using the example of a traditional union, my marriage "pfact" has two principals--call them p1 and p2 (or "bride" and "groom")"

The idea of 'principals' is something I experimented with, although it started to feel like there should be more levels, e.g. the Person(s) being born, their parents, informants/etc. Similarly with a marriage. However, grouping all Persons in an Event (whether by PFACTs or otherwise) doesn't make it easy for the recording of other historical facts, e.g. "X met Y at so-and-so's wedding". Ideally, X and Y should have EventRefs to the associated wedding, even though they may not have had a direct role in it.”

I don’t like the term “principal” as a role tag, though it is convenient in some cases. I think role tags should come from a relatively large enumerated set of tags, possibly with subtags (e.g., parent.biological, parent.step, parent.adoptive), with the capability of extending the set for unanticipated situations.
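
A small Python sketch of an enumerated but extensible role set with dotted subtags. The parent.* tags come from the paragraph above; the other tags, the class name and the method names are assumptions for illustration only.

```python
# Starting set of role tags; only the parent.* subtags are taken from the discussion.
BASE_ROLES = {
    "child", "parent",
    "parent.biological", "parent.step", "parent.adoptive",
    "bride", "groom", "witness",
}


class RoleRegistry:
    """Enumerated role tags that can be extended for unanticipated situations."""

    def __init__(self, roles):
        self._roles = set(roles)

    def register(self, tag):
        self._roles.add(tag)

    def is_known(self, tag):
        return tag in self._roles

    @staticmethod
    def matches(tag, general):
        """True if `tag` is `general` itself or one of its dotted subtags."""
        return tag == general or tag.startswith(general + ".")


registry = RoleRegistry(BASE_ROLES)
registry.register("parent.foster")  # extension for a case the base set missed
```

With subtags, a query for "parent" can pick up every kind of parent, while a pedigree report can restrict itself to "parent.biological".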

“I'd like to generalise Marriage to a generic Event-category of 'union' - something that has been discussed elsewhere in these threads. This should include civil & religious marriages, same-sex partnerships, cohabitation, and even multi-party marriages in those cultures that still permit them. This obviously puts more of a strain on the role definitions and things like FatherOfBride may be too specific. I believe the same could be possible for a "change of responsible control" (...can't think of another term off-hand) to include guardianship, fostering, & adoption. This is why I was interested in the slavery form of ownership.”

I agree with this.

“Q: Is BetterGEDCOM focused purely on genealogical relationships, on generalised Family History, or something in between? It felt a little like there was some difference of opinion in the replies to my post so I just wanted to check before going off on a tangent :-)”

A. I want to be able to create full timelines for the people I research by collating together everything I find out about them. All of this information ultimately comes from evidence, and much of that evidence takes the form of descriptions or reports of events that the persons participated in, and the relationships that were formed with other persons by those events. So all those things (sources, evidence, events, persons, ...) must be modeled very well. Most of us believe that the vital events are the most important of the events, and I do agree with that, but to get full timelines we must be able to accommodate everything we find out. I think we all agree that sources, evidence, events, persons, are among the key parts of a genealogical model. We disagree, significantly sometimes, on exactly how those ideas should be handled. But there isn’t much discussion here about the large “events” that you are concerned with. Here’s an example from my own work:

My wife’s grandfather was a Polish peasant living in West Prussia (now Poland) and was swept up by Germany’s need to mine coal in the Ruhr valley in the late 19th century. Many peasant men were essentially conscripted and taken west and forced to mine coal. This overall “event” has a name, the “Western Flight,” though more properly rendered in the German as something like “Oesterflugt.” This is one of those “global events” that was critical to this person’s life (he went “awol” and managed to get to the United States, with wife and kids to follow). So I want to be able to mention and describe this global event along with the more prosaic events that occurred in the man’s life. However, a genealogical program pretty much forces this mention to be placed in note structures as there doesn’t seem to be any better way to do it. I don’t really mind this, as my software knows how to take the notes that I write and insert them into any biographical output that I generate. For a real historian, however, I think this “event” should be better model-able.
ACProctor 2011-11-29T09:43:49-08:00
Thanks Tom. Some useful stuff in your reply.

I believe purely genealogical relationships are easier because they're more rigid. We only have one set of biological parents and that fact is independent of the date. All the other types of relationship are time dependent and potentially overlapping.

With Family History, I wanted a way of ensuring I could record first-hand testimony and tales passed down through the family generations. I've allowed for a comprehensive Narrative element (more than mere Notes) into which your example would fit nicely. However, there is also a feature for taking a description of some event out of the Narrative and making it a full Event entity. This would usually be done if the same Event appeared in more than one Person's history since the common reference point effectively pulls their lives together. The choice is optional and made very easy since a Narrative element can contain embedded PersonRefs, PlaceRefs, and EventRefs.

When I write the format up Tom, it will appear at www.parallaxview.co/familyhistorydata but that's just a placeholder at present.
eleanordew 2011-11-30T06:46:54-08:00

ACProctor: "I must admit that I hadn't thought about slavery and Person-ownership Eleanor. If OWNR was removed, was any replacement mechanism or convention put in its place?"

As far as I could tell, no mechanism was put in the place of "OWNR", but I am not very experienced in this format.

GeneJ:"Believe slavery would be a good concept about which we should document a series of case study materials (Wesley calls them "benchmark cases"). These could be outlined on a new wiki page and linked back to testuser's page, "BetterGEDCOM test suite."
see the linked items at the bottom of http://bettergedcom.wikispaces.com/BetterGEDCOM+test+suite

How would one go about collecting this test information? Do you just need some good examples? -- Eleanordew
GeneJ 2011-11-30T07:21:06-08:00
Hi Eleanordew,

Thank you for replying.

"Do you just need some good examples." --Yes, exactly.

There may be several good examples within Mills' article, "Which Marie Louise is 'Mariotte'?: Sorting Slaves with common names." (http://www.bcgcertification.org/skillbuilders/MariotteNGSQv94-183-204.pdf ).
testuser42 2011-11-30T10:07:14-08:00
ACProctor ... a Narrative element can contain embedded PersonRefs, PlaceRefs, and EventRefs.
That is a very nice idea.

Maybe veering OT:
Tamura Jones has some very good articles about Family History vs Genealogy, and on the concept of "Family" in current GEDCOM and software (e.g.: http://www.tamurajones.net/FamilyInScientificGenealogy.xhtml )

But one article I'd like to point out is this:
http://www.tamurajones.net/AFrameworkForClassicalGenealogy.xhtml

I'd like BG to be able to handle all of the legal, official and biological evidence, as well as all of the stories connecting the people and making them more than just names. I'd also like to have the people connected to the places and times they lived in, and collect stories of some places, but that's really another thread.
ttwetmore 2011-07-13T10:27:34-07:00
Adrian,

Good points as always. Let me give a quick example how I have implemented some ideas you just expressed.

If one were to follow my ideas about events as records, vitals as structures within records, and relationships as references between records, then given a person record and these three options, how would one find the person's father record? Before answering let's go a little beyond my earlier example and consider the following person record fragment:

0 @I1@ INDI
1 NAME Thomas Trask /Wetmore/ IV
1 SEX M
1 BIRT
2 DATE 18 December 1949
2 PLAC New London, New London, Connecticut, United States
2 FATH @I2@

I'm using GEDCOM just so we can understand it easily. This is a person record with a single vital structure for the birth. See what I did? I added a father reference to the birth vital. I never said anything about this facility earlier, because I didn't want to weird anybody out, but there is nothing wrong with this in my view. It's a multi-role vital structure! It is inside principal person's record and it points to other persons the principal is related to.

So how would you find the father of a person in a data model where there can be multi-role events, vital structures and relationship references?

Simple really. If your person points to a multi-role event, check the roles in that event, and if you can infer a child-father relationship between this person and another role-player in the event, there you have it. Obviously a multi-role birth event is perfect. If the person has a relationship already pointing to his/her father, you're home free. And if you allow vital structures in the form I have just given as an example, it's just as easy to follow a role reference from within a vital as it is to follow a direct relationship reference.

The whole real point here echoes Adrian's point. It doesn't matter how the father to child relationship is represented (any of the three described is fine); in the user interface there is no distinction to be made -- a user looking at the screen just sees a person and his/her parents with no clue as to how the underlying data is represented.
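
A Python sketch of that lookup across the three representations. The dictionary layout, key names and the order in which the representations are tried are all assumptions for illustration; the discussion does not fix any of them.

```python
def find_father(person_id, persons, events):
    """Return the father's id, whichever of the three representations holds it."""
    person = persons[person_id]

    # 1. A direct relationship reference on the person record.
    for rel in person.get("relationships", []):
        if rel["type"] == "father":
            return rel["to"]

    # 2. A role reference inside a birth vital structure (like FATH under BIRT).
    for vital in person.get("vitals", []):
        if vital["type"] == "birth" and "father" in vital.get("roles", {}):
            return vital["roles"]["father"]

    # 3. A multi-role birth event in which this person plays the child role.
    for event_id in person.get("event_refs", []):
        event = events[event_id]
        if event["type"] == "birth" and event["roles"].get("child") == person_id:
            father = event["roles"].get("father")
            if father:
                return father

    return None


# Hypothetical data: one person per representation, plus one with no father known.
persons = {
    "I1": {"relationships": [{"type": "father", "to": "I9"}]},
    "I2": {"vitals": [{"type": "birth", "roles": {"father": "I8"}}]},
    "I3": {"event_refs": ["E1"]},
    "I4": {},
}
events = {
    "E1": {"type": "birth", "roles": {"child": "I3", "father": "I7", "mother": "I6"}},
}
```

The caller only ever sees `find_father`, which is the sense in which the user interface need not know how the link is stored.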

It is fair to ask, though, how do these different implementations of father/child get established in the first place? Well, most genealogical applications these days are person-centric. You edit persons, so in this context it is only natural that all events be subsumed into vital structures in person records. However, some genealogical applications are both person-centric and event-centric. In those you can typically enter an event or you can enter a person. When you enter an event you eventually want to link the person role-players to their proper roles. So if you use such a program in an event-centric way the important relationships will end up being expressed through multi-role events. But when you use these programs in the person-centric way you fall back on the vital structures. And, of course, underneath the software could transform between representations and there would be no need for the user to ever know.

As Adrian points out, the user doesn't have to know how the event is being represented.
AdrianB38 2011-07-13T13:03:42-07:00
Tom - I _started_ to get a bit worried with your example - what if there was a multi-person event AND a relationship?

I think there are a couple of answers to that:
1. In your example, if you have a BIRT vital event within the individual, then you shouldn't have a multi-person BIRT event, so the issue doesn't arise.
2. If you have a relationship of FATH outside the BIRT vital event within the individual, then you shouldn't have the father in either the BIRT vital event within the individual or the multi-person BIRT event.

So, assuming that similar logic applies with other potential issues, you shouldn't have an issue. I can't see any NEED for having the same info in 2 places.

OK, OK, "should" - what if you have? Well, there has to be some rule but it's a rule that's in the application because only the developer knows what's the best way of making the app fail gracefully. It's no part of BG to define how to get out of a "Garbage In Garbage Out" scenario.

What's making me disturbed is - if we have 3 ways of doing X, are there justifications for the 3 ways?

I suggest that if we DON'T have personas, then there would be no need for anything other than the one method. If we do have personas, even if they are as limited in their application as those in nFS (if indeed, they are limited) then we need these extra methods in order to describe the information in a source in a codified manner without interpretation.

E.g. a persona from a census would use an AGE tag inside the CENSUS event; it wouldn't create a Birth event because that would need interpretation to create a date for the Birth event.

Similarly, a persona from a marriage (post-1837 UK) wouldn't create a Birth event to record their age or their father's name because that would need interpretation to create a date for the Birth event.

However I still can't get it out of my head that we've got one representation of relationships too many
- roles in multi-person events - sure, we need that.
- Single person events - sure, we need them for personas. Where do we put the relationships for personas though? Inside a single person event, or outside (but still inside the persona's data-record)? I can see how the input might be person centric or event centric as Tom suggests, but to turn his own point back - underneath the software could just use the one representation.
ttwetmore 2011-07-13T14:03:23-07:00
Adrian,

Good-oh.

First, yes, you never need the same info in different places -- no redundancy required.

Second, you ask whether there is a need for the three ways.

I think the 3 ways have subtle differences from one another, so have some legitimacy.

Multi-role event -- I believe this is the right way to encode direct evidence from most physical records that record those events -- birth certificates, marriage certificates, death certificates. These certificates are intended to document specific events, multiple persons are mentioned in them with roles wrt to the event. It is only natural, IMHO, to encode a physical representation of a multi-role event with a computerized, codified multi-role event record.

Vital structure -- I believe this is the right way to encode simple statements that mention the birth, marriage or death of someone, but are NOT statements intended to actually document the event. I hope you can see the difference there. There is some event out there somewhere in the background, but the statement is only indirectly about that event. A little subtle. "Almyra Jane Wetmore was born in Digby County, Nova Scotia." Would you call that the documentation of an event? Yeah, there's an event in there, but it's not mentioned explicitly, only implied. IMHO this is best handled by a simple vital structure in Myra's record. But would anyone really complain if it were handled by a one-role event record? I guess I wouldn't. I would appeal to parsimony arguments, however, to keep things as simple and as succinct as possible. One way you have a single record with a simple birth structure in it. In the other case you have two different records and you need an additional mechanism to link them together. More than twice the "computer data capital" to represent the same information. Inefficient. Unparsimonious. Bad.

Relationships -- I believe this is the best way to handle generic statements of relationships. Here's a good example for you. "Thomas Williams and Mary Doty were first cousins." Let's say you don't know anything about their parents yet, so obviously nothing about their grandparents. All you know is that one each of their parents was the child of at least one, maybe two, other persons. How many hidden events are there in this one? Hard to know. How many implied persons are there in this one? Well, you tell me. Do you really want to create all the anonymous person records and associated events to build up the pedigree you'd need if you had to encode relationships using simple lineage-linking? I don't think it is reasonable to handle this example by creating events, though it might be a good exercise for the reader to decide how they would do that. Most genealogical programs of today would just about choke if you tried to get this info into them in a usable fashion. But don't you think we ought to be able to do so? I think the best solution is something like:

0 @I1@ INDI
1 NAME Thomas /Williams/
1 SEX M
1 RELA @I2@
2 TYPE first cousin

0 @I2@ INDI
1 NAME Mary /Doty/
1 SEX F
1 RELA @I1@
2 TYPE first cousin

Another simpler example. Say you know two persons are siblings, but that's all you know. How do you handle that? In LifeLines I do it this way:

0 @I1@ INDI
1 NAME Thomas /Williams/
1 SEX M
1 FAMC @F1@

0 @I2@ INDI
1 NAME Mary /Williams/
1 SEX F
1 FAMC @F1@

0 @F1@ FAM
1 CHIL @I1@
1 CHIL @I2@

Pretty simple, but it requires a family record. Now I don't mind family records, but you might. How would you do this without a family record? Could you do it with event records? Well yes, you could. You'd create two birth events, each with the proper child role, but you'd have to make each birth event refer to the same ANONYMOUS father record and the same anonymous mother record. That would work fine, but do you really want anonymous person records in your database? I'd rather not, even though they kind of make sense. I think some programs don't even allow the idea of a person record without a name. I support them in LifeLines by allowing the name "//" (LifeLines might even support the empty string -- I'll experiment later to find out), but I don't like to use them. But if you can only infer father and mother from roles in multi-person events, you have to use this mechanism to encode that Thomas and Mary are siblings. Wouldn't it be better (if there are no family records) to let them point to each other with sibling references? You could also do this by just adding the two anonymous person records, one for the father and one for the mother, and then using simple child-parent relationships to link them.
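
To make the family-record variant concrete, here is a minimal Python sketch of how software might recover "Thomas and Mary are siblings" from the FAMC and CHIL links in the example above. The parser is deliberately simplified (it reads only level 0 and level 1 lines), and the function names parse_records and siblings_of are hypothetical; this is not how LifeLines or any other program is known to do it.

```python
# Sibling example from the post, with both FAMC pointers aimed at @F1@.
GEDCOM_TEXT = """\
0 @I1@ INDI
1 NAME Thomas /Williams/
1 SEX M
1 FAMC @F1@

0 @I2@ INDI
1 NAME Mary /Williams/
1 SEX F
1 FAMC @F1@

0 @F1@ FAM
1 CHIL @I1@
1 CHIL @I2@
"""


def parse_records(text):
    """Group level-1 lines under their level-0 record (simplified parser)."""
    records = {}
    current = None
    for raw in text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue  # blank or malformed line
        level = int(parts[0])
        if level == 0:
            # Level-0 lines with a cross-reference look like: 0 @I1@ INDI
            if len(parts) == 3 and parts[1].startswith("@"):
                current = {"type": parts[2], "lines": []}
                records[parts[1]] = current
            else:
                current = None
        elif level == 1 and current is not None:
            value = parts[2] if len(parts) > 2 else ""
            current["lines"].append((parts[1], value))
    return records


def siblings_of(records, person):
    """Follow FAMC to each family, then collect that family's other CHIL links."""
    found = set()
    for tag, family in records[person]["lines"]:
        if tag != "FAMC":
            continue
        for fam_tag, child in records.get(family, {"lines": []})["lines"]:
            if fam_tag == "CHIL" and child != person:
                found.add(child)
    return sorted(found)


records = parse_records(GEDCOM_TEXT)
print(siblings_of(records, "@I1@"))
```

Note that no father or mother record is needed here: the family record alone carries the sibling link, which is the point being made about avoiding anonymous person records.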

Oh, so much fun. By accepting the three different ways I have proposed I believe that BG can always have the best possible way to codify roles to events and relations between persons, and the ways are simple, obvious, and make common sense.
eleanordew 2011-11-22T13:06:15-08:00
This is my first post, so if it's in the wrong place, would one of the moderators please move it? Thanks.

One of the weird things that happened in the development of GEDCOM 5.5.1, was that the role tag "OWNR" was removed. This is unfortunate because the Owner of a slave is a key resource in finding out more information about that person.

In fact, the data event "slavery" would fit into Data-Event01, i.e., it's an event with multiple people and roles (multiple owners and multiple slaves). It could also fit as an Event longer than 1 day (I can't remember which Data-Event that is) because slavery is a long-term event.

My question is, I suppose, how does one work the idea of slavery and slaves into the BetterGEDCOM format?
ACProctor 2011-11-28T15:29:02-08:00
This is a tough subject but a crucially important one. I was struggling with it last week - before I found BetterGEDCOM - and I'm still struggling with it:-

Does an Event group multiple Persons, or are Persons attached to an Event? There are arguments that work in both directions so the answer is probably somewhere in between.

I originally wanted to define an Event as simply something happening at a particular Place at a particular time (& possibly lasting over a period of time). It's possible that some Events may not directly involve any Persons. However, in most cases, a number of Persons will either be directly involved (e.g. people present on census night) or associated with an event because it affected their life (e.g. outbreak of WWII). I was handling these as PersonRefs from the Event in the direct-involvement case, and EventRefs from each Person in the associate case.

However, some Event types indicate vital data about the Persons such as age, occupation, place-of-birth, etc., and I didn't want these in the Event element because they're properties of the relevant Person. But if all cases are done as associates then it places a huge burden on the interpretations of Event-type/class/category and Person-role/status.

A good example would be a 'union' such as a marriage. There would be no single place that had both a Bride and a Groom reference. You could only infer the couple by collecting all Persons having a reference to the same union-type Event and then filtering by role.

As another example, consider a change of family-unit parentage. If a group of children get new guardians, or foster parents, or adopted parents, then how would a genealogical program make that connection when it loads the data. Again, it would have to collect all the relevant Persons associated with a particular Event-type and filter by role.

This would all need some very careful choice of distinct Event-type/class/category and Person-role/status.
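
The "collect all Persons referencing the same Event, then filter by role" step described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the record layout, the identifiers, and the function name participants_by_role are hypothetical and not part of any BetterGEDCOM proposal.

```python
# Hypothetical person records: each person points at events and carries a role.
persons = {
    "P1": {"name": "Groom A", "event_refs": [("E1", "groom")]},
    "P2": {"name": "Bride B", "event_refs": [("E1", "bride")]},
    "P3": {"name": "Witness C", "event_refs": [("E1", "witness")]},
    "P4": {"name": "Child D", "event_refs": [("E2", "child")]},
}

# Hypothetical event records: no person references are stored on this side.
events = {
    "E1": {"type": "marriage"},
    "E2": {"type": "adoption"},
}


def participants_by_role(persons, events, event_type):
    """Return {event_id: {role: [person_id, ...]}} for events of one type."""
    grouped = {}
    for person_id, person in persons.items():
        for event_id, role in person["event_refs"]:
            if events[event_id]["type"] != event_type:
                continue
            grouped.setdefault(event_id, {}).setdefault(role, []).append(person_id)
    return grouped


marriages = participants_by_role(persons, events, "marriage")
for event_id, roles in marriages.items():
    print(event_id, roles.get("bride"), roles.get("groom"))
```

The sketch only works because the event types and role names are drawn from an agreed vocabulary, which is exactly the burden on Event-type and Person-role definitions noted above.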
ACProctor 2011-11-28T15:37:23-08:00
re: "One of the weird things that happened in the development of GEDCOM 5.5.1, was that the role tag "OWNR" was removed"

I must admit that I hadn't thought about slavery and Person-ownership Eleanor. If OWNR was removed, was any replacement mechanism or convention put in its place?
Andy_Hatchett 2011-11-28T15:45:27-08:00
"If a group of children get new guardians, or foster parents, or adopted parents, then how would a genealogical program make that connection when it loads the data."

I'm sure this will open a can of worms but...
New guardians, foster parents, adoptive parents, etc. are not genealogical relationships; thus no need for a genealogical program to deal with them.

They are, however, social relationships and thus a Family History program *must* deal with them.

Not that any of the above helps clarify the matter at all :)
GeneJ 2011-11-28T16:07:26-08:00
@eleanordew wrote, "My question is, I suppose, how does one work the idea of slavery and slaves into the BetterGEDCOM format?"

Believe slavery would be a good concept about which we should document a series of case study materials (Wesley calls them "benchmark cases"). These could be outlined on a new wiki page and linked back to testuser's page, "BetterGEDCOM test suite."
see the linked items at the bottom of http://bettergedcom.wikispaces.com/BetterGEDCOM+test+suite

(I've linked a blank wiki page there, hoping I'll be able to summarize information from some cases I worked on about other BetterGEDCOM topics.)

It would be nice to have a good set of case materials over a range of slavery related issues. --GJ
ttwetmore 2011-11-28T16:25:51-08:00
I will give my take on your points. I have been thinking about these ideas for more than twenty years (for whatever that is worth).

"Does an Event group multiple Persons, or are Persons attached to an Event? There are arguments that work in both directions so the answer is probably somewhere in between."

In my opinion events of genealogical importance almost always occur at a fine time scale at a fine place scale and involve persons that have relationships both with respect to the event, and therefore often with respect to each other. Just think about a birth certificate or a census family group for "classic" examples. I think of the event record and the person records (the name persona has now become very popular to distinguish them from the "conclusion" persons of most genealogical programs) as forming a cluster of records.

"I originally wanted to define an Event as simply something happening at a particular Place at a particular time (& possibly lasting over a period of time)."

There are the events of genealogical significance (genealogical significance means providing information about key points in a PERSON's life [birth, death, marriage, immigration, education, land transaction, military service, ...]). There are much larger events (wars, earthquakes) that affect masses of people but one wouldn't normally say they have genealogical significance. A war as a whole is significant, but what's important in an extended genealogical sense (the family history sense) is when a person enlisted, when they were promoted, the regiments they served in, the ships they sailed on. These are much finer grade events or attributes than wars in general. It would be important to model these macro events for historical purposes, but it might be too much for genealogy and family history. I frankly do not have a good answer to the question of what I think is the best way to place a war or a natural disaster into a genealogical database.

"However, some Event types indicate vital data about the Persons such as age, occupation, place-of-birth, etc., and I didn't want these in the Event element because they're properties of the relevant Person. But if all cases are done as associates then it places a huge burden on the interpretations of Event-type/class/category and Person-role/status."

EXACTLY. Events provide information about persons that is both inherent in the person (e.g., sex of person), BUT ALSO, non-inherent information that is only valid WITH RESPECT TO the event. Age is the prime example, but also things like occupation, residence place, and even name(!) also fit in this category. In the DeadEnds model each event record refers to the person records using event references. These event references not only "point" to the person records, they also carry the role information, BUT ALSO they carry the non-inherent properties of the person. In the DeadEnds model THIS IS WHERE AGE goes. So, with the event holding the date and place of the event, and the role-references holding the ages of the persons at the time of the event, software can easily generate a derived birth event for the persons. And so on. Or tie occupation or residence to a time line.
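
As a rough illustration of that last point, the following Python sketch keeps the age on the role reference rather than on the person and derives a birth-year range from it. The class and field names (RoleRef, Event, derived_birth_years) are hypothetical; the DeadEnds model's actual format is not shown in this thread, so treat this as one possible reading under those assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RoleRef:
    """Event-to-person reference: carries the role plus non-inherent attributes."""
    person_id: str
    role: str
    age: Optional[int] = None        # only valid with respect to this event
    occupation: Optional[str] = None


@dataclass
class Event:
    type: str
    year: int
    place: str
    roles: List[RoleRef] = field(default_factory=list)


def derived_birth_years(event: Event) -> Dict[str, Tuple[int, int]]:
    """Derive an (earliest, latest) birth year for each role player with an age.

    A person aged N at some date in year Y was born in year Y-N-1 or Y-N,
    depending on whether the birthday had already passed by the event date.
    """
    ranges = {}
    for ref in event.roles:
        if ref.age is not None:
            ranges[ref.person_id] = (event.year - ref.age - 1, event.year - ref.age)
    return ranges


marriage = Event(
    type="marriage",
    year=1856,
    place="Nantwich",
    roles=[
        RoleRef("groom1", "groom"),          # age not stated numerically
        RoleRef("bride1", "bride", age=20),
    ],
)
print(derived_birth_years(marriage))
```

The same pattern would tie occupation or residence to a timeline: the attribute lives on the role reference, and the event supplies the date.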

"A good example would be a 'union' such as a marriage. There would be no single place that had both a Bride and a Groom reference. You could only infer the couple by collecting all Persons having a reference to the same union-type Event and then filtering by role."

A marriage certificate is the evidence of a genealogically significant event. From that evidence we extract an event record (with type marriage) and two person records for the bride and groom (other persons optional for witnesses, parents, officiator). The two role references in the event to the bride and groom can carry age, residence at time of marriage, occupation at time of marriage, birth place, etc. This info in the event references is available for all the conclusion-making processes that follow the collection of all the evidence. If the software is smart enough, obviously.

This leaves open the old question of whether events point to persons or persons point to events or both. The key issue is 1) the recording of the roles so we can infer the person-to-person relationships between the persons, and 2) the recording of the NON-INHERENT information. My preferred solution (which is one of many, I agree) is to have the role references from events to persons hold the non-inherent information, but to also have redundant person-to-event references that don't have to carry any other information except a "pointer" (no role or non-inherent attributes). Some argue that this is too redundant. It doesn't bother me a bit. My master database now fits into a GEDCOM file of many megabytes, and I don't fret at all about the fact that the FAMS and FAMC links are all redundant with respect to the HUSB, WIFE, and CHIL links.

"As another example, consider a change of family-unit parentage. If a group of children get new guardians, or foster parents, or adopted parents, then how would a genealogical program make that connection when it loads the data. Again, it would have to collect all the relevant Persons associated with a particular Event-type and filter by role."

You've answered your own question, properly in my opinion.

"This would all need very some careful choice of distinct Event-type/class/category and Person-role/status"

I don't think it's that hard.
ttwetmore 2011-11-28T16:38:41-08:00
Andy said: "New guardians, foster parents, adoptive parents, etc. are not genealogical relationships; thus no need for a genealogical program to deal with them. They are, however, social relationships and thus a Family History program *must* deal with them."

I agree, but I think the majority of persons would expect to be able to handle at least the more "important" non-genealogical relationships, meaning step, half and foster relationships, possibly others. To me the only thing that distinguishes the natural biological father and mother relationships from all the others is the fact that those are the only relationships we can use to build pedigrees. But of course, that agrees with exactly what you said!

As I discussed a few times, I think there are two types of "relationships" to be modeled in genealogical/family-history programs, and they are very closely related to one another. The first is roles in events, which define the relationships that a person has with respect to an event (e.g., father, mother, child, in a birth certificate). The second is direct person-to-person relationships that are documented in evidence without any reference to the event that established the person-to-person relationships. For example, an obituary will often include the names of the deceased's parents, siblings and descendants with no reference to the multitude of real events that established those relationships between the deceased and the others. So when we extract the person records from the obituary we simply link the various persons to the deceased by direct person-to-person links that also hold the relationship between the two.

Note that we can also generate the person-to-person links between persons by making inferences about roles in an event. For example, and sorry this is so trivial, but I think it's important, if A has the father role in a birth event, and B has the child role in the birth event, then there is a direct father-to-child relationship between A and B. I've asked this question many times: should we always favor event-to-person roles or should we favor person-to-person relationships? My answer is that at the evidence level we should use the method that is best suited to the evidence, and at the conclusion level we should choose one and stick to it.
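
The inference described in that example can be sketched as a small rule table in Python. The rule table, the relationship labels, and the function name infer_relationships are all hypothetical and cover only the birth-event roles mentioned above; a real vocabulary would need many more rules.

```python
# Hypothetical rule table: (role of A, role of B) -> relationship of A to B.
ROLE_RULES = {
    ("father", "child"): "father of",
    ("mother", "child"): "mother of",
}


def infer_relationships(event_roles):
    """Derive person-to-person links from the (person_id, role) pairs of one event."""
    found = []
    for person_a, role_a in event_roles:
        for person_b, role_b in event_roles:
            if person_a == person_b:
                continue
            relationship = ROLE_RULES.get((role_a, role_b))
            if relationship is not None:
                found.append((person_a, relationship, person_b))
    return sorted(found)


birth_event_roles = [("A", "father"), ("B", "child"), ("C", "mother")]
print(infer_relationships(birth_event_roles))
```

Relationships stated directly in evidence, such as those in an obituary, would bypass this step and be stored as person-to-person links as they stand.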
ttwetmore 2011-11-28T16:56:08-08:00
Slavery fits into the event and persona model analogous to military service.

First, overall slavery as a large scale historical "event" is not an event of genealogical significance so we don't model slavery as a whole as an event. If we are compelled to describe the evils of slavery, we can write that up in a note record and have the persons who owned or were slaves point to that record.

However, there are both event-to-person roles and person-to-person relationships that exist in the slavery situation. There are events of buying and selling and events of manumission that have roles. These define the condition of personal slavery and are all software needs to know if and when someone was a slave and someone was a slave owner. And then there is evidence that simply states the existence of a slavery relationship between persons that establish person-to-person relationships, e.g., a wikipedia article that states "Sally Hemings was Thomas Jefferson's slave" -- create two personas and connect them with slave/owner person-to-person relationships.

The only thing giving me pause is the idea of being the child of a slave and therefore being born into slavery, as the state of being a slave begins at birth. I'm sure we could see our way to a simple solution.

Treating slavery like other events and relationships allows our software to infer if and when or where a person was a slave, just as our software can determine whether a person did military service, when and where.
GeneJ 2011-11-28T17:08:02-08:00
@ACProctor,

You wrote, "A good example would be a 'union' such as a marriage. There would be no single place that had both a Bride and a Groom reference. You could only infer the couple by collecting all Persons having a reference to the same union-type Event and then filtering by role."

Non-tech here, but you lost me a little there. In the software I use today, I associate multiple people with a given pfact--say a marriage, a death, etc.

Using the example of a traditional union, my marriage "pfact" has two principals--call them p1 and p2 (or "bride" and "groom"). I can "associate" other persons with that pfact--parents of the bride/groom, members of the bridal party ... the photographer. If I wanted to, I could associate every person who was sent an invitation (call them "invitees"). In the example I used, there is still just one pfact/event/tag.

Hope this helps.--GJ
AdrianB38 2011-07-08T13:44:12-07:00
Yes - it is lengthy, isn't it. Sorry, but somehow I'm not surprised it's a TMG thread! Though maybe I should talk...

Anyway - adding an associated entry for "loss of father" - strictly, your software ought to enable that to be visible and I _ought_ to start muttering about putting in duplicated and unnecessary data. However, where the software's reports don't highlight it, then it seems as good a way as any to highlight these issues.

(This is why I am cynical about claims that software can produce excellent narrative reports - adding in highlights like those are useful but I've never seen it well done. If you were manually writing the report, you might say - "In YYYY, the children lost their mother" (i.e. picking up on that associated entry but grouping it), or you might say "In YYYY, XXX lost her mother" if it's just one child you're reporting on, or if you've written about her death in the previous paragraph, you might not mention it at all. You simply can't concoct a rule that's applicable for all cases.)
GeneJ 2011-07-08T14:34:15-07:00
@Adrian ...

You wrote, "In YYYY, the children lost their mother" (i.e. picking up on that associated entry but grouping it), or you might say "In YYYY, XXX lost her mother"

It's a powerful feature and helps so much during the research process.

Take the example of a family that is migrating. Well, where a young child died may be one of the few notations you have to place the family on a certain date along that migration route.

On more than one occasion, an associate tag helped me locate an obituary published where one sibling lived -- and it called out the residence of all the other siblings.

In the software I'm most familiar with, users can either exclude all witnessed events from narratives or include them. I also have a special detailed family group sheet format set up--and the witnessed events are great on the FGS.

I've seen some great narratives, but that doesn't mean that all users have or even desire the skill it takes. --GJ
Christine_E 2011-07-11T02:10:47-07:00
Other examples of one person having multiple roles:
A person was born at home and delivered by the father who was a doctor and signed the birth certificate. Father and Delivery doctor.

A Graduate could also be the Vocalist or Valedictorian or Presenter of a gift to the school on behalf of the class.

A Graduate's parent(s) could also be a Teacher and/or Principal there. (I knew a person who was the Principal/Teacher/Parent for her child's graduation.)
Christine_E 2011-07-11T02:22:20-07:00
Retirement (parties) can be a multi-person event especially if there is a retirement incentive (golden handshake) and several people take it. In teaching, many teachers can retire on the last day of school. In lay-offs, there are quite often multiple people.

And because some people marry a co-worker or help their child get a job at the same company, the event can apply to more than one person.

I was at a funeral yesterday where the founder of a family business died and other relatives worked there and also spoke at the funeral. So even though only one person died, the company and funeral had multiple roles played by family members. Within the company, one person can be promoted to different roles while working there.
AdrianB38 2011-07-11T07:02:43-07:00
So with a little bit of thought, it looks like just about any event can be seen, in the right circumstances, as a multi-person event, potentially with multiple roles per person.

Two "howevers" spring to mind:

1. However, just because an event _could_ under other circumstances be a multi-person event, doesn't mean it always should be recorded in that form. My gut feeling says that in a BG file, there are sound reasons to _allow_ software writers to create single person events "inside" an individual's details, just like GEDCOM does for all things today.

I'm not saying anything about the database used internally.

2. However, I'll bet that for every combination of people, event and roles, someone will say - "That's not a multi-person event - that's one of these, one of those and one of something else again." That's OK by me. You do it your way, I'll do it mine. It's allowed to be like that!
ttwetmore 2011-07-11T07:17:24-07:00
Adrian,

Events provide three critical types of genealogical information. The model here is that we have evidence for an event and we are codifying that event into "evidence records."

First there is the event record itself -- date and place and other non-person particulars.

Then there are the role-players in the event -- the persons mentioned, attributes mentioned, and their roles with respect to the event. It is important to separate the attributes into intrinsic attributes, that is long-term attributes of the person (e.g., name, sex), and the attributes only relevant at the time and place of the event (e.g., age, place of residence).

Then there are the goodies one can glean about the relationships between people. This is as obvious as knowing that the child-role is a child of the person in the father-role. But there can be much more subtle clues as well, as relationships between people can be mentioned in evidence, completely outside the realm of the event itself. One good example is a witness on a marriage certificate. The marriage certificate will define one event record and a person-role record for all the persons mentioned on the record, with their roles with respect to the event. Witness is one of those roles. But what if the witness is also described as the sister of the bride? This establishes a relationship between two people that IS NOT based on the event roles.

If we are to be general, we must have multi-role events, and if we wish to codify the events into evidence records in our databases, we must codify them into event and person records. We must link the records via their roles with respect to the event, and we must be able to codify the "extra-event" relationships that are mentioned by the event evidence.

TW
AdrianB38 2011-07-11T08:55:38-07:00
Agreed. In principle. I think.

To give a concrete example, an English marriage certificate gives the occupation of each party, their (alleged) residence, etc, etc., thus:

1856 Marriage solemnized at the Parish Church in the Parish of Nantwich in the County of Chester.

No. 218
When married: Sixteenth day of September 1856

Name: John Doe
Age: Full age
Condition: Bachelor
Profession: Cordwainer
Residence: Beam St
Father: James Doe
Profession of father: Cordwainer

Name: Mary Roe
Age: 20
Condition: Spinster
Profession: -
Residence: Beam St
Father: Michael Roe
Profession of father: Cordwainer

Married in the Parish Church according to the Rites and Ceremonies of the Established Church after Banns by me, [A. F. Chater] Rector

This marriage was solemnized between us
John Doe his X mark
Mary Roe her X mark
in the presence of us
<sig> Charles Coe
Esther Coe her X mark

From this one can tease out:
- one event with up to 7 people in it playing various roles (I probably wouldn't include the minister, nor the witnesses unless I felt they were relatives - though this is a bit chicken and egg. Um);
- up to 7 persons each with several attributes including name; age; marital condition; trade; residence (alleged); education level;
- plus relationships between the parties and their fathers, which, as you say, are outside the event.

And the event would need sub-types (banns), location (St. Mary's, Nantwich), etc. And perhaps some extra notes such as "Charles' signature is dreadful" or "Charles is a witness on most marriages on this page."
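
For what it's worth, here is one way the teased-out items could be laid out, as a Python sketch. The dictionary layout, the identifiers p1 to p7, the role names, and the helper people_with are all hypothetical; the values are taken from the certificate above. It is only meant to show the three parts (event, persons with event-time attributes, and extra-event relationships) side by side.

```python
# One event record; role references carry the attributes stated at the event.
event = {
    "type": "marriage",
    "subtype": "banns",
    "date": "1856-09-16",
    "place": "Parish Church, Nantwich, County of Chester",
    "roles": [
        {"person": "p1", "role": "groom", "age": "Full age", "condition": "Bachelor",
         "profession": "Cordwainer", "residence": "Beam St"},
        {"person": "p2", "role": "bride", "age": "20", "condition": "Spinster",
         "profession": None, "residence": "Beam St"},
        {"person": "p3", "role": "father of groom", "profession": "Cordwainer"},
        {"person": "p4", "role": "father of bride", "profession": "Cordwainer"},
        {"person": "p5", "role": "minister"},
        {"person": "p6", "role": "witness"},
        {"person": "p7", "role": "witness"},
    ],
}

# Up to seven person records, holding only the name found in the record.
persons = {
    "p1": {"name": "John Doe"},
    "p2": {"name": "Mary Roe"},
    "p3": {"name": "James Doe"},
    "p4": {"name": "Michael Roe"},
    "p5": {"name": "A. F. Chater"},
    "p6": {"name": "Charles Coe"},
    "p7": {"name": "Esther Coe"},
}

# Relationships stated by the certificate but outside the event roles.
relationships = [("p3", "father of", "p1"), ("p4", "father of", "p2")]


def people_with(event, persons, key, value):
    """Names of role players whose role reference has key == value."""
    return sorted(
        persons[ref["person"]]["name"]
        for ref in event["roles"]
        if ref.get(key) == value
    )


print(people_with(event, persons, "profession", "Cordwainer"))
```

Free-text notes such as the remark about Charles' signature would sit alongside this as plain text rather than as coded fields.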

I think I am yet to be convinced how much one codifies this extracted information ("extracted evidence" if you prefer, though since we don't actually have a problem to solve - yet - that's not strictly true).

One could go to the extreme of writing it all as free-text, one statement per line. The disadvantage here is the inability to search free-text in a robust manner. (Yes, one can. But if everything's free text why don't we just use a word-processor?)

Then there is the opposite end of the scale where one codifies all the information to the same level of detail that one intends to end up with. Note it will NOT be coded in the same manner as one ends up with. AGE, for instance, will be codified as an attribute - we don't create a birth event in order to record the age. I suspect one might very well encode 2 birth events out of this - one for each of the bride and groom, with their fathers being linked into those 2 events. But we do this to record the relationships, not imply the ages.

I haven't necessarily got my head round exactly what it looks like, and which things - like AGE - differ from the ultimate target.

Those are the 2 extremes of codifying evidence / information. In between, there's some good old British compromise of using text for many of the bits of data but codifying the major items, those that you'd create search algorithms on. I'm not sure if this is a compromise or falling between two stools.

To summarise - I agree with you Tom, subject to my wondering if absolutely everything needs to be codified or whether one might get away with only codifying the search data.

And that's one issue for me - I can't envisage the detailed logic that will use this data, so I'm cautious.
ttwetmore 2011-07-11T09:06:09-07:00
Adrian,

I love your pragmatism. I agree with your points. I would "codify" what was genealogically significant. I would likely leave off the minister. By the way I do this kind of codification all the time in my own records, and it seems always to be a compromise between pedantry and synopsis. I generally leave off ministers, doctors (birth and death events), registrars (court events, land events, census events, ...), but I would normally keep marriage witnesses, since they are usually of genealogical significance to the primaries.

Following another of your points, you could imagine marking up the original text as your codification. This also has a long and noble history. In some sense the whole concept of marking up was invented for this very purpose, to give semantic meaning to text without altering the text itself.

More later. I'd like to once more present my views on three contrasting ideas -- events, vitals, and relationships.
ttwetmore 2011-07-12T07:50:51-07:00
Adrian,

Here is what I wanted to mention once more about the importance of three different model components needed to fully record genealogical evidence and conclusions. This thread might not be the best spot to put this, but since one of the three components is the multi-role event, it doesn't seem too far astray.

First is the concept of an event record. This is the multi-person, multi-role record that has been discussed here before. The event record is a codification of evidence for some event found in a source document. The record records the place and time of the event, the type of the event, and any other information pertinent to the event as a whole. Each person mentioned in the event, or at least the persons the researcher is interested in, are codified into person records. The event and person records refer to each other through event-person role references. Though these are event-person roles, they often imply important relationships between the persons. For example, the person playing the child role in a birth event, and the person playing the father role in the same birth event, have a parent-child relationship between them. Many marriage events mention the bride and groom and their parents. There are many implied relationships in those six event-person roles. To support the information about events found in evidence, a genealogical model must provide records for the events and the persons, and those records must be able to refer to each other through the role concept. Software must be capable of inferring the implied relationships between the persons.

The second concept is that of a vital attribute. We often learn about these attributes from a statement of fact, not from any evidence of an event. For example, a source might state that a person was born on a particular day and at a particular place with no mention of parents. One could theoretically infer a birth event from this statement, and create a one-role birth event record and a person record and link them with a person-event child role. A great deal of genealogical data is like this, however, so most genealogical data models are designed to handle this information as vital attributes rather than as events. For example GEDCOM uses the 1 BIRT and 1 DEAT attributes to hold these birth and death attributes. Better GEDCOM should support this idea of vital person attributes.

The third concept is that of a relationship between persons. We often learn of a relationship through a statement of fact, not from the evidence of an event. For example, a source might state that one person was the father of another. One could theoretically infer a two-role birth event from this statement, with one person in the child role and the other in the father role. Or one could more simply create two records for the two persons and link them with relationship references. Better GEDCOM should support this idea of relationship references.

These three concepts occur at the evidence and the conclusion level, though I concentrated on the evidence level above. We can have evidence about events and we can have conclusions about events. We can have evidence about vital attributes and we can have conclusions about them. Yada yada relationships yada yada.
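
The equivalences mentioned above (a vital attribute could be rewritten as a one-role event, and a stated relationship as a two-role event) can be sketched in Python as follows. The record layouts and the function names vital_to_event and relationship_to_event are hypothetical, and the "deceased" role name is an assumption; only the "child" and "father" roles come from the text.

```python
# Hypothetical role played by the person in the one-role event for each vital tag.
VITAL_ROLES = {"BIRT": "child", "DEAT": "deceased"}


def vital_to_event(person_id, vital):
    """Rewrite a vital attribute as a one-role event record."""
    return {
        "type": vital["tag"],
        "date": vital.get("date"),
        "place": vital.get("place"),
        "roles": [(person_id, VITAL_ROLES[vital["tag"]])],
    }


def relationship_to_event(father_id, child_id):
    """Rewrite a stated father-child relationship as a two-role birth event."""
    return {
        "type": "BIRT",
        "date": None,   # the statement says nothing about when
        "place": None,  # or where
        "roles": [(child_id, "child"), (father_id, "father")],
    }


# Vital attribute held inside the person record, as GEDCOM does with 1 BIRT.
person = {
    "id": "I1",
    "name": "Almyra Jane /Wetmore/",
    "vitals": [{"tag": "BIRT", "place": "Digby County, Nova Scotia"}],
}

print(vital_to_event(person["id"], person["vitals"][0]))
print(relationship_to_event("I2", "I1"))
```

Both conversions produce an extra record plus links that carry no information beyond the original statement, which is the parsimony argument for keeping all three forms.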
Christine_E 2011-07-12T21:24:18-07:00
Adrian proposed a definition of Event as:

my current favourite is the concept that "an event involves a change of state (i.e. of status)" or, (if you're not into scientific terminology), just say "a change of something".

I looked up the definition of Event on http://dictionary.reference.com and two of the definitions fit our genealogy purposes:

1. something that happens or is regarded as happening; an occurrence, especially one of some importance.

3. something that occurs in a certain place during a particular interval of time.

You were right in that I was suggesting that single-person events be handled the same as multiple person events, especially when the event is the same in both cases. From the user's perspective, why should they enter data differently for one compared to another?

However, for some events, I might not use the Event attribute. For graduations or retirements, for example, I might tag the individuals or put an entry in their notes. If I felt like several people should be grouped because they graduated at the same time, I would create a Group for them. (Since I've never used Groups or Events, this is just my current thinking).

I could see GeneJ's point about it being useful to record where someone's family member died in the person notes, but I would only do it if it likely strongly impacted them, such as if they were still a member of the same household at the time of death. (This would imply a change in the living unit.)

And just for interest's sake, I have a chain migration where everyone who left a certain village in Europe ended up in the same town in the U.S. After a while I could easily scan through the US town marriage records and pick out the immigrants from my village because they were all married by the same minister who spoke their native language. I could group these marriages (and the christening of their children) as a group but I would want to record the minister's name for them, whereas I wouldn't for other marriages.
ttwetmore 2011-07-13T03:33:41-07:00
Christine,

I hope you read my post and then thought about the difference between the event and the vital structure, as these are the two concepts you are now discussing. Any "vital event", for example your graduations and retirement examples, could be represented by a separate one-role event record with associated person record, or simply by a vital structure within the person record. This is the fundamental issue involved here, and I think it is well understood. There are some Better GEDCOMers who have a strong feeling that there should only be one way of doing things, and that there should be a decision here. If that one-way decision were made it would have to be in favor of the multi-role event. My argument, of course, is that you can view these as different things. You should feel obligated to create multi-role event records when you have explicit evidence about an event. And you should feel obligated to create vital structures when you don't have that evidence. And there are gray areas where you can go either way.
AdrianB38 2011-07-13T08:31:00-07:00
Christine - two important points to pick up on:

"From the user's perspective, why should they enter data differently for one compared to another?" They shouldn't. There is absolutely no reason why the user should see any difference on screen between a "thing" that is represented as a multi-person event behind the scenes and another "thing" that is represented as a single-person event behind the scenes, and comes out on the GEDCOM or BG file _within_ a person. Well, no difference except for there's only one participant in one on the screen.

Secondly "for some events, I might not use the Event attribute ... I might tag the individuals or put an entry in their notes ... I [might] create a Group for them"

Absolutely. BG has to make all the options a/v and leave it up to the user which to choose. If they were all identical in resultant functionality, the "one-way" people should rule. (All ... One ... Rule ... All.... Excuse me while I fight off the temptation to misquote the inscription on Sauron's Ring)

However, there are slightly different meanings in each of your quoted ways and I'd hope BG would be able to accommodate them all.
AdrianB38 2011-04-17T12:48:56-07:00
This is just to open up a place to record long-standing(?) conclusions about multi-person events.

These conclusions may be scattered through the Wiki but some things spring to mind:
- the multi-person event is an entity in its own right, of equivalent status to (say) persons. In GEDCOM terms, it's a Level0 thing, just like a person is. Or in RDBMS terms, it's a row in its own right in the table tblEvents.
- Example: a marriage event would be an entity in its own right, pointing to (say) bride, groom and two witnesses.
- Each of the people / event combinations would have a value to describe the person's role in the event.
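
To make the "RDBMS terms" in the points above concrete, here is a minimal sketch using Python's built-in sqlite3 module. Only the table name tblEvents comes from the list above; tblPersons, tblEventRoles, the column names and the sample rows are assumptions made up for this example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tblPersons (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tblEvents  (id INTEGER PRIMARY KEY, type TEXT, date TEXT);
    -- One row per person/event combination, carrying the role.
    CREATE TABLE tblEventRoles (
        event_id  INTEGER REFERENCES tblEvents(id),
        person_id INTEGER REFERENCES tblPersons(id),
        role      TEXT
    );
""")
con.executemany("INSERT INTO tblPersons VALUES (?, ?)",
                [(1, "Bride"), (2, "Groom"), (3, "Witness A"), (4, "Witness B")])
# The marriage is a row in its own right, not a field of any person.
con.execute("INSERT INTO tblEvents VALUES (1, 'marriage', '1877-04-10')")
con.executemany("INSERT INTO tblEventRoles VALUES (1, ?, ?)",
                [(1, "bride"), (2, "groom"), (3, "witness"), (4, "witness")])

rows = con.execute("""
    SELECT p.name, r.role
    FROM tblEventRoles r JOIN tblPersons p ON p.id = r.person_id
    WHERE r.event_id = 1
    ORDER BY p.id
""").fetchall()
print(rows)
```

The link table is where the role value lives, so adding a fifth participant means adding one row rather than changing the event or the person.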
Christine_E 2011-07-07T21:24:56-07:00
Let's discuss this two ways, first as an event that involves multiple people, then as an event involving one person.

If we think this should only be for multiple people then the Description: should be expanded to something like

BetterGEDCOM must support the recording of events that affect multiple people. In particular, it must be possible to record the role of each person in the event. A situation involving only one person (i.e., a single death) is not considered an event for BetterGEDCOM purposes. Examples of events are Births, Adoptions, Marriages, Lawsuits, Natural disasters.

Now what about Immigration, Naturalization, Accidents, Graduation, an honor? They could involve one or more of our ancestors. (When they involve only one person, there is probably someone else there, but he/she/they are probably irrelevant to the event we are documenting.) For example, someone immigrated. Most likely he/she came with others even if the immigrant didn't know them. If they came on a ship or plane or train, there were also crew/flight personnel involved.

Shouldn't we document immigration the same even though sometimes it involved only one ancestor and other times it involved several of our ancestors together?
Christine_E 2011-07-07T21:33:49-07:00
I propose that this discussion start by listing things that are events and aren't events in the genealogy sense to give clarity to this requirement.

retirement?
illness?
move to new residence?
religious ceremony?

I ask other members to list more. . .
AdrianB38 2011-07-08T05:16:16-07:00
Christine - I think there are a couple of facets that link into this discussion.

Firstly - what IS an event?
And - should single person events be recorded in BG differently from multi-person events?

OK - what IS an event? There are probably many times that question has been asked in this Wiki and having tried all sorts of definitions involving the presence or not of values, my current favourite is the concept that "an event involves a change of state (i.e. of status)" or, (if you're not into scientific terminology), just say "a change of something".

That being so, I think referring to a "situation involving only one person (i.e., a single death)" as not being "an event for BetterGEDCOM purposes" takes us into territory where we're on a loser. It's not a multi-person event, certainly, but it is an event for a single person, so we might as well call it an event.

What is more interesting is what I think you're driving at, which is, should single person events be physically recorded differently from multi-person events and are there any such events that are always single person?

I think there must be event types that are single person only - injury and illness are two that spring to mind, along with retirement, graduation, promotion, etc... (I just took a quick look at the GEDCOM 5.5 list).

However, I suspect one could argue about several of those - what if a whole family were struck down by an epidemic? Or were all in a traffic accident? And if it were a family firm, it might be a father promoting their daughter? Plus it's always newsworthy when a parent and child graduate together. Move to a new residence could be a move of a whole family. And a death might involve a relative registering the death later - sure, you could add that as a separate event but I'm not a fan of extra events just for the sake of it.

About the only one I can't think of a multi-person event for is retirement. So, unless someone comes up with some more, I think we must allow that any single-person event could also, under some circumstances, be a multi-person event.

Does that mean we need to code all single-person events in BG as if they were multi-person (i.e. as if they were all top level entities?). I don't think so. For one thing, having all events as multi-person dramatically increases the size of a BG file and reduces the readability of the output text - which people will still want to read. Not sure if it increases the coding workload or not. I think coding everything as multi-person would probably reduce the workload.

HOWEVER - if we go down the nFS route of having personas (i.e. stripped down individual records) for sources (a.k.a. the evidence and conclusion data model) then there are sound arguments for keeping the persona bundled inside one person-type record and therefore putting all that persona's events inside the record as single person events.

Conversely, if you don't want to use personas for recording the evidence but are happy to have the evidence as text linked to a source (say) then having all events as top-level, multi-person events is simpler in coding (I think), even if rather bigger in file-size.
GeneJ 2011-07-08T09:12:40-07:00
Humm...

The use of associates is among the few reasons I use particular software. I could almost get downright emotional about it!

Perhaps I'm confusing the requirement, but in my current practice/current software, I add associates and roles to many events. Assigning a role doesn't mean they were present at the event, but it certainly could include those individuals.

Death -- of father; of mother -- when a parent dies, I add an associated entry for "loss of father" or "loss of mother" to the record of each surviving child.

If the parent survives and a child dies, I add an associated event for the loss of a son or loss of a daughter.

OOo. I have roles for loss of brother and sister, too.

A child marries ... surviving parents are associated ... A son enlists in the army ... I associate that event to surviving parents ...
GeneJ 2011-07-08T09:24:07-07:00
Adrian wrote, "It would be better to have a birth event involving three people"

In the associates/roles-enabled software I use, events are linked to persons by (a) principal roles and (b) associate roles.

Here is a _lengthy_ user discussion about whether there should be a limitation in the number of principal roles (vs associate roles) per event:

http://archiver.rootsweb.ancestry.com/th/read/tmg/2011-03/1299561812
GeneJ 2011-07-08T09:33:56-07:00
Bringing this up only for discussion.

Should BetterGEDCOM enable/allow an individual to be assigned more than one role in an event?

In the software I use, an individual is only allowed to play one role per event.

Probate is a common example of persons who play multiple roles in an event. It's not so unusual for one or more children to be selected to administer an estate (or designated as executors) and for those same children listed with others as heirs to the estate.

Voila, you have one or more children who have multiple roles in the same event.

My work around is either to create two tags (events, say "probate administration" and "probate") or to create separate roles (say "administrator and heir" and "heir").

I know those new to roles find this a little inconvenient, but the rule "one event=one role/person" does probably save us from many errors (such as marrying oneself, being your own mother or father, or your own pallbearer).
AdrianB38 2011-07-08T13:33:00-07:00
"Should BetterGEDCOM enable/allow an individual to be assigned more than one role in an event?"

I think "yes" - as you say, the probate / will event is one obvious answer.

Executor and Trustee and Beneficiary is one possible combination - Executor and Trustee are 2 different roles. Sure, there are ways around things, you could concoct a probate event and an inheritance event to separate Beneficiary out, but I would find it tricky to split Executor and Trustee. Yes, you could create a new role of "Executor and Trustee", but c'mon, this is getting silly.

Again, in births, it might prove useful for someone to be declared as both egg-mother and birth-mother (in the sense of one who carries the embryo to term). While that is the normal biological combination, in the case of test tube fertilisation, an explicit statement of such might be useful.

While the idea of stopping erroneous entries is attractive, I think it would be the case that the inconvenience from stopping legitimate combinations outweighs the benefits from stopping errors.
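
For illustration only, here is one way the two positions might be reconciled in Python: a person may hold any number of roles in an event, and a checker reports only combinations that someone has explicitly declared impossible. The CONFLICTING list, the role names and the function names are all hypothetical; nothing like this has been agreed in the thread.

```python
from collections import defaultdict

# Hypothetical pairs of roles that one person should never hold together.
CONFLICTING = {frozenset({"bride", "groom"}), frozenset({"mother", "child"})}


def roles_by_person(participants):
    """Group (person_id, role) pairs into {person_id: set of roles}."""
    grouped = defaultdict(set)
    for person_id, role in participants:
        grouped[person_id].add(role)
    return dict(grouped)


def conflicts(participants):
    """Return (person_id, role pair) for every forbidden combination found."""
    found = []
    for person_id, roles in roles_by_person(participants).items():
        for pair in CONFLICTING:
            if pair <= roles:
                found.append((person_id, tuple(sorted(pair))))
    return found


# Invented sample events.
probate = [("I1", "executor"), ("I1", "trustee"), ("I1", "beneficiary"),
           ("I2", "beneficiary")]
wedding = [("I3", "bride"), ("I3", "groom")]

print(conflicts(probate))   # executor + trustee + beneficiary is allowed
print(conflicts(wedding))   # marrying oneself is reported
```

A program could treat the reported combinations as warnings rather than hard errors, which would leave the final decision with the user.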
AdrianB38 2011-04-21T03:41:00-07:00
Syntax01 - Underlying syntax
Creating a discussion topic for a long standing requirement in case discussion starts....

Syntax01 - Underlying syntax
Description: BetterGEDCOM's underlying syntax must be an existing, non-proprietary syntax

Importance: Mandatory

Why?: We do not want to reinvent the wheel

Way forward?: Options include XML, JSON, GEDCOM
AdrianB38 2011-04-21T03:57:24-07:00
Can anyone comment on how XML might help in defining extensions to a base BetterGEDCOM "language"?

The BG base might define a common denominator for use across the globe - while some countries, religions, whatever, might come up with a formally agreed set of extensions to that base - e.g. new events or attributes. For instance, some of the ceremonies employed by the Mormon church are currently in GEDCOM - since these are probably of interest only to LDS church members, one approach MIGHT be to take them out of the base language and into a formally agreed extension. Anyone wanting to use that extension would invoke (in some fashion - or their software would) the BG base plus the LDS extension. The extension would only need to be agreed by the LDS and its definition provided to the users and / or software companies wanting to use it.

This might come, as it were, with XML infrastructure - I don't know enough.

Could this also depend on how events and / or attributes are defined - it might work for new "tags" but not for extra values to attributes? Again, I don't know, I'm just asking.

Note that this is a halfway house between custom or user defined extensions.
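
One mechanism XML does offer for this kind of layering is namespaces: the base vocabulary and each extension get their own namespace, and a reader can skip elements from namespaces it does not understand. The sketch below uses Python's standard xml.etree.ElementTree. Both namespace URIs, every element and attribute name, and the sample data are made up for the example; no BG schema exists to draw on.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs and all element names are made up for this sketch.
BASE = "http://example.org/bg/base"
LDS = "http://example.org/bg/ext/lds"

doc = f"""
<person xmlns="{BASE}" xmlns:lds="{LDS}">
  <name>John Smith</name>
  <event type="birth" date="1850-03-03"/>
  <lds:event type="endowment" date="1875-06-01"/>
</person>
"""

root = ET.fromstring(doc)


def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    namespace, _, local = tag[1:].partition("}")
    return namespace, local


def events(person, understood):
    """Return event types from namespaces the reading program understands."""
    result = []
    for child in person:
        namespace, local = split_tag(child.tag)
        if local == "event" and namespace in understood:
            result.append(child.get("type"))
    return result


print(events(root, {BASE}))        # a base-only program
print(events(root, {BASE, LDS}))   # a program that also knows the extension
```

This covers new "tags"; whether extra values for an existing attribute can be handled as cleanly would depend on how the schema restricts those values, which this sketch does not address.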
GeneJ 2011-04-21T09:07:38-07:00
Thanks for doing this Adrian.

See 31 Jan 2011 Developer Meeting notes
http://bettergedcom.wikispaces.com/31Jan2011DevelopersMtgNotes

Syntax was discussed and not agreed.

There are quite a few discussions on the wiki. In another life, it would be great if they could be referenced in this thread.

Note: It only may have been in the 31 Jan 2011 meeting that a representative of one very large developer commented that if we go with XML, any number of large developers will take a pass on BetterGEDCOM.
Perhaps the details are recorded elsewhere on the wiki or someone in attendance recalls those specifics.
AdrianB38 2011-04-21T12:59:21-07:00
As Dick Eastman would say: Warning this post contains personal opinions...

"if we go with XML, any number of large developers will take a pass on BetterGEDCOM"

I would suggest that anyone saying that is doing so without thinking.

Firstly, if they are a large developer, they should be experienced in XML from their other work. (Oh - I thought they said they were a _large_ developer, then?)

Secondly, all routines to concoct lines of GEDCOM or to read them, must be hand crafted. Conversely, I am told (and even in flippant mode, I need to add that caveat) that routines to parse and unparse (?) XML are available off the shelf - just list the items in the database, describe the XML in some standard form, and press the button.

Thus, if one were making a small change, then the change to the GEDCOM based structure would be small, while the extra overhead of adding in the XML handling from scratch would be large, so yes - for this case, XML ain't worth it.

But BG, in its full pomp and glory, will not be a small change to the language.... No. Way. Jose.

And therefore, I suggest, the overhead of learning XML, adding the routines, etc, will be outweighed by the faster coding of the real work to parse and unparse the data into the XML. So it would, I suggest, take more time to do it in GEDCOM.
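
A small Python sketch of the contrast being drawn: reading GEDCOM means writing and maintaining a line grammar yourself, while reading XML can lean on a parser that ships with the language. The regular expression covers only the basic "level, optional cross-reference, tag, optional value" line shape, and the XML element names and sample data are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# GEDCOM line: level, optional @xref@, tag, optional value.
GEDCOM_LINE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")


def parse_gedcom(text):
    """Hand-written reader: turn each GEDCOM line into a tuple."""
    records = []
    for line in text.strip().splitlines():
        match = GEDCOM_LINE.match(line.strip())
        if match is None:
            raise ValueError(f"not a GEDCOM line: {line!r}")
        level, xref, tag, value = match.groups()
        records.append((int(level), xref, tag, value or ""))
    return records


gedcom = """
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 3 MAR 1850
"""
# Invented XML rendering of the same person.
xml_text = "<person id='I1'><name>John Smith</name><birth date='1850-03-03'/></person>"

print(parse_gedcom(gedcom))
# The XML equivalent needs no grammar of our own, only a library call.
person = ET.fromstring(xml_text)
print(person.get("id"), person.find("birth").get("date"))
```

Either way, the mapping from parsed lines or elements onto a program's own database still has to be written by hand.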
ttwetmore 2011-08-04T08:58:52-07:00
I have had recent occasion to use Google protocol buffers to define a data model. This is the format that Google has used for many years as its server interchange format. GPB has an external format that is quite similar to JSON. Its main advantage however, and clearly one of its main design goals, was to provide a highly efficient binary format for transmitting and archiving data. A Better GEDCOM database expressed in binary GPB format would be, save for further compression, as compact and teeny weeny as possible.

Though, of course, the final syntactic form that Better GEDCOM data will take as external files is a moot point, and I would expect there to be GEDCOM-syntax versions, as well as XML, JSON and GPB format versions.

I added GPB as one more option for external syntax and suggested that any subset of the syntaxes is a reasonable goal.
gthorud 2011-05-09T14:13:43-07:00
Data09 - Collections of source data
Description:

BetterGEDCOM could allow recording of data from sources as a collection of records where none or only some are not linked to persons or other records in the BG-file. Examples are transcriptions of a complete source or a section in the source, e.g. births in a church book, images of same or an index to the source.

Importance:
To be determined

Why?:

Often such collections are published in databases on the Internet, but there could be many reasons why that is not practical, e.g. there might not be a database suited for the type of data or there could be copyright issues. It should be possible to search for data in a collection. It could be possible to link records in a collection to persons etc., incl source meta data, in the BG-file. It would also allow the user to see which records in the collection that are not linked to a person, and thus also to see that a candidate record in a collection is already assigned - thus avoiding e.g. to assign the same birth record to two different persons.

Way forward?:

The solution must be general so that it can handle many types of sources. For structured data, some general data elements that are common to many sources could be defined - facilitating searches across collections - e.g. given names, surnames, date of birth, "place of residence", place of birth (or place of event). Data could also be non-structured text or images. An alternative could be to encode such collections in terms of persons, places, and events, in separate sets of data (some current programs can convert tabular transcriptions into Gedcom format), or keep the data in table structures with user-assigned column headers imported from e.g. spreadsheets, possibly in a two-level structure - one for the record (event) and one for the persons. A solution could also be used to store individual source records downloaded from web-services (would require a standard download format) or simply records entered by the user. There are lots of alternatives.


The discussion could initially focus on the desirability of this feature and examples of source types and their format. How common are such collections of source data - not those already on the Internet? Why do we want these data in our genealogy programs? Do any current programs support such functionality?
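
To give the discussion something concrete to react to, here is a minimal Python sketch of the "table structure with user-assigned column headers" alternative, including the search and the "already assigned" check mentioned under Why. The class names, column headers and sample rows are all invented; this is one possible shape, not a proposal for the BG format.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRecord:
    """One row of a transcribed source, keyed by user-assigned column headers."""
    row: dict
    linked_person: str = ""   # empty means not yet linked


@dataclass
class Collection:
    title: str
    records: list = field(default_factory=list)

    def search(self, **criteria):
        """Return records whose columns equal all the given values."""
        return [r for r in self.records
                if all(r.row.get(k) == v for k, v in criteria.items())]

    def unlinked(self):
        """Return records not yet linked to any person."""
        return [r for r in self.records if not r.linked_person]

    def link(self, record, person_id):
        """Link a record to a person, refusing a second, different person."""
        if record.linked_person and record.linked_person != person_id:
            raise ValueError(f"already linked to {record.linked_person}")
        record.linked_person = person_id


births = Collection("Births in a church book (invented rows)", [
    SourceRecord({"given": "John", "surname": "Smith", "born": "1850-03-03"}),
    SourceRecord({"given": "Mary", "surname": "Jones", "born": "1851-07-19"}),
])

hit = births.search(surname="Smith")[0]
births.link(hit, "I1")
print(len(births.unlinked()))
try:
    births.link(hit, "I2")   # same birth record, different person
except ValueError as error:
    print("refused:", error)
```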
AdrianB38 2011-05-10T09:06:55-07:00
I'm presuming there's something missing from the simple description "allow recording of data from sources as a collection of records where none or only some are not linked to persons or other records in the BG-file". After all, I can add many sources right now that aren't linked.

Presumably, the bit that's missing is the index to the stuff in the source's text? Or is it a summary of crucial data we want? (not sure if meta-data is the right term? Maybe it is)

Is this requirement simply a variation on my proposed "Codifying Source Info"? See http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/38617826

Possibly it's option 2 from that but with no attempt to encode all the information from the source, only the most crucial information?
AdrianB38 2011-05-10T09:23:37-07:00
"Often such collections are published in databases on the Internet, but there could be many reasons why that is not practical"

There are many transcripts / abstracts / summaries in either document or occasionally spreadsheet form. It's the typical way a British Family History Society publishes its data. But having said that, most would never agree to the codification of their data in an easily transported / imported form because that's money to them. (Most people play fair with not copying the PDFs off a CD).

"Why do we want these data in our genealogy programs? Do any current programs support such functionality?" Just about any program supports having sources without linked persons doesn't it? More to the point is why... If we get the program to review source data and then propose possibilities (as I think Louis is suggesting), then there is a good reason to go down this route even without the full Automatic Data combination that Tom hoped for. (And I'm not sure if Tom ever envisaged ADC actually being automatic right through to writing or linking the data).

Without that extra dimension in the app then I wouldn't put such data in my database as I'd lose one essential quality check, which is - if this source is not cited somewhere, then something's wrong. Note this is NOT me saying that the facility should not be a/v, I'm just saying I wouldn't use it.
gthorud 2011-05-11T15:28:58-07:00
Correction: The word "not" should be deleted from the first sentence in the Description in the first posting above.

Adrian,

The requirement as currently written is most likely only reflecting my personal understanding of a discussion that started in the Developers meeting, and it may be wrong or incomplete - the intention was to create a place to discuss the recording of source info in general, NOT ONLY RELATED TO THE E&C MODEL since that is what I understood the participants in the meeting were interested in.

(I was a bit surprised by the interest in this topic now. When I have proposed similar requirements earlier the answer has been that "there is no interest" - but that probably depends on who is participating in the discussion.)

So, initially, this is an attempt to understand what we want to discuss and collect ideas.

The requirement is a variant of Adrian's requirement (referred to above, I should have cross-referenced) but broader in scope as it is not limited to "codified" data. I did not think about indexes or summaries, but that could also be called a source.

The requirement is not about the meta data about the source, but the data IN the source, although one would also record meta data.

Initially, all 3 alternatives in Adrian's requirement could in theory be discussed, but alternative 3 (codifying in events etc) is currently being discussed in the "Do we need Personas?" discussion here http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/38695952

Re. data published by the British Family History Society (or similar). If they publish it as a spreadsheet FILE or pdf, that is probably more transportable than having the data in a Gedcom file - so I don't see the big problem. The key is that people adhere to copyright laws.

Re. program support. I am not thinking about source meta data, as I GUESS you are referring to. It was mentioned in the meeting that Reunion has some support for import of the data IN a source from spreadsheets, and I know some other programs support import/export to spreadsheets or similar, but I am not sure that is for info in sources - really don't know, have not used these features.

If some program wants to use this "feature" (or whatever it may end up as) for automatic data combination, so be it, but it is not on my agenda - or extremely low on that agenda.



I came across this posting http://bettergedcom.wikispaces.com/message/view/Data+Models/32554704#32666092 by testuser about codifying extracts in Gedcom 6.0 - similar to option 1 in Adrian's requirement. Just thought I would mention it.


There is also a feature using a table structure somewhere in GenXML - see the Data models page. http://bettergedcom.wikispaces.com/Data+Models
gthorud 2011-05-27T13:34:17-07:00
A posting that is relevant to this discussion, discusses among other things, ways to record source data.

http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/39610786#39654526
gthorud 2011-05-27T18:29:52-07:00
A link to the discussion on soc.genealogy-computing initiated by Tom, where the above linked to material was imported from

https://groups.google.com/forum/#!topic/soc.genealogy.computing/emHrUVXFvnc
gthorud 2011-05-27T18:46:37-07:00
When I get time to do it, I will describe a solution for tabular (or if you like, record based) transcribed data that has been developed by our National Archives. It has been operational for more than 10 years and has been able to handle all sorts of structured archive material by using a meta database and some simple structuring rules. The reason for using such a database is that it is able to present data structured more similar to the data as it appears in many sources, you are not limited by the simple structure of events in Gedcom.
gthorud 2011-05-29T14:48:21-07:00
I have uploaded a nice little drawing indicating how information from sources could be stored in various formats. The main point is that more information than relevant for a citation or a Persona can be transcribed or photographed - as many genealogists do.



This is building upon my document presented here
http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/39358416
ttwetmore 2011-05-29T15:10:02-07:00
Geir,

Nice. As I said in my comments, I don't see why you need the two source records.

It would also be nice if the persons and events in the final codified version could somehow be implied to be records in their own right, but that might be hard to do in the context of your diagram.

I still have some trouble with the idea of the conclusion person pointing to a citation. I think the individual personas of the conclusion person should do that. The conclusion person should only have to justify why it binds together personas.

Tom
GeneJ 2011-05-14T08:02:40-07:00
Data-Date01 (was date part of Data03) / Approximately known dates
Setting up a discussion for this requirement.
See:
http://bettergedcom.wikispaces.com/Better+GEDCOM+Requirements+Catalog#Data-Date01
GeneJ 2011-05-14T08:21:09-07:00
There has been a discussion on the TMG list about dates, sort dates and disagreements between users about what some of the approximately known date terms mean to users/developers.

In particular, see the thread, "[TMG] Before/After dates was TMG7 Audit Burial Date," and postings by Chris Sackett, Darrell Martin and John Cardinal. (As some of you may know, John is the developer of _SecondSite_.)
Here's a post by John:
http://archiver.rootsweb.ancestry.com/th/read/TMG/2011-05/1305337518

He writes (in part), "It is true that many people disagree with how the "after" modifier
works, but it is also true that there is disagreement about how it ought to
work instead. That's not to say no solution/option is possible or desirable,
but that reaching a [consensus] will take some work."

The overall discussions are in two or more threads.
See:
[TMG] Burial after death (was...)
http://archiver.rootsweb.ancestry.com/th/read/TMG/2011-05/1305293866
[TMG]TMG7 Audit
http://archiver.rootsweb.ancestry.com/th/read/TMG/2011-05/1305205792
[TMG]TMG7 and TMG8 Audit
http://archiver.rootsweb.ancestry.com/th/read/TMG/2011-05/1305208277
[TMG] TMG7 Audit Burial Date
http://archiver.rootsweb.ancestry.com/th/read/TMG/2011-05/1305260915
AdrianB38 2011-05-14T13:54:15-07:00
Not sure which bit is being discussed here. The TMG discussions refer to whether "AFTER dd/mm/yyyy" includes the possibility of equality.

That's not what I understand as approximately known dates - though OK, it's not a known date. I'd envisaged this as referring to, well, "APPROX dd/mm/yyyy".

Anyway - as is partly indicated, the issue of equality or not was partly discussed at http://bettergedcom.wikispaces.com/message/view/DeadEnds+Model/34409442#34503708

My view - if we want to discuss it here and I'd prefer it to be a separate item - is that we should specify that BEFORE, AFTER, BETWEEN .. AND ..., FROM and TO all allow for the possibility of equality. (See that previous thread)

As indicated on that thread,
(a) the only safe thing to do is assume the maximum number of days are covered - and I _think_ that is done by allowing equality;
(b) only maths and IT geeks understand the difference between "greater than" and "strictly greater than" so it's a fair bet that most people use "greater than" to cover both possibilities - so again, safest option applies;
(c) only maths and IT geeks would write something like
"He died on 10 April 1877"
"He was buried after 9 April 1877" if we want to allow for the possibility of burial on the same day as death (which HAPPENS!)
People will just write
"He died on 10 April 1877"
"He was buried after 10 April 1877"

Shall I raise a specific requirement for specifying what "AFTER dd/mm/yyyy" (etc) mean?
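
The "allow equality" reading argued for above can be written down in a few lines of Python. This is only a sketch of that one interpretation; the function name and the upper-case qualifier strings are assumptions for the example, and a real implementation would also have to cope with partial dates such as a bare year.

```python
from datetime import date


def allows(qualifier, candidate, start=None, end=None):
    """Test a candidate date against a qualified date, bounds included."""
    if qualifier in ("AFTER", "FROM"):
        return candidate >= start
    if qualifier in ("BEFORE", "TO"):
        return candidate <= end
    if qualifier == "BETWEEN":
        return start <= candidate <= end
    raise ValueError(f"unknown qualifier: {qualifier}")


death = date(1877, 4, 10)
# "He was buried after 10 April 1877" still permits burial on the day of death.
print(allows("AFTER", death, start=death))
# The day before the death is excluded.
print(allows("AFTER", date(1877, 4, 9), start=death))
```

Under the strict reading, the first comparison would use > instead of >= and a same-day burial would be rejected.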
GeneJ 2011-05-14T14:25:22-07:00
Oo. I thought approximately known dates WERE after, about, before, between, etc.
AdrianB38 2011-05-14T14:47:57-07:00
Well, to me "ABOUT" is definitely an approximate date. "BEFORE 31 MARCH 1901" doesn't - to _me_ - have any approximation in it. The 31/3/1901 bit is quite clear.

"BEFORE 31 MARCH 1901" indicates the "real" date is unknown, (OK - this is me, a mathematician remember!) and unknown isn't quite the same as approximate.

If we want to discuss those variants here, then OK - but we need then to define what we're talking about. Which of those combinations are we talking about? (i.e. no "etc" - sorry!)
GeneJ 2011-05-14T16:41:16-07:00
I only have access to a few programs. I'll try to log what I can learn.

TMG recognizes [Date] Modifiers: (Separately, TMG recognizes Sort Dates)

Date within 20 years plus or minus:
circa date
cir date
about date
abt date

Date before date:
before date
bef date
b date
ante date

Date after date:
after date
aft date
a date
post date

Date between two dates:
between date and date
bet date and date
btw date and date
date-date

From one date to another date:
from date to date

Date based on some other event date:
say date
est date

One date or another date:
date|date
date or date

GENBOX calls them "Date Qualifiers" (Separately, GenBox also recognizes Sort Dates)

Approximate dates:
Use the qualifier about, as in about 1946.
You can also use abt, circa, cir, ca, and ~ (tilde).

"Before" Dates
Use the qualifier before, as in before 1928.
You can also use bef and < (left angle bracket).

"After" Dates
Use the qualifier after, as in after 15 May 1929.
You can also use aft and > (right angle bracket).

Estimated Dates
Use the qualifier estimated, as in estimated 1626.
You can also use est and say.

Calculated Dates
Use the qualifier calculated, as in calculated 1742.
You can also abbreviate this to calc or cal.

Note GENBOX includes "Surety Qualifiers" in this same mix: (this can tie to the surety level, apparently)
For a "Marginal Evidence" surety level: perhaps 1750.
For a "Probable Conclusion" surety level: probably June 10, 1752.
For an "Assemblage of Evidence" surety level: almost certainly October 12, 1763.

These can also be added as "Custom Qualifiers"
perhaps 1750; maybe 1750; possibly 1750
probably 10 June 1752; apparently 10 June 1752; presumably 10 June 1752
almost certainly 12 October 1763; most likely 12 October 1763
without a doubt 23 March 1789

And these include descriptions such as:
during the spring of 1920
late winter 1847
early September 1746
Christmas 1752
Sadie Hawkin's Day 1964
the Ides of March 45 B.C.
(Help file notes, "Genbox will not interpret custom qualifiers when sorting dates. Only the recognized portion of the date will be used.")

Date Ranges
To enter a date span, use the keywords "from" and "to", as in from 3 March 1927 to 1 June 1929.
If the end date of the span is not known, enter only the "from" date: from 3 March 1927. You can also use "since": since 3 March 1927.
If the begin date of the span is not known, enter only the "to" date: to 1 June 1929. You can also use "until": until 1 June 1929.

There are two other sections in this part of the GENBOX Help file:
Entering Date Part Alternatives and Entering Special Dates
AdrianB38 2011-05-15T05:16:29-07:00
Please discuss whether a date range includes the end dates on this thread:
http://bettergedcom.wikispaces.com/message/view/Better+GEDCOM+Requirements+Catalog/39049742
ACProctor 2011-11-27T09:10:38-08:00
What about relational association with other Events? For instance, a date that is AFTER Event-a but BEFORE Event-b.

As well as allowing for both a +/- and a min/max definition of an uncertain date range (together with a humanly readable description that may have extra semantics, e.g. "Christmas 1956"), I've also been allowing for relational associations between Events in a separate <Constraints> element.

Validation of such constraints (to make sure they're achievable and not circular) has been done many times in Project Management software :-)
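
One possible way to check that AFTER/BEFORE constraints between Events are achievable and not circular is a topological sort, sketched below with Python's standard graphlib module. The function name and the shape of the input mapping are assumptions made for illustration; nothing here is taken from an existing implementation.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def order_events(after_constraints):
    """Return event keys in a feasible chronological order.

    after_constraints maps an event key to the set of event keys it must
    come AFTER. Raises ValueError if the constraints are circular.
    """
    try:
        return list(TopologicalSorter(after_constraints).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of keys forming the detected cycle.
        raise ValueError(f"circular date constraints: {exc.args[1]}") from exc
```

A BEFORE constraint can be fed in as the mirror-image AFTER constraint on the other event, so one mapping covers both directions.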
AdrianB38 2011-11-27T14:55:24-08:00
"What about relational association with other Events? For instance, a date that is AFTER Event-a but BEFORE Event-b."

That's sort of like what I was thinking about with "Data-Date03 Date phrases", as some of the examples I had in my mind were things like (to put it into English) "She was resident in Northwich before her marriage to John". In this instance "before" would code up as "BEF" and the date phrase - which I think is part of standard GEDCOM - is "her marriage to John". Your suggestion indicates you've gone the extra (and quite logical) step of not writing some text that happens to equate to an event (we hope) but actually linking to the event. Rather interesting...
ttwetmore 2011-11-28T07:51:52-08:00
This is an intriguing idea. I believe I will extend the DeadEnds definition of a date to handle this idea. There are some potential ambiguities of course.

For example, in the DeadEnds model, one of the date forms can express a date as ["between" <simpledate1> "and" <simpledate2>], where <simpledate> is either a full or partial date, possibly with a "double year" for pre-Gregorian dates. (Actually not entirely true, as <simpledate> can also be things like ["about" <basicdate>] or ["interpreted" <basicdate>].)

By simply extending the definition of <basicdate> to include "eventReference" we add this new feature. However, the date at the end of the "eventReference" can also be a very complex date, possibly with before, after, about, from..to, between, computed, interpreted, possible, and so forth, so you can see that things could get very complex.

Nevertheless, I think this addition to what a <basicdate> is is powerful. I will add it into my context free grammar for DeadEnds date notation.
ACProctor 2011-11-28T08:28:24-08:00
If it's any help Tom, this is the syntax I've used. The Event element has a <When> sub-element of the form:

<When>
[ DATE_VALUE ]
[ EVENT_CONSTRAINTS ]
</When>


where those sub-structures are defined as follows:

EVENT_CONSTRAINTS=

<Constraints>
[ <AfterEvent Key='key'/> | <BeforeEvent Key='key'/> ] …
</Constraints>

DATE_VALUE=

<Date>
<Value [Margin='err'] [Units='unit']> isodate </Value>
<MinValue> isodate </MinValue>
<MaxValue> isodate </MaxValue>
</Date>
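
As a rough illustration of the structure above, the following sketch builds a <When> element with Python's standard xml.etree.ElementTree. The element and attribute names come from the syntax in this post; the function name, its parameters and the example unit value "day" are assumptions, since the post does not say which unit values are allowed.

```python
import xml.etree.ElementTree as ET


def build_when(value=None, margin=None, units=None,
               min_value=None, max_value=None,
               after_keys=(), before_keys=()):
    """Build a <When> element following the structure sketched above."""
    when = ET.Element("When")
    if value or min_value or max_value:
        date = ET.SubElement(when, "Date")
        if value:
            attrs = {}
            if margin is not None:
                attrs["Margin"] = str(margin)
            if units is not None:
                attrs["Units"] = units
            ET.SubElement(date, "Value", attrs).text = value
        if min_value:
            ET.SubElement(date, "MinValue").text = min_value
        if max_value:
            ET.SubElement(date, "MaxValue").text = max_value
    if after_keys or before_keys:
        constraints = ET.SubElement(when, "Constraints")
        for key in after_keys:
            ET.SubElement(constraints, "AfterEvent", {"Key": key})
        for key in before_keys:
            ET.SubElement(constraints, "BeforeEvent", {"Key": key})
    return when


# Example: a date of 25 Dec 1956 give or take a day, after event E1
# and before event E2 (keys are made up for the example).
example = build_when(value="1956-12-25", margin=1, units="day",
                     after_keys=["E1"], before_keys=["E2"])
```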
AdrianB38 2011-05-15T04:55:49-07:00
Data-Date04 - Date periods
Include this as a separate requirement, to make it explicit what Data-Date01 is concerned about.

Description:
BetterGEDCOM must allow the recording of periods of time, denoted by start and / or end dates. BetterGEDCOM must explicitly define whether or not the end or start date is included in the period of time.

Importance:
Mandatory

Why?:
GEDCOM already allows this. Failure to include will result in failure to convert the vast majority of GEDCOM based files.

Source:
GEDCOM Standard 5.5, page 41

Way forward?:
Include this in the data model.

GEDCOM options are logically equivalent to the following phrases:
FROM date
TO date
FROM date-1 TO date-2
where date, date-1 and date-2 are known, unqualified dates - i.e. "FROM ABOUT 1066" is not included as the ABOUT is not permitted in this requirement.

It is suggested that the start and end dates are included in the period of time as this is normal usage in the English language - e.g. "The First World War lasted FROM 1914 TO 1918" - 1914 and 1918 are included in the War's period.

NOTE PLEASE that the explicit clarification that the start and end date are included in the period is EXTRA to what is explicit in the GEDCOM standard, so this isn't quite an exact duplicate of GEDCOM.
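
A minimal sketch of the inclusive reading suggested above, assuming full calendar dates and treating an unknown start or end as open-ended. The class and method names are hypothetical and not part of any BetterGEDCOM or GEDCOM definition.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Period:
    """A FROM/TO period; either end may be unknown (None)."""
    start: Optional[date] = None  # FROM date, treated as inclusive
    end: Optional[date] = None    # TO date, treated as inclusive

    def contains(self, day: date) -> bool:
        if self.start is not None and day < self.start:
            return False
        if self.end is not None and day > self.end:
            return False
        return True


# "FROM 3 March 1927 TO 1 June 1929": both end points fall inside.
span = Period(date(1927, 3, 3), date(1929, 6, 1))
```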
AdrianB38 2011-05-15T04:57:09-07:00
Data-Date05 - Date Ranges
Include this as a separate requirement, to make it explicit what Data-Date01 is concerned about.

Title
Date ranges

Description:
BetterGEDCOM must allow the recording of ranges of time, denoted by start and / or end dates, within which an event takes place. That event may take place on a single day, or it may take place over a period of days.

BetterGEDCOM must explicitly define whether or not the end or start date is included in the range of time.

Importance:
Mandatory

Why?:
GEDCOM already allows this. Failure to include will result in failure to convert the vast majority of GEDCOM based files.

Source:
GEDCOM Standard 5.5, page 41

Way forward?:
Include this in the data model.

GEDCOM options are logically equivalent to the following phrases:
BEFORE date
AFTER date
BETWEEN date-1 AND date-2
where date, date-1 and date-2 are known, unqualified dates - i.e. "AFTER ABOUT 1066" is not included as the ABOUT is not permitted in this requirement.

It is suggested that the start and end dates are included in the range of time as this is the clear implication of page 42 in GEDCOM Standard 5.5, which explicitly states that:
1852 is equivalent and interchangeable with BETWEEN 1 JANUARY 1852 AND 31 DECEMBER 1852
AdrianB38 2011-05-15T04:59:14-07:00
NOTE PLEASE that the explicit clarification that the start and end date are included in the range is EXTRA to what is explicit in the GEDCOM standard, so this isn't quite an exact duplicate of GEDCOM.

See also Gene's post Data-Date01 (was date part of Data03) / Approximately known dates, viz:

There has been a discussion on the TMG list about dates, sort dates and disagreements between users about what some of the approximately known date terms mean to users/developers.

In particular, see the thread, "[TMG] Before/After dates was TMG7 Audit Burial Date," and postings by Chris Sackett, Darrell Martin and John Cardinal. (As some of you may know, John is the developer of _SecondSite_.)
Here's a post by John:
http://archiver.rootsweb.ancestry.com/th/read/TMG/2011-05/1305337518

He writes (in part), "It is true that many people disagree with how the "after" modifier
works, but it is also true that there is disagreement about how it ought to
work instead. That's not to say no solution/option is possible or desirable,
but that reaching a [consensus] will take some work."

The overall discussions are in two or more threads.
See:
[TMG] Burial after death (was...)
http://archiver.rootsweb.ancestry.com/th/read/TMG/2011-05/1305293866
[TMG]TMG7 Audit
http://archiver.rootsweb.ancestry.com/th/read/TMG/2011-05/1305205792
[TMG]TMG7 and TMG8 Audit
http://archiver.rootsweb.ancestry.com/th/read/TMG/2011-05/1305208277
[TMG] TMG7 Audit Burial Date
http://archiver.rootsweb.ancestry.com/th/read/TMG/2011-05/1305260915
AdrianB38 2011-05-15T05:15:24-07:00
It is my belief that the range SHOULD allow for the start and end dates to be within the range for several reasons:

1. It's what the GEDCOM Manual says (more or less), viz: Page 42 in GEDCOM Standard 5.5 explicitly states that:
1852 is equivalent and interchangeable with BETWEEN 1 JANUARY 1852 AND 31 DECEMBER 1852

Since "1852" clearly includes both 1/1/1852 and 31/12/1852, then "BETWEEN 1 JANUARY 1852 AND 31 DECEMBER 1852" has to include those dates to be interchangeable.

2. Suppose you have a GEDCOM or BG file saying "AFTER 11 JANUARY 1852"

If there is no clarity whether the original author meant that the event could take place on 11 Jan 1852 or not, then the SAFEST assumption to make is the one that takes the least restrictive view of the possible dates, i.e. the event takes place on or after 11 Jan 1852.

3. It is at least arguable that the English language suggests equality is NOT allowed. However, this is not a safe interpretation.

4. Suppose we do have a burial and a death and we wish to record that the burial took place >>>on<<< or after the date of death, using the "Strictly after" interpretation.

We could write the GEDCOM / BG equivalent of:
X DIED ON 11 JAN 1852
X WAS BURIED AFTER 10 JAN 1852 IN BOOT HILL CEMETERY

But how many people will do the mental gymnastics necessary to do that? Very few. They will write:
X DIED ON 11 JAN 1852
X WAS BURIED AFTER 11 JAN 1852 IN BOOT HILL CEMETERY

Thus, for the majority of people, using the strictly after and not equal interpretation, results in an incorrect interpretation of what they meant, because they do not go through the correct mental gymnastics.

The safest way is therefore to deem that the event CAN take place on the end date(s).
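
The two readings of AFTER discussed in points 1 to 4 can be compared in a few lines. This is only a sketch under the assumption of full Gregorian calendar dates; the function names are hypothetical.

```python
from datetime import date


def year_as_range(year):
    """Expand a bare year to its inclusive BETWEEN bounds."""
    return date(year, 1, 1), date(year, 12, 31)


def satisfies_after(event_day, bound, inclusive=True):
    """Check an AFTER qualifier under either interpretation."""
    return event_day >= bound if inclusive else event_day > bound


# X died on 11 Jan 1852 and was buried the same day, recorded as
# "BURIED AFTER 11 JAN 1852".
death = date(1852, 1, 11)
burial = date(1852, 1, 11)
inclusive_ok = satisfies_after(burial, death)                 # on-or-after
strict_ok = satisfies_after(burial, death, inclusive=False)   # strictly after
```

Under the inclusive reading the same-day burial is consistent with the record as most people would write it; under the strict reading it is not.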
Christine_E 2011-07-07T20:35:13-07:00
Syntax07 URIs (URLs) for external information
I was going to ask what "URI" was but an on-line search found a good discussion on wikipedia. It is recommended to add a link to http://en.wikipedia.org/wiki/Uniform_Resource_Identifier in the Source section of this requirement.

Since information on web pages and in online databases can change over time, the URI/URL should also be accompanied by the date when the information was found there.
Christine_E 2011-07-07T20:56:12-07:00
Data-Ship01 Data about miscellaneous entities
In my opinion, the recording of the history of a ship, locomotive, house, school, town, church, etc. doesn't belong in a genealogy database or in BetterGEDCOM. Books have been written about such items and should remain standalone. Although it is proposed that fields (lines) not have a restriction on their length (TextHandling02), it is doubtful that this implied a book could/should be entered into a field.

When it is desirable to link passengers who traveled together to each other, this could be defined as a Group.
AdrianB38 2011-07-08T04:14:27-07:00
Nice to see someone's reading things, Christine!

Certainly we need to consider that BetterGEDCOM is aimed at the genealogies of people and "Data-Ship01" was never aimed at making BG be an adequate mechanism for writing histories of ships, locos, etc. However, I've had several instances where I want to put in background information, as I'm a great believer in that.

(Face it, the number of people who want to read about my great-aunt Nelly is minimal, but if it's the Lancashire Cotton Famine - OK, not a ship, I know - and how it affected great-aunt Nelly, then it becomes a much more interesting story, not least for the possibility that my reader's great-uncle Stan might similarly have been affected.)

So, it seems to me that something slightly more powerful and structured than a shared-note would be useful. I certainly do not envisage anything more than a simple summary of events and / or attributes. Certainly not a book's worth of data - that would, as you say, be daft.
GeneJ 2011-07-08T07:32:27-07:00
Welcome Christine!

I'm a user (not a technologist).

In an event and associate driven database, I think "Miscellaneous entities" will sometimes take on the form of a "super group."

I don't see the Miscellaneous Entity as a substitute for a book or source--but would see tags and the related citation as part of the Miscellaneous Entity.

Since this concept was added to the requirements catalog, I've come across several instances in my own research where I'm guessing I'd have used the miscellaneous entity if it were available.

In terms of potential--I'd hope software developers would provide for the indexing of such "Miscellaneous Entities"--making these "super groups" easy to find to the benefit of users.
Christine_E 2011-07-08T11:53:04-07:00
Well why don't we list here some of the kinds of entities we would want included in our genealogy file? I'll start:

Textile mill (in my chain migration, most of the immigrants ended up working in the same mill)

Coal mine (and other employers)

a school

These examples could be the basis for Groups I would form. Maybe the people are in the "group" while the entities that are not people are what this item is trying to describe [just thinking out loud]??
GeneJ 2011-07-08T14:13:00-07:00
Ooo. Cool.

A few of the ways I'd use these super groups follows. These would not substitute for entries in the research log; more the result of research.

Revolutionary War - Much work to understand regiments and particular campaigns during the war and separate research that correlates various other sources to that same research (diary, variety of patriot pension files, etc.). I'd like a central place where I could pull certain key dates and events together without bogging down the individual view about my ancestors.

Rumney Depot Cemetery (I have written a small booklet about members of my family buried there, and I'm working with folks in the town to develop some history about the cemetery--like when the land was set aside and by whom)

Emigration from Norway to the US. I have much research about this, including great details from one family's trip over. I'd love to record key dates and have a central place to report about the various sources I used.

Ditto, Irish immigration and famine.

Traders in early NW Ohio. We continue to research this trade because two brothers were involved. Ditto, some associated families were engaged.

Photograph Albums!! I inherited almost a dozen photograph albums. I've cataloged and scanned the albums, so I could just attach each image to the people in my file -- but there is a separate genealogical value in recording each collection as a whole. Would love to be able to do that ....

World War II - I have a record of each and every location change my dad made in WWII, and I've separately researched each place and correlated it with his correspondence home and significant war time events. Ditto, with his military file. I started entering all of this to his individual profile, but there were just so many tags that I moved it over to its own project. If I had a Misc. entity, I'd move the entries back to my main file so that again, I could cross associate particular events, but not bog down his personal profile.
louiskessler 2011-07-08T16:01:24-07:00
I also think a GROUP record is required. I'd like to place people or families into groups, e.g. all people with last name Kessler. All people living in my grandfather's neighborhood while he was growing up. All people who also went to my grandfather's place of worship. All people buried at a certain section of a certain cemetery. All relatives who died in WW I.

And groups could be part of other groups. e.g. All relatives who fought in WW I.

But I'd also like a PLACE record which could have events. That would cover your textile mill, coal mine, or school. Events can be particular to that place and also involve people in and not in your research (e.g. event is a certain graduation ceremony and lists all people attending and their role - e.g. graduate, valedictorian, parent, teacher) and other notes about them.

Louis
Christine_E 2011-07-12T14:55:05-07:00
I am trying to figure out the difference between this "miscellaneous entities" and "groups". So far, this discussion has listed:

PLACES
textile mill
coal mine
school
cemetery
(traders in) Ohio
WW II locations

OCCUPATIONS
textile mill worker
coal miners
traders (in Ohio)
military (regiments)

EVENTS
famine
emigration
military campaign?

MULTIMEDIA
photo album scans

NOTES
how one figured out about regiments and campaigns

All I can think of at the moment is that Group is the linking mechanism for multiple people records and non-people records (places, events, notes, multimedia). If this requirement allows users to define occupations, organizations, ships, what do you see in this being different than the current description for groups?

Or what's another way of differentiating between this and Groups?
gthorud 2011-07-12T17:59:11-07:00
I see that there are postings above that confuse what was initially meant by “ship” (a term used because we could not agree what to call it), and what we now seem to be calling miscellaneous entities (which is not very precise either, in fact it is very general), but it looks like this discussion is becoming more complicated than it needs to be.

I have not read the original discussion again, but I seem to remember that what was discussed was what could be called a “physical thing” (although that is not very precise) – it was not a group (I do not understand the term “super group”), not a place, not an occupation, not an event, not multimedia and the intention was to have something more advanced than a note. Please tell me if I am wrong …

Examples that have been mentioned are ships, cars, pets (there were others) .

I think one class of such physical THINGS would be those that were important to our ancestors. Or it could be a very old thing, e.g. a 400-year-old silver spoon that was originally given as a wedding present. Or my uncle's first car. Or some very old furniture, or ….

The point is that these are things that we would use to spice up our traditionally boring family histories, that few in the family really bother to read much in.

I must admit that this is not the first thing I would like BG to include, but I see it as an interesting THING for the future.

What is not clear to me is how we would want the information about this “thing” to appear in reports. How would it relate to other entities? It could be owned by a person, by a group, it could be located in a place, it could perhaps also be important in some event, there may be a photo of it, it may be mentioned in a source, and probably more.

Many of the “group type of things” mentioned above should be discussed in the context of groups. ( I then mean groups as a group of people, and nothing else.)

If I were to choose a term for this, I would probably go for “physical artifact” and possibly supplement it with things that do not fit with that term.
GeneJ 2011-07-12T18:36:21-07:00
Hi Geir:

You're understanding is the same as mine. I should perhaps not have used the word super group, as group might imply just a collection of persons. Believe the examples were sound.

What about "Historical Entity" --GJ
GeneJ 2011-07-12T18:36:46-07:00
*Your
ttwetmore 2011-07-13T03:57:39-07:00
It was I who introduced the word "ship" to represent the idea that we might like to represent any kind of object once in awhile in a genealogical database. Ship was used because it resonates with many genealogists who want to record at least something about the ships their immigrant ancestors arrived on. The question is whether we want this ability in Better GEDCOM. It is trivial to implement with a generic object record with a type field that define what kind of object it is, and then to allow other records to refer to it.

Certainly easy to do, but maybe not slated for an early release.
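
To make the "generic object record with a type field" idea concrete, here is a small sketch. The class names, field names and record ids are all hypothetical; no BetterGEDCOM record layout has been defined for this.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectRecord:
    """A generic record; `kind` says what sort of object it is."""
    record_id: str
    kind: str              # e.g. "ship", "locomotive", "document"
    name: str
    notes: list = field(default_factory=list)


@dataclass
class EventRecord:
    """Any other record type that may refer to objects by id."""
    record_id: str
    description: str
    object_refs: list = field(default_factory=list)  # ObjectRecord ids


def objects_for(event, objects_by_id):
    """Resolve the object records an event refers to."""
    return [objects_by_id[ref] for ref in event.object_refs]


# Example with made-up ids: one ship, one event that refers to it.
ship = ObjectRecord("O1", "ship", "HMS Victory")
voyage = EventRecord("E1", "Voyage", object_refs=["O1"])
```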

A group is also a generic entity, but intended primarily to group together persons for some useful reason, e.g., neighbors, fellow graduates, gang members, platoon mates, friends. A group provides a convenient and efficient means to establish a large number of interpersonal relationships by the addition of a single record. For example, a group for friends establishes n*(n - 1)/2 pairwise friend relationships by the addition of a single group that holds the references to n persons.
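
A quick way to see the saving: a group of n members stands in for n(n-1)/2 distinct unordered pairs (twice that if each direction is counted separately), so the number of implied relationships grows roughly with n squared. The sketch below simply enumerates those pairs; the function name is hypothetical.

```python
from itertools import combinations


def implied_pairs(member_ids):
    """List every unordered pair of members in a group."""
    return list(combinations(sorted(member_ids), 2))


# Four friends in one group record imply six pairwise friendships.
pairs = implied_pairs(["a", "b", "c", "d"])
```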
AdrianB38 2011-07-13T08:52:49-07:00
Requirement Data-Group01 Data about groups of persons (eg. organisations) should cover the group type thing. This requirement is more about a single entity, or a single type of entity, that it would be jolly nice to record in slightly more detail than a single person's note, as background info for the family history (but considerably less detail than the enthusiast press would desire).

Thus, to me, examples of miscellaneous entities (artefacts?) might be:
- HMS Victory;
- USS Enterprise;
- New York Central Niagara class steam loco (a type of steam loco, rather than an individual) (if I had relatives who drove them)
- Magna Carta

All these turn out to be physical objects, or types of physical object. Not sure that's a part of the definition though. Although every time I think of a non-corporeal example, I can do it as a Group - e.g. the French Impressionist Movement.

For those physical objects, while one could list a Group of people who fought or worked on HMS Victory, that's missing the point for me - it looks at the ship from the wrong angle. It would be an interesting angle for a naval enthusiast but it's not mine. I just want a little bit of background information about HMS Victory. Currently I have them as shared notes - but it would be nice to structure it a bit more.
gthorud 2011-07-13T12:56:33-07:00
Christine_E 2011-07-07T23:22:06-07:00
Data02 Support for all conventional genealogical processes
Please define "genealogical processes".

It is probably not the goal of the BetterGEDCOM Project to have all genealogy programs create the same types of displays and reports else there wouldn't be functional differences between the programs when a user is deciding which program to use. Rather, a core set of reports could be defined/required. To distinguish themselves from other companies, additional functions/ reports can be defined by each program. (But this could put us back in the situation we are trying to get out of.)

Add "Syntax04 Extensibility by software companies" as a dependency to this requirement, if applicable.
AdrianB38 2011-07-09T11:29:25-07:00
Christine - re "Please define 'genealogical processes'." - that's the sort of question I'd ask! And I probably wrote the requirement...

The requirement says: "The data model that underlies BetterGEDCOM must provide a set of data entities that will allow genealogical applications to support all conventional genealogical processes"

I think this is one of those requirements that simply gives the background and is full of weasel words (e.g. "conventional") and short on detail. Assuming it was me that wrote it, I couldn't face listing input, output, reports about this / that / the other. Apart from anything, if I listed specifics and then I missed something...

There's an over-arching but slightly hidden requirement of "Do everything GEDCOM does" but I did feel it was important to sketch in some context. Given that software exists to do all this stuff (albeit using GEDCOM), I'm hoping that the software gives the requisite background / context and we need only concentrate on the new parts. Hence Data02 was there simply to provide a background context / justification.
Christine_E 2011-07-07T23:32:17-07:00
Data07 Independent record collections
I don't understand what this requirement means or why someone would want to do it. Please elaborate.

Could this mean that a user or software developer could create a timeline of event records in an otherwise empty file, then distribute it so users could link their ancestors to the events?

This sounds like it would leave orphan records.
gthorud 2011-07-10T09:59:16-07:00
I think the high level use is sufficiently described in the requirement, but it has not been detailed. In many cases, there may not be a need for details.

We have not yet defined how timelines would be recorded, but I assume that it could contain info about eg. historic events only, without links to person records.

What type of orphans do you think would be a problem? There might be timeline info not linked to a person in your database, but I don't see that as a problem.
Christine_E 2011-07-11T01:30:53-07:00
an example: Yesterday I was browsing through a book about the landmarks of (a city). Suppose someone took each of those landmarks and made an entry for each one. That is the only thing the database contained. Then they shared the file with members of the (city's) genealogy society. The members each import that database and connect their ancestors to a few of the landmarks, but most of the landmarks are unattached to people (ie, orphans).

Would you expect the genealogy programs to put all the landmarks in a user-requested list or only the used ones that have ancestors associated with them? Would you expect the program to have a feature to delete the unused ones upon user request? Or would you leave all of this to the developers of that program to decide how they want to handle it?

One problem is that orphans would take up room in the user's database which would matter to some users, but not others.

So should BetterGEDCOM just define place, event, and timeline items or also define the handling of those items?
Christine_E 2011-07-12T20:08:31-07:00
Conversion02 - Support for generating web pages
I added a new requirement to the catalog today:

Description: If a genealogy program generates web pages using the data in the database, the web pages must follow NGS standards.

Why?: Not only is following the standards a good practice, but web users should know who generated the pages and how to contact the poster. In particular, web pages should "respect the rights of others who do not wish information about themselves to be published, referenced or linked on a web page".

This requirement DOES NOT say that programs must generate web pages, but rather, if they do, then ...
theKiwi 2011-07-12T21:58:24-07:00
1 - Why is the specification for a data file format that is to exchange genealogical data between different applications going to dictate to those software applications how to create their output that is not destined to another genealogy software?

2 - why must any dictate be to the standards of the National Genealogical Society (of the United States) when such standards are quite likely not even known about in other countries?

I don't think it is any business of BetterGEDCOM to be telling software vendors and/or their customers how to publish their data vis a vis privacy.

If a customer desires privacy on a BetterGEDCOM compliant file that they create using their genealogy software application, how that is achieved is up to the software application, just like it is today where different softwares offer different choices - eg to show only names, or initials and last name, or "Private" or "Living"

Roger
ttwetmore 2011-07-13T03:22:39-07:00
This isn't a requirement that applies to Better GEDCOM. There isn't anyone realistically to be the "recipient" of such a requirement. Certainly vendors wouldn't feel any compunction to obey it. There is no way Better GEDCOM can try to impose such an unrealistic obligation on software vendors. That's even before we consider whether it's a reasonable requirement at all. Which I don't see.

It might be reasonable to have a requirement that states that the Better GEDCOM data model must be able to handle the data required by the NGS standards. There are probably requirements pretty close to this already in the catalog. In my generic requirements I wrote:

"2. The data model that underlies Better GEDCOM must be a superset of the models used by existing genealogical applications to the fullest extent deemed possible during design."

This could be extended to include genealogical standards bodies as well as software applications.
AdrianB38 2011-07-13T08:18:02-07:00
Tom makes the positive point that there _is_ a requirement in here - does the BG data model contain enough data to support the American NGS' formats for reports and / or web-pages? Not just that NGS but any other.

I think it's incumbent on any members of said august bodies to review what BG has so far.
theKiwi 2011-07-13T10:32:15-07:00
The creation of any type of output that isn't a BetterGEDCOM compliant file for transferring data from one application to another is beyond the scope of BetterGEDCOM I believe.

If a user wants to create web output (or a register report or a book) from their application Q, the format of that output is nothing to do with BetterGEDCOM.

If a user wants to export their data from application Q to the latest version of Gene Stark's HTML2BetterGED to create their web output, it's up to application Q to correctly write out all the data to a BetterGEDCOM file and then HTML2BetterGED to correctly read that data and then write out the HTML files.

Isn't BetterGEDCOM's only role in this to provide the file format to correctly handle all the elements of data that are being transferred from one application to another - as Tom notes "a superset of the models used by current genealogical applications"?
Christine_E 2011-07-12T20:12:38-07:00
Admin12 - Support Privacy Settings
I added a new requirement to the catalog today:

Description: Genealogy programs must support the user by providing controls/options/settings/reports to assist the user in maintaining the privacy of the people in the database, particularly those living.

Why?: Users will need to differentiate between data that can be shared or not. Users may need to differentiate between different data that can be shared with different groups of people. For example, only the data related to one branch of a family would be shared with the people in that branch.

I do not foresee the controls as being on/off, but rather multiple choice.
AdrianB38 2011-07-13T08:09:52-07:00
I think I remember once commenting on this or similar idea before the Req Cat came into existence. While Tom and Roger are correct, there is also an aspect to this where BG HAS a role to play and that is where on the file the security data COULD appear.

The chance of defining what those security values are is about zero, because of the different views in all the apps. But what BG could do is say that we define a security tag called <SECURITY> </SECURITY> (or whatever); the contents of the security tags - be they sub-tags or whatever - are entirely up to the application coders but this is where the security tag goes in the file structure (e.g. against every attribute, fact, note, relationship whatever). Feel free, Mr Developer, to add it in extra places, but here's your starter.

And yes, it might not get used, but there's an important point to be made about showing our due diligence in thinking about it.
GeneJ 2011-07-13T09:14:56-07:00
Echoing other user points.

Roger wrote, "BetterGEDCOM may want to have a standard for how "sensitive/private" data is identified and delimited so that a compliant application can read it back in and retain that delimitation if it is exported in the first place, but even today different applications handle this differently - Reunion for Macintosh by default uses { } as those delimiters (and they can be changed to ( ) or < > or [ ] ) by the user."

If a user has carefully developed project ABC in software Z and exports same via BetterGEDCOM, will or will not the privacy settings he/she used in Z be understood as a privacy setting by BetterGEDCOM.

In the expanded and modern world of BetterGEDCOM, folks will want to mark some data as private and may want to code some material as "for personal use" (e.g. recognize copyright issues).
theKiwi 2011-07-13T10:18:54-07:00
GeneJ wrote: "If a user has carefully developed project ABC in software Z and exports same via BetterGEDCOM, will or will not the privacy settings he/she used in Z be understood as a privacy setting by BetterGEDCOM."

I would expect that as Adrian suggests there would be a tag or maybe more than one (but not very many more than 1) that would be part of the file format specification for a BetterGEDCOM compliant file to cover data that a user has marked as Private or Sensitive or Personal or ??.

It would be up to the software Z to take whatever it is that they have allowed the user to use to do this (the { } from Reunion for example) and turn that into whatever the BetterGEDCOM specification requires on export to a BetterGEDCOM file. Then it is up to Software Y to read these BetterGEDCOM tags and correctly apply what they're saying to however they have implemented Privacy/Sensitivity/Personal/?? in their software Y.

If the user of Software Z decides to export with their settings on, then the data affected won't be included in the BetterGEDCOM file at all, so there will be no use of the BetterGEDCOM compliant tags.

But BetterGEDCOM can't dictate to software developers how this will be seen and used in any software - so Reunion can stick with the { } (or ( ) or < > or [ ]) but if they're going to write out a BetterGEDCOM compliant file they must change the { } into for example <sensitive> </sensitive>, and then turn them back into { } on import.
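
A minimal sketch of that export/import round trip, in Python. The `<sensitive>` tag name is only the example used above, not a defined BetterGEDCOM tag, and the function names are hypothetical. The sketch assumes delimiters are not nested and do not otherwise occur in the text.

```python
import re

# Hypothetical neutral tags, taken from the "<sensitive>" example in the post above.
OPEN_TAG, CLOSE_TAG = "<sensitive>", "</sensitive>"

def export_text(text, open_delim="{", close_delim="}"):
    """Replace an application's own delimiters with the neutral tags."""
    pattern = re.escape(open_delim) + r"(.*?)" + re.escape(close_delim)
    return re.sub(pattern, lambda m: OPEN_TAG + m.group(1) + CLOSE_TAG,
                  text, flags=re.DOTALL)

def import_text(text, open_delim="{", close_delim="}"):
    """Replace the neutral tags with the importing application's delimiters."""
    pattern = re.escape(OPEN_TAG) + r"(.*?)" + re.escape(CLOSE_TAG)
    return re.sub(pattern, lambda m: open_delim + m.group(1) + close_delim,
                  text, flags=re.DOTALL)

note = "Born in Leeds. {Adopted in 1950.}"   # made-up note text
exported = export_text(note)                  # what would go into the file
back_in_same_app = import_text(exported)      # braces restored
in_other_app = import_text(exported, "[", "]")  # an app configured for [ ]
```

The point of the design is that each application only needs to know its own delimiters and the shared tag, never the delimiters of any other application.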

Roger
GeneJ 2011-07-13T10:37:03-07:00
OO .. I was agreeing with you guys, just trying to help with what I hope was a concrete example.
Christine_E 2011-07-13T10:41:37-07:00
I think there is a BetterGEDCOM element here but I may not have stated it properly. If vendors implement security features and users use them but then switch to a different program, shouldn't BetterGEDCOM be involved here?

Now I see how I started off incorrectly by saying "Genealogy programs must...". Roger said "then BetterGEDCOM says how what they've decided to share is written into a file so that other software can read it". This is the part I meant but felt I had to lead into what I meant.

I see this requirement as being similar to
ConfAcc02 - Levels of Confidence in Database Conclusions
Source01 - Information, Source and Evidence Type or even
Data0 - Char01 - recording of characteristics
GeneJ 2011-07-13T11:13:21-07:00
@Christine:

As with the example above (project ABC), if BetterGEDCOM sets out start/stop "security" codes [see Adrian's "<SECURITY> </SECURITY> (or whatever)"], then vendors can map their various codes to same.

Are you assuming the security features would need to be different?

Separately, I didn't follow the last four lines of your above post.
Christine_E 2011-07-13T11:49:20-07:00
If program A has 3 different codes and program B has 4, or the codes in the different programs mean different things, a smooth transfer of security information can't be passed from program to program. Even if imported correctly, the new program might act differently when generating reports if the codes didn't mean the same in both programs. Possibly some programs would leave the codes to be user-defined. (I don't know enough about what existing programs do to features like these.)

The last four lines of my previous post were to show how there were other requirements for fields _about_ the data, other than the data itself. For example, in ConfAcc02 - Level of Confidence in Database Conclusions, the description says "BetterGEDCOM should allow the recording of recognized levels of confidence associated with database conclusions". Possible levels could be "certainly", "probably", "possibly", etc. I think the privacy requirement could be stated similarly with proposed/specified settings.
GeneJ 2011-07-13T12:06:41-07:00
I see.

I thought that is probably what you meant. TMG has different classifications, too. If I'm not mistaken, TMG has effectively three:

Exclusion Marker (can be overridden by a separate command, "show excluded")
Double Exclusion Marker (can not be overridden)
Sensitivity Brackets
gthorud 2011-07-13T13:17:57-07:00
gthorud 2011-07-13T16:14:52-07:00
Thanks to Christine for bringing this important issue into the requirement catalog, it is definitely something for BG, and something we forgot that we have discussed.

I would love to be able to attach some sensitivity/exclude/private/whatever value to every bit and piece of info in a BG file, but I am not sure that is realistic. We need to find out where it is realistic to attach such a value. Some programs allow the “brackets” in many places, some attach them to records.

Then there is a question of what the value means. I think we should at least try to find out what is in use, and what the contributor here wants, and go from there.

There is also the possibility of having several values.

Since most programs support such features, and if we are going to do serious work, I think we should do a survey of what programs actually do. We should not base our work on hypothetical guesswork about what some programs might do.

So,
- Where do current programs allow such markers? At record level (perhaps some examples are easier)? In any text field? In notes only? In dates? What about relations between entities?

- What do the value(s) mean? (a definition)

- What action does the program take depending on the value(s)?

- Is there any relation/hierarchy between the values?

- Can they be used in combination – or do they exclude each other?

- More?

See also Tom’s list of 4 items – cited in the first post of the earlier discussion pointed to above (especially "closures").

I am not saying that this will solve the problem, but it will give us a better understanding to base decisions on. Maybe the problem is solvable?
AdrianB38 2011-07-14T14:00:25-07:00
OK - I use FamilyHistorian. It has two explicit security / confidentiality things only.

(NB - if I remember rightly, I was told there was a difference between security and confidentiality - but other than feeling that I agree, I can't explain that difference now.)

1) FH software looks for text in any note items (shared note records or notes that are a part of higher level facts or other entities) and where it sees double square brackets [[like this]] then it will (optionally) not print that text in reports. Any such text is always visible in the user interface. FH uses GEDCOM as its native file format, so the brackets are stored in the GEDCOM file.

2) FH has the concept of Flags, entered in a custom GEDCOM data item, one of which is the "Private" flag and the other is the "Living" flag.
Looks like this in the GEDCOM:
1 _FLGS
2 __LIVING Living

FH's reports etc can (optionally) exclude people with the Private flag set or just show their name and relationship to others.

The Living flag is there for more complex purposes, when you want to show some basic details for living people but not the full story. I can't actually remember if any reports pick it up or whether (as I think is the case) it's entirely up to you, the user, to do the selection work yourself in a query, etc.

The 2 flags can be used together.
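
A rough Python sketch of how another program might read those flags back out of the file. Only the `_FLGS` / `__LIVING` lines are taken from the example above; the `__PRIVATE` tag name and the sample records are assumptions made for illustration, and `read_flags` is a hypothetical helper.

```python
def read_flags(gedcom_text):
    """Collect the tags found under each individual's _FLGS item."""
    flags = {}
    current_id = None
    in_flags = False
    for raw in gedcom_text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level = int(parts[0])
        if level == 0:
            # Record lines look like "0 @I1@ INDI"; HEAD and TRLR have no id.
            current_id = parts[1] if parts[1].startswith("@") else None
            in_flags = False
            if current_id:
                flags[current_id] = set()
        elif level == 1:
            in_flags = parts[1] == "_FLGS"
        elif level == 2 and in_flags and current_id:
            flags[current_id].add(parts[1])
    return flags

# Made-up sample data; "__PRIVATE" is an assumed tag name.
sample = """0 HEAD
0 @I1@ INDI
1 NAME John /Doe/
1 _FLGS
2 __LIVING Living
0 @I2@ INDI
1 NAME Jane /Roe/
1 _FLGS
2 __LIVING Living
2 __PRIVATE Private
0 @I3@ INDI
1 NAME Old /Timer/
0 TRLR"""

flags = read_flags(sample)
# A report that (optionally) leaves out people carrying the private flag.
reportable = [pid for pid, f in flags.items() if "__PRIVATE" not in f]
```

Because both flags sit under the same item, using them together is just a matter of listing both, as the second individual shows.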
AdrianB38 2011-07-14T14:19:34-07:00
I'd like to throw in a couple of points from that previous discussion and maybe some new ones as well:

1. You can insert all the flags and mark-ups that you like but one simple fact holds: your GEDCOM (or BG) output is a simple text file and is therefore WIDE OPEN to anyone to read with any file viewer (e.g. Notepad in Windows).

2. If you want to impose security on an intended recipient of a GEDCOM or BG file that you're sending out, you have one option and one option only - do NOT put the sensitive data on the file.

3. The only purpose of putting flags and mark-ups on a GEDCOM (or BG) output file that you're sending to another person is as a vague request that the recipient treats that data as confidential. It is only a vague request because the recipient may immediately remove the security flags and mark-up.

4. Even if you have security flags and mark-up, it is entirely up to the application developer what to do with them.

5. Because there is a possibility of a user transferring their OWN data between 2 genealogy applications (e.g. one does updates well, the other's best for reports), then adding security flags and mark-up to that file makes sense because it might allow the data to be processed in the two apps consistently.

In that case it would be useful to specify where on the file the flags should be. This would then allow the user to find the stuff and do some editing to swap values from 1 app's codes to the other's.

6. Repeating ad nauseam - it is no part of the BG project to dictate what apps should do. However, I'm sure a suggestion (e.g. "This marker, found here, can be set to denote people believed to be living") wouldn't come amiss and shows we've thought about it. Just don't complain if someone ignores it.
theKiwi 2011-07-12T22:13:51-07:00
Same comment as on the other discussion about web output...

I don't see that it is any business of BetterGEDCOM to dictate to genealogy software application developers what level of controls they should provide in their software to allow/help users to protect some of their data when they are exporting it to a BetterGEDCOM compliant file format.

That is between the software developer and their users to figure out. Once they've made that determination, then BetterGEDCOM says how what they've decided to share is written into a file so that other software can read it.

BetterGEDCOM may want to have a standard for how "sensitive/private" data is identified and delimited so that a compliant application can read it back in and retain that delimitation if it is exported in the first place, but even today different applications handle this differently - Reunion for Macintosh by default uses { } as those delimiters (and they can be changed to ( ) or < > or [ ] ) by the user.

Roger
ttwetmore 2011-07-13T03:41:50-07:00
Better GEDCOM has no business writing requirements for software vendors.

The Better GEDCOM requirement should only apply to our data model. We do need to support privacy settings on a per-record and on a per-attribute within record basis. Software vendors can then do whatever they like for allowing privacy, and they can use the Better GEDCOM provided settings to implement them.

There seems to be some confusion about Better GEDCOM here. We are not an industry watch dog organization. We have accepted no mandate to define what software vendors can and can't do, or even what they should or shouldn't do. Our mandate is to define a data model infrastructure that will allow software vendors, now and in the future, to get their features done by using a single data model to hold information about persons and events.
WesleyJohnston 2011-11-15T07:04:24-08:00
Data-Place03 - place can be member of several place hierarchies
The discussion that led to this focused mainly on place name changes. I want to make sure that another aspect of a place being in several place hierarchies is not lost.

Where I sit right now, I am simultaneously within the jurisdiction of multiple record-creating/keeping authorities. Certainly there are my address, city, county, state and country. But there is also a water and sewer district, a school district, an electric utility district, a gas utility district, a water management district. At various times, I receive mailings and pay bills to some of these districts.

A similar situation exists within churches. There are many different terms used in different denominations, but they all divide the world up into their own districts, which are important to know when you are trying to find records: parish, conference, synod, etc.

In fact, any wide-spread organization is going to have the same sort of hierarchy. And if the person I am researching was a member of the Veterans of Foreign Wars or a Freemasons lodge, it behooves me to know how that fit into their structure.

So while this particular requirement originated mainly from consideration of changes of place names and boundaries over time, it can also encompass a great deal more.
ttwetmore 2012-05-31T23:01:52-07:00
Here is my proposed model for a place. It is an E-R style model though I am using the terminology of an element for an entity, so that a sub-element is the has-a relationship between entities, and references are used to represent entity to entity relationships. This translates directly to RDF triples as well. Using the words element and sub-element should make an XML, GEDCOM, JSON, etc, representation of this place model fairly obvious. This is the DeadEnds place model. I believe it meets all requirements that have been mentioned with respect to places.

A place is an element. It may be a sub-element of a higher level element (e.g., an event element), or it may be a top level element of its own. If it is a top level element it must have a unique ID to allow it to be referred to by other elements.

A place element contains sub-elements. The most important are:

name (required) – an element whose value is a comma-separated list of name parts, which can be a single name part; and

type (optional) – a comma-separated list of name part types in one-to-one correspondence with the name parts. Name part types come from a fixed vocabulary.

Other optional sub-elements include media links, latitude and longitude, the date ranges when the name was known to be in use, historical notes, source references, language of the names, and so on.

A place element may refer to a higher level place element that contains it by using a place reference sub-element. Higher level places are always top level elements. Place references include the unique ID of the top level place element. Place elements may form hierarchies by chaining places using place references. All but the first place element in a chain must be top level place elements.

Important: the term top level does not mean an element is at the top of a hierarchy; it means that the element is not a sub-element of any other element.

A place element may contain multiple place reference sub-elements, allowing places to be contained in multiple higher level places, and therefore to be members of multiple hierarchies.
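
One possible reading of this model, sketched in Python. The class and field names (`Place`, `parents`, `hierarchies`) are illustrative choices, not part of DeadEnds, and the ids are invented. Each place here carries a single name part, and the sketch does no cycle detection.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Place:
    name: str                      # comma-separated name parts (may be one part)
    type: str = ""                 # optional, parallel to the name parts
    id: Optional[str] = None       # needed only for top level elements
    parents: List[str] = field(default_factory=list)  # place references (ids)

def hierarchies(place, top_level):
    """Return every chain of names from this place up through its references."""
    if not place.parents:
        return [[place.name]]
    chains = []
    for pid in place.parents:
        for chain in hierarchies(top_level[pid], top_level):
            chains.append([place.name] + chain)
    return chains

# Top level places, keyed by their (invented) unique ids.
top_level = {
    "P1": Place("United States", "country", id="P1"),
    "P2": Place("New York", "state", id="P2", parents=["P1"]),
    "P3": Place("Kings", "county", id="P3", parents=["P2"]),
    "P4": Place("New York", "city", id="P4", parents=["P2"]),
}

# A place with two place references, so it belongs to two hierarchies.
brooklyn = Place("Brooklyn", "borough", parents=["P3", "P4"])
full_names = [", ".join(chain) for chain in hierarchies(brooklyn, top_level)]
```

Only the first element of a chain (here `brooklyn`) can lack an id; everything it refers to has to be a top level element so that the reference has something to point at.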

Tom
ACProctor 2012-06-01T03:58:28-07:00
Re: "I also like the idea of having some form of link to other known names of the same place..."

There are many similarities between place names and personal names, Alex.

  • Both entities may have alternative names over different periods of time
  • There may be spelling variations, especially over time
  • Both entities may have different names in different languages
  • The names of both entities may involve abbreviations (e.g. Thos. for Thomas, or Co. for County)
  • There may be entities with identical or similar names in the same locality (or in the same family for the case of personal names)
  • The named entity may come into being at a given date, and cease to exist at a different date

They both have a parentage too. However, the parentage of a person is fixed (i.e. their biological lineage) whereas a place may have a variable parentage (i.e. its place hierarchy).

STEMMA tries to capitalise on this so that the tokenisation of names, and the rules for matching a name against multiple alternatives, can be the same for both entity types.

The one place this falls down is in the classification of the parts of a personal name (i.e. surname, given name, middle names, prefixes, suffixes, name particles, etc). Without this classification, the sorting of names, and possibly the presentation in a formal or informal style, cannot be done for a personal name. I'm still thinking about this.

Tony
AdrianB38 2012-06-01T07:03:18-07:00
Tom,
How would you cope with
(a) multiple names over time? E.g. New Amsterdam becoming known as New York
(b) places transferring from one higher place to another?

You may be intending the multiple hierarchy concept to cover (b) but I need to ask as it's not quite the same concept.

Otherwise this looks simple and flexible.

Adrian
ttwetmore 2012-06-01T07:56:37-07:00
Tony and Adrian,

I have to ask a fundamental question. Must the BG model provide full support for the UToP ("unified theory of places"), or should it be a simple and practical system that skirts around the full complexity of the UToP?

How important is it to link, within a genealogical database, places together because they hold different names for the same real place? How important is it that our place model keeps track of place names changing over time? Where is the 80/20 breakdown in the complexity of our place model between usefulness and completeness?

I feel a constant tension between making a model too simple and making it too complex. I always occupy the simple end of the spectrum, knowing that I will always be balanced by others on the complex end.

To answer Adrian's 2 questions (and I have examples of both things in my database).

1. I generally try to use the name and geopolitical structure that existed at the time of the event. Though I am not fanatic nor consistent about it, especially when using software that doesn't appreciate the subtleties involved. So I use New Amsterdam during the right time frame, and New York during the right time frame. I have some Dutch ancestors for the New Amsterdam time frame.

2. This is kind of a subset of 1. One example in my database is an ancestor who died in Brooklyn in 1881. His death certificate is from the City of Brooklyn, Kings County, New York, so I recorded his death as occurring in the place, "Brooklyn, Kings, New York, United States." Currently Brooklyn is a borough incorporated into the City of New York, so today I would refer to Brooklyn as "Brooklyn, New York, New York, United States." What is a little ironic about this situation is that Kings County still exists and it shares the same boundaries as the borough of Brooklyn. In fact the single city of New York contains five boroughs and each borough has the same boundaries as one of New York State's counties. Yes indeedy. So if you wanted to get the county name into this place you'd have to go with "Brooklyn, Kings, New York, New York, United States" with a type element of "borough, county, city, state, country". But it works.

I imagine that my approach is too simple for most of you.

Tom
ACProctor 2012-06-01T08:03:14-07:00
Re: "How important is it to link, within a genealogical database, places together because they hold different names for the same real place? How important is it that our place model keeps track of place names changing over time"

We have to do this for Persons so that we can find them given vague or informal references. I would say the same applies to Places.

As I said earlier, I think the approach to both types of named entity can be generalised to a large degree.

I thought you'd be keen on that, Tom, given your views on "generalisation" elsewhere. :-)

Tony
ttwetmore 2012-06-01T09:17:32-07:00
Tony,

As I admitted I can be inconsistent.

Handling "same-as" is easy. Add a reference sub-element:

<place id="1234">
  <name> New Amsterdam, Holland </name>
  <type> colony, country </type>
  <date> between XXXX and YYYY </date>
  <sameas id="1235"/>
</place>
 
<place id="1235">
  <name> New York, England </name>
  <type> colony, country </type>
  <date> between YYYY and ZZZZ </date>
  <sameas id="1234"/>
</place>
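
As a sketch of what software could do with such references, the Python below collects every name reachable from one place by following `sameas` links. The `<places>` wrapper element, the third place and the function name are assumptions added so the sample parses as one document. The date elements are left out because the years above are placeholders.

```python
import xml.etree.ElementTree as ET

# Sample document; the <places> wrapper and place 1300 are illustrative additions.
doc = """<places>
  <place id="1234">
    <name> New Amsterdam, Holland </name>
    <type> colony, country </type>
    <sameas id="1235"/>
  </place>
  <place id="1235">
    <name> New York, England </name>
    <type> colony, country </type>
    <sameas id="1234"/>
  </place>
  <place id="1300">
    <name> Brooklyn, Kings, New York, United States </name>
  </place>
</places>"""

def same_as_names(root, place_id):
    """Follow sameas references outward from one place and return all names."""
    by_id = {p.get("id"): p for p in root.iter("place")}
    seen, todo = set(), [place_id]
    while todo:
        pid = todo.pop()
        if pid in seen or pid not in by_id:
            continue
        seen.add(pid)
        todo.extend(s.get("id") for s in by_id[pid].findall("sameas"))
    return sorted(by_id[pid].findtext("name").strip() for pid in seen)

root = ET.fromstring(doc)
names_for_1234 = same_as_names(root, "1234")
names_for_1300 = same_as_names(root, "1300")
```

Note that this only follows links in the direction they are written, so it relies on both places referring to each other, as they do in the example.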

In this example you would likely really have four place elements, splitting out historical Holland and historical England to their own place elements. This is the tip of an iceberg. The question is how much of the iceberg do we want in the model?

Do we have a requirement that we must support these "same-as" relationships?

The argument that because we do something for people we should also do it for places is one I can't automatically agree with. Genealogy is primarily about people, and places enter in only in so far as they support persons. We are much more interested in all the alternative names that were applied to a person than we are to the names given to the places where the person lived. If we were discussing the requirements for an on-line gazetteer, then we would be discussing an application that is primarily about places and their names, so the place model would necessarily be much more complex.

The argument that we need sameas for places to enable searching in the same way that we need sameas for searching for name matches, though having some merit, would only apply in a very small minority of cases. Does any modern software support the idea? If I search for ancestors in Nova Scotia in 1783, will the software know that in 1783 New Brunswick was still part of Nova Scotia, so the area of search should be increased to cover modern New Brunswick? Is it up to the software to know these things or is it up to the researcher to know these things? I'm not suggesting the answer by the way, but it is an interesting question.

Tom
ACProctor 2012-06-01T10:39:33-07:00
I see the approach you're taking Tom. However, I have a small issue with "Genealogy is primarily about people...".

Family History is usually considered to encompass more than genealogy. I know that my thoughts are rarely mainstream ones but my own data includes some historical narrative on a few places because they were so important to the family, and this includes specific houses as well as villages or neighbourhoods. Pictures, though, would be something that most of us can relate to.

Re: "Does any modern software support the idea?".

I'm a little unusual, again, because I don't use any software products other than my own. If a Place Authority existed then I'm sure the online content providers would go for it because of the advantages in providing agreed information about alternative names and hierarchies. A suggestion I wrote up somewhere was that the Place definitions held by a Place Authority can be cross-indexed with the relevant census returns, e.g. so that each street can be linked to its relevant census pages. Our National Archives came so close to having this information set up, but then abandoned it. I don't believe they ever saw the potential in this field though.

Tony
ttwetmore 2012-06-01T11:03:19-07:00
Tony,

I don't have any big push back against your points here. The differences I see between genealogy and family history are found in the types of relationships and events that each support. I think of pure genealogy as exclusively concerned with parent/child relationships, primarily the biological ones, along with the vital events of birth, death and marriage. I think of family history as broadening the bounds of pure genealogy to include interpersonal relationships of all kinds, and personal events of all kinds. I would also say that genealogy as it is generally practiced today is more than pure genealogy and may include many features of family history. I don't see how the differences between these two have a material impact on the nature of the place sub-model required to support them both. However, if all you need is the same-as relationship to get your requirements met, then I'm all for it.

For over twenty years I had exclusively used my own software, a program named LifeLines, to hold my genealogical database and to generate reports. The only thing I do differently today is that I have a family tree up on Ancestry.com. There is no easy way to keep the two systems up to date, and that is worrisome, though I don't really worry about it all that much.

Tom
AdrianB38 2012-06-01T15:32:39-07:00
Tom - how far should we go? A good question - to some extent a difficult question to answer because if I try to answer it on behalf of the FH community at large, I have to fall back on the fully complex model because only then do I have a feeling that anyone and everyone can be supported. But for me personally, taking what I'd like to do with my own relatives, I think these ideas leap out at me:

1a. Re New York / New Amsterdam - a typical question would be, list off all people named John Doe, living in the 1600s in place Y. Suppose one was recorded as born in New Amsterdam, and another in New York. (So yes, contemporary names apply in the presentation). At the moment I would need to run off 2 enquiries - one to pull off all people named John Doe, living in the 1600s in place New Amsterdam, the 2nd all people named John Doe, living in the 1600s in place New York. It would be nice to do it all in one go, i.e. a query that recognises NA and NY as the same place so returns both.

1b. My actual examples are subtly (or even not-so-subtly) different. My home town of Crewe started as a settlement within and across the 2 townships of Monks Coppenhall and Church Coppenhall. If I want to pull off a list of all people named Mary Roe born Crewe 1840 +/- 10y, I currently need to do 3 queries - 1 for each of those places. It would be nice to be able to record that both Monks Coppenhall and Church Coppenhall are subsumed into Crewe and ask the question once - presumably by asking about Crewe.

An interesting thought - any attempt to concoct a dated quasi-legal relationship between the 2 Coppenhalls and Crewe is probably counter-productive as someone might be answering the question "Where were you born?" in the 1881 census, not with the legal name as it was in 1840 when they were born, but the settlement name of 1881. So a simple synonym relationship is probably all that's needed.

2a. Your Brooklyn: The equivalent example I always use is "Widnes, Lancashire, England" (for pre-1974 events) and "Widnes, Cheshire, England" (for post-1974 events). Certainly, I'd like to be able to record the contemporary name of the place but then I'd end up with (in current software) 2 places. Ideally I want to be able to search on just one of those names but get events recorded with both the pre and post-1974 names.

So I think one important point is that I'd like to be able to record contemporary names and so have them printed appropriately in reports - but I'd like to be able to run queries that pull off places in all their forms by just one query. It may be that synonyms are all that are needed and the rest is just gold-plating...

2b. If I try to be too rigorous with place relationships, I think I'm liable to come up with names that are not recognisable by normal people. For instance, I've recorded the Antipodean city of Melbourne as "Melbourne, Victoria, Australia". But prior to 1901, Australia is a geographic expression only and Victoria is a Crown Colony in its own right, so ought to be rendered simply as "Melbourne, Victoria". And I'll bet lots of people will get confused over where that might be. Hm. So my hierarchies are sometimes a muddle as well, it seems!
ttwetmore 2012-06-02T08:43:06-07:00
Adrian,

On the how far do we go question, you definitely go further than I would.

I do want to ask something. BG is a format for archiving genealogical data. Should that data contain the historical gazetteer of the world, or should that historical knowledge be applied by custom software? For example, should you be responsible for creating a "network of historical places" to deal with the "Crewe, Monks Coppenhall and Church Coppenhall" issues in your actual data, or should you expect software with an adequate place authority to be able to know the physical and temporal relationships between those places? I'd much prefer the place authority solution, but one must wonder when or if such agents will be available.

Your examples include searching when name changes are involved. Is this really a big issue for most researchers? Is it a 1% problem, a 5% problem, a 20% problem and so forth? Can it be solved by allowing an event to have more than one place hierarchy, maybe the historical one that existed at the time of the event, and the modern one as it exists today? We have an analog with person names, where we can record many different name forms for one person, which is a boon for searching.

No answers here.

Tom
AdrianB38 2012-06-02T12:57:22-07:00
Tom suggests I definitely go further than he would. Maybe - though since I'm not going anywhere at the moment, let's just call them aspirations.

Re the links between Crewe, Monks Coppenhall and Church Coppenhall. I'm firmly of the belief that I'd have to put that stuff in myself, for several different reasons:
1. As I've said before, I have no faith in the idea of a Place Authority creating data at the level of my places.
2. Even if they did, would their concept of a place match mine? Probably not as I've fudged various things, especially where I'm not sure which flavour of a place it is (as before, if someone says they were born in Barthomley, is that the village, parish or township of that name?)
3. So I'd expect to put that data in myself, for just the places I'm interested in.

Name changes - well, that's the problem, I can't really be sure on this. Anyone with ancestors in my home town will probably have the exact same problem as me. But just a few miles north is Winsford, another industrial town, with an almost exactly similar history of coming from 2 settlements, Over and Wharton. I suspect I could probably work through quite a few places where the 18th and 19th centuries created a green-field industrial site with settlement - as I'm sure happened across the globe - but in England, there tends to be an older settlement somewhere in place already.

Strictly we need to distinguish the name change (New Amsterdam / New York) from the "merge" (Over and Wharton becoming subsumed into the new Winsford). And neither strictly matches the version where the higher level entries in the hierarchy change - e.g. Harper's Ferry going from Virginia to West Virginia.

But I suspect that if my database had 3 places (Over, Wharton and Winsford), and the 3 were marked up as being equivalent, then the software could turn a search on Winsford into a search on Over, Wharton or Winsford. I think these are definitely 3 places - certainly Over and Wharton in the 1700s were 2 different places - it's only Winsford in the later 1800s that draws them together. Ditto the 2 Coppenhalls and Crewe. Whereas, Harper's Ferry, New Amsterdam / New York, Widnes are 1 place each with either different names or different hierarchies.

I'm trying to get to 80% of the functionality with 20% of the effort, so would be open to further ideas. And most of that would be to deal with searching on the alternates.
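
A small Python sketch of that kind of search, assuming the user has marked groups of places as equivalent. The groups come from the examples in this thread, while the event records, field names and function names are made up for illustration.

```python
# User-maintained equivalence groups (a hypothetical structure).
EQUIVALENT = [
    {"Winsford", "Over", "Wharton"},
    {"Crewe", "Monks Coppenhall", "Church Coppenhall"},
    {"New Amsterdam", "New York"},
]

def expand(place):
    """Return the place together with everything marked as equivalent to it."""
    for group in EQUIVALENT:
        if place in group:
            return group
    return {place}

def search(events, name, place, year, tolerance=10):
    """One query covering the place and all of its recorded equivalents."""
    places = expand(place)
    return [e for e in events
            if e["name"] == name
            and e["place"] in places
            and abs(e["year"] - year) <= tolerance]

# Invented birth records, each keeping its contemporary place name.
events = [
    {"name": "Mary Roe", "place": "Monks Coppenhall", "year": 1838},
    {"name": "Mary Roe", "place": "Crewe", "year": 1845},
    {"name": "Mary Roe", "place": "Church Coppenhall", "year": 1861},
    {"name": "Mary Roe", "place": "Winsford", "year": 1840},
]

hits = search(events, "Mary Roe", "Crewe", 1840)
hit_places = [e["place"] for e in hits]
```

The records keep whatever name was recorded, so reports can still print the contemporary name; the equivalence is applied only when searching.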
Alex-Anders 2012-06-02T15:32:01-07:00
A further example to Melbourne.

Jimbour Station was a property in Queensland. It was further recorded as Jimbour Station, County of Aubigny, Queensland. Later it was Jimbour Station, Parish of Maida Hill, County of Aubigny, Queensland and eventually Jimbour Station, Parish of Maida Hill, County of Aubigny, Queensland, Australia.
Parts of Jimbour Station were sold off and one section became a settlement known as Maida Hill, Queensland, later to be a town Maida Hill, Parish of Maida Hill, County of Aubigny, Queensland.
Maida Hill was also a suburb within Brisbane, Queensland and the Colony changed the Town Name to Bell. Later again, the suburb ceased to exist as it was encompassed by another (name eludes me).
As no Maida Hill existed, a new town (nowhere near either of the others) was named Maida Hill and exists today. A second location has also been named Maida Hill within Queensland. So an association would need to be established for some names but not others?
AdrianB38 2012-05-30T12:57:25-07:00
Tony - I have to confess that my opinions are based on / prejudiced by my belief that a meaningful Place Authority is a non-starter for the UK at least. (I can have no real opinion on the others). Certainly the current top administrative levels are do-able because they seem to be defined in various places. Let's assume that we can get back in time to some meaningful historic values - though even there I can see arguments over places like Bristol, which was a county in some sense (but not in others) for centuries but is seldom treated as such in genealogy except by Bristolians. But listing all the various towns and villages in each county (or whatever)? From what sources? The situation is even worse in Scotland where whole settlements have disappeared from the map and were never more than a few buildings around a farm. (I believe historians can draw a distinction between societies with villages and those without).

This doesn't mean the Place Authority effort isn't valuable at the higher levels.

But I still feel that the flexibility I want to see goes right up to those higher levels. I _might_ want to create a military geography - e.g. while 11 Group of the RAF in WW2 is an organisation, it's also a geographic area. Similarly, the subdivisions of British Rail were organisational but also had geographic meaning - and that hierarchy is quite different from today's hierarchy in Network Rail. As I said above, I'm not claiming these are necessarily the world's most sensible way of recording these things, but in GEDCOM I could do it. With controlled vocabularies for place-types (never mind Place Authorities), I don't see how I could, without there being an all embracing hierarchy of business-level1, business-level2, etc, and user defined "natures" as you suggest, all the way up the tree. Which seems to make it pointless.

It's rather like event types - we could come up with a core list of event types or place types, but then users need to extend it individually in a meaningful fashion.
WesleyJohnston 2012-05-30T19:17:28-07:00
I have long since given up trying to assign historically correct place names. What I really want is some single way of identifying a place and not carrying around all the baggage of whether it was Canada West or Upper Canada or Ontario.

Even after the names changed, the actual use in the records often still was the old - then-obsolete - way, so that if you are being literally accurate about recording the place as it was written in the record then you would be historically inaccurate about the fact that it was or was not Upper Canada instead of Canada West.

So as I said, I have simply opted to come up with a single time-independent hierarchy that is wrong for a lot of time periods; if I really want to know what the correct form was, I can look that up separately. When I am entering a place for an event, all I really care about is that it was at Whitby or Manchester. If I have a disambiguation problem, then I need to know a bit more about the higher levels, but only to distinguish which place I am talking about and not what the higher levels were called at any given date.

So the entire notion of a time-correct place hierarchy is something that I have simply rejected as being counter-productive. Any time that I want to know what the higher levels were for that place (which is really not very often), then I can easily look that up.

Even for towns whose names have changed, I have simply chosen a single representation: English Corners became Columbus, and I choose to use Columbus for all references in my database - again simply to say where it happened, so that that fact is known ... any broader statement of the location is unnecessary in most cases.
WesleyJohnston 2012-05-30T19:20:10-07:00
"So the entire notion of a time-correct place hierarchy is something that I have simply rejected as being counter-productive."

Don't see a way to edit this, but what I meant was

"So the entire notion of a time-correct place hierarchy within the event specification is something that I have simply rejected as being counter-productive."
Alex-Anders 2012-05-30T19:30:03-07:00
What will you do to your data if/when a current place is renamed? Global replace and use new, keep current not use new????
ttwetmore 2012-05-30T19:56:19-07:00
Wesley,

I wouldn't say I agree with you 100%, but you have sure hit the simplicity and flexibility nail on the head. Whatever we agree on, it must be able to support exactly how you want to record your places. But it must also be able to support researchers who want to use historically accurate place names. Of course I tout the DeadEnds approach for places, which I gave some very recent examples of. It can handle all of these cases simply and easily.

Tom
ttwetmore 2012-05-30T20:02:15-07:00
"What will you do to your data if/when a current place is renamed? Global replace and use new, keep current not use new????"

Change them if you like. Leave them as they are if you like. I don't see a worrisome issue here.

Tom
WesleyJohnston 2012-05-30T20:12:08-07:00
"What will you do to your data if/when a current place is renamed? Global replace and use new, keep current not use new????"

Change them if you like. Leave them as they are if you like. I don't see a worrisome issue here.

Precisely ... Bohemia was separate, then became part of Austria, was in Czechoslovakia, was virtually annexed by the Third Reich, is now the Czech Republic. But I have opted to use Czech Republic and would stick with that if it ever changed again. If I did opt to change it, I would probably simply go back to using Bohemia.
ttwetmore 2012-05-30T20:15:35-07:00
Tony said,

Regarding user-defined tags, here's a line of thinking I'm currently in the middle of...

STEMMA tries to define a "controlled vocabulary" for the Place types - all the way from country down to building. This effectively means a closed set of terms. Although I don't yet have a sufficiently complete set of terms, the reasoning was in order to support a Place Authority. Such an authority - especially if federated as recommended on my Web page - must have a controlled vocabulary in order for the parts to work as a single resource. That vocabulary, in turn, must be a super set of ISO 3166-2 and the European NUTS which are only relevant to present-day entities.

However, I'm aware that there is still a need for user-defined terms. Rather than in the elements of a geographical/administrative Place Hierarchy, I think these might be for the "nature" of the Place, e.g. school, household, hospital, cemetery, church, etc.

What do you think?


I think you are on the right track. I don't agree that the vocabulary must be a superset of ISO & NUTS, but it should contain a sufficiently complete set of terms. Note that we can put in some pretty loosey goosey terms like levelOne, levelTwo, and so on, with guidelines on the area and population criteria to apply when using them. I assume we would support localization to multiple languages.

I don't think it is that hard to create that set of terms. And realize, from my examples and other things I have written, that I don't believe it is mandatory, nor even highly recommended, to use these terms when writing place names. In the vast majority of cases, a place authority could take a look at exactly what we have chosen to use for places, without us supplying the terms, and figure out exactly what places we mean.

Tom
ACProctor 2012-05-31T01:10:13-07:00
Re: "I have long since given up trying to assign historically correct place names. What I really want is some single way of identifying a place and not carrying around all the baggage of whether it was Canada West or Upper Canada or Ontario"

Time-dependent hierarchies are working for me Wesley (converting my data to STEMMA as part of my research).

Once a Place entity has been created, it is very easy to reference it from elsewhere in the data. I have started adding historical narrative to those Places, and images, and time-dependent hierarchies that aren't relevant (yet) to my data, in order to flesh out the Place. This doesn't affect my usage of the Place entity though.

Tony
WesleyJohnston 2012-05-31T08:23:39-07:00
The problem for me is that I am using Ancestry.com's online trees as my master copies. A few years ago, I came close to finding out the second date on my own grave stone and realized that it was more important to share what I have than to keep it on my own computer. And I decided on Ancestry as the place to share it, since they claim that your tree will survive your membership. That means living within the significant limitations of Ancestry's online tree software, and they are really weak on places -- even their Family Tree Maker does a better job of place handling. But that's the tradeoff that I have made. If you see some way for me to still tap STEMMA, I would be interested.
NeilJohnParker 2012-05-31T09:02:09-07:00
Gentlemen, I believe we have reached (passed) the point of diminishing returns on this issue. It is obvious to me, and I hope others, that the standard must support Place Names "as found" on a citation or as otherwise determined by a user, and should also give those users who wish to have their Place Names edited against one or more Hierarchical Temporal Place Name Structures the option to do so. Furthermore, this dual option rule must be generalized, as it applies to several other fields, e.g. Dates and Personal Names.
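To make the dual option concrete, here is a minimal sketch, assuming a record that always keeps the wording as found and only optionally links to an entry in some place hierarchy. The class name, field names and the "P1" identifier are invented for illustration and are not part of any proposal in this thread.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PlaceReference:
    """A place as cited on one event (hypothetical structure)."""
    as_found: str                   # wording exactly as written in the source
    place_id: Optional[str] = None  # optional link into a place hierarchy

    def display(self, hierarchy_names: Dict[str, str]) -> str:
        """Use the linked name when there is one, else the source wording."""
        if self.place_id is not None and self.place_id in hierarchy_names:
            return hierarchy_names[self.place_id]
        return self.as_found


# Invented identifiers and names, for illustration only.
hierarchy_names = {"P1": "Whitby, Ontario, Canada"}
unlinked = PlaceReference("Whitby, Canada West")
linked = PlaceReference("Whitby, Canada West", place_id="P1")

print(unlinked.display(hierarchy_names))  # source wording is kept
print(linked.display(hierarchy_names))    # linked form is shown instead
```

Either way the source wording survives in as_found, so a user who never links to a hierarchy loses nothing.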
Alex-Anders 2012-05-31T13:46:58-07:00
I like the concept of allowing any variation, as Neil has said. I also like the idea of having some form of link to other known names of the same place, and then being able to search on any and display them all.
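A rough sketch of that linking idea, assuming each place has some internal identifier and any number of names attached to it. The identifiers "town-1" and "town-2" are invented; the names are borrowed from the Maida Hill example earlier in the thread.

```python
from collections import defaultdict


class PlaceNameIndex:
    """Hypothetical index: search on any known name, get back all of them."""

    def __init__(self):
        self._names_by_place = defaultdict(list)  # place id -> names, in entry order
        self._places_by_name = defaultdict(set)   # casefolded name -> place ids

    def add_name(self, place_id, name):
        if name not in self._names_by_place[place_id]:
            self._names_by_place[place_id].append(name)
        self._places_by_name[name.casefold()].add(place_id)

    def search(self, name):
        """Return {place id: [all known names]} for every place that used this name."""
        ids = self._places_by_name.get(name.casefold(), set())
        return {pid: list(self._names_by_place[pid]) for pid in sorted(ids)}


index = PlaceNameIndex()
index.add_name("town-1", "Maida Hill")  # the town later renamed Bell
index.add_name("town-1", "Bell")
index.add_name("town-2", "Maida Hill")  # the unrelated newer town

print(index.search("Bell"))
print(index.search("Maida Hill"))
```

Because one name can point at several places, a search returns every candidate and leaves the choice to the user.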
ttwetmore 2012-05-29T10:34:38-07:00
Adrian,

Thanks for your comments.

Do you think it would be possible to invent tags (e.g., like I'm using city, county, state, country), to account for all the types that can get related together in place hierarchies? If we could then everything can be specified via the <type> approach. I'm worried that there might be too many things to have types for, especially when taking historical regions into account.

Start Aside.
In the U.S., counties have a fascinating history, as many (most?) have split over time as western settlement evolved. Another interesting case in my research occurred in Nova Scotia (then a colony of England, not a province of Canada) after the exile of the American revolutionary war "loyalists" to the Saint John River region of Nova Scotia. Within a year the exiles petitioned to carve a new colony out of Nova Scotia, which occurred, giving rise to the colony of New Brunswick. When my loyalist ancestors were exiled they were exiled to the colony of Nova Scotia, but after a year they were living, without moving, in the colony of New Brunswick. The typical way of handling this kind of situation is to use the place hierarchy as it exists today, not how it existed then, though that can sure leave a lousy feeling in your stomach. And of course, when all of these exiled ancestors were born, the United States didn't exist. Those that were born in New York were born in the colony of New York. Do you worry about this in your database? I sometimes do, sometimes don't. As I said my goals are simplicity and flexibility; note that this does not include consistency!! Question: did the United States come into being on July 4, 1776, when the declaration of independence was signed, or was it on September 3, 1783 as the result of the signing of the treaty of Paris?
End Aside.

Of course, the simplicity side of me has to ask, why does it matter that we know the exact kind of hierarchies we are dealing with? Don't we usually just want something that makes sense when we show a place on a user interface screen or print it into a report? As long as there are no extra commas and it makes sense, isn't that enough?

I can imagine you would like to know more details if you were doing some kind of demographic statistics with the information, but how important a goal is that for genealogical software?

Tom
ACProctor 2012-05-29T12:14:14-07:00
Re: "Do you think it would be possible to invent tags (e.g., like I'm using city, county, state, country)"

It would be easier to handle using a controlled vocabulary for Place-types rather than specific tags Tom. The reason is that there are a lot of possible terms, but there is no consistency across the world - either in the terms being used, or their relative ordering. For instance, these are only a few of the more common terms for administrative divisions within a national boundary:

Authorities, Boroughs, Counties, Departments, Dependencies, Districts, Islands, Municipalities, Parish (Civil), Provinces, Regions, Republics, States, Territories

Even this doesn't include things like sovereign state, or crown dependencies, both of which are relevant to the UK (which isn't a country) and its Channel Islands (which aren't technically part of the UK).

Attempting to give them all a distinct tag name ties the model too closely to peculiarities of national administration. It might also limit the types of hierarchy, or the height and depth of the hierarchy.

Both the ISO 3166-2 and the European NUTS standard only define the present-day national subdivisions so things like "Shires" are not considered.

Tony
ACProctor 2012-05-29T12:17:13-07:00
...Apologies if I misunderstood Tom - I assumed your reference to "tags" is for record tags rather than the value of a <Type> element.

Tony
ttwetmore 2012-05-29T15:25:50-07:00
Tony,

I think you did misunderstand, but you've rectified the situation.

But your question does prompt me again, wearing my simplicity and flexibility hat, to ask whether or not the type "words" (e.g., city, county, state, province, ocean, country, ...) have to come from a fixed set or not. Why couldn't they be anything a user desired? I assume common sense would prevail and there would be a very standard and obvious set, and software would provide all the obvious ones in a user interface widget, but why not let the user use any word or no word? Where is the value in prescribing a vocabulary for the types of geographical and political "units"? I came up with statistical analysis, finding out how many people in your database come from each country, say, but that's about it. And since in most cases you will have this information in the data, there's no problem. There will always be outliers in your data where you don't have enough info to let them be part of this kind of analysis.

I hope you are not getting fed up with my constant desire to question all assumptions that creep in in the guise of rules and so forth. For example, every once in awhile someone jumps on the latitude and longitude wagon, expressing the belief that lati-longs are the ultimate solution to the place problem -- if you can tie a place to a specific spot on the earth everything else is just gravy. I don't have a single lati-long in my database and am not missing a thing.

Where it is obvious, and where you really care to do so, I think it is great for a researcher to specify the details on places (by using those type words or some other way we settle on). What I want to warn against, however, is that discovering a good way to do great things, should never be used as a justification for enforcing that great way of doing things with a rule. Genealogical data does not do well in the face of the kinds of restrictive formatting rules you find in typical database schemas. Just consider GEDCOM.

I want to be able to say "New London" as a place when I don't know what it is or where it is, without any penalty for not knowing. When I find out what it is and where it is, then I want to be able to add that info. If New London is a place associated with someone way out on the periphery of the persons I am interested in, I'm not likely to care whether I ever resolve "New London" any further.

Tom
WesleyJohnston 2012-05-30T03:49:54-07:00
All of the examples thus far given use place only down to the level of a city. Keep in mind that we also have Data-Place06 "Location to include address". So we have to be thinking of locations as going deeper than city.
NeilJohnParker 2012-05-30T05:29:26-07:00
I am sure there are numerous uses for Tag information, how important any of them are is open for discussion. The answer being probably not that important.

What is most important in my mind is that the field that you are recording is at least correct for the date that you are using. I have often recorded a field that was incorrect, been notified by the system (in this case Legacy Deluxe) that it was invalid for the date, and after further checking found the correct value. This feature is the main strength of editing against a temporal place hierarchy for me.

Neil Parker
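As an illustration of that kind of date check, here is a minimal sketch, assuming each place name carries an optional first and last year of use. It is not a description of how Legacy does it; the function and field names are invented, and the years for the Ontario names mentioned elsewhere in this thread are approximate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatedName:
    name: str
    first_year: Optional[int] = None  # None means "no known start"
    last_year: Optional[int] = None   # None means "still in use"

    def in_use(self, year: int) -> bool:
        after_start = self.first_year is None or self.first_year <= year
        before_end = self.last_year is None or year <= self.last_year
        return after_start and before_end


def check_name(names: List[DatedName], used: str, year: int) -> Optional[str]:
    """Return None when `used` fits the year, else an advisory message."""
    current = [n.name for n in names if n.in_use(year)]
    if used in current:
        return None
    if not current:
        return f"no name is recorded for {year}; keeping '{used}' as entered"
    return f"'{used}' is not recorded for {year}; names in use: {', '.join(current)}"


# Approximate years, for illustration only.
province = [
    DatedName("Upper Canada", 1791, 1841),
    DatedName("Canada West", 1841, 1867),
    DatedName("Ontario", 1867, None),
]

print(check_name(province, "Ontario", 1850))      # advisory message
print(check_name(province, "Canada West", 1850))  # None: name fits the date
```

The check only advises; whether the entered value is kept or changed stays the user's decision.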
NeilJohnParker 2012-05-30T05:31:39-07:00
Two points:
Place Name is a temporal attribute, and the hierarchy must allow a date range to be associated with each place name (each time a place name is looked up in the hierarchy, the date must accompany it).

There can be several hierarchies, e.g. geographic, government, religious - Roman Catholic, religious - Anglican, etc.

Must have ability to select different hierarchies.

Temporal Place hierarchies may be available from several public or private authorities on the internet; if so, we should have the ability to access them.

A place name can have a place name type, which may be one of several alternatives depending on which level you are at, e.g. in Ontario, Canada it is County, District or Region for level 3. This can all be accommodated in the place hierarchy, but most of the time it is never used.

Also, if the hierarchy is not governmental, should that be explicitly stated?

Can the user add on to the hierarchy? Usually hierarchies only go to the official level; for governments in North America, let's say:
Country
Province or State
County, District or Region
City, Town, Village, Hamlet, Unorganized Territory, etc

Should the user be allowed to add additional levels, such as community name?

Neil Parker
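The points above (dated links, several selectable hierarchies, free-form place types) could be sketched roughly as follows. This is only one possible shape, under the assumption that each place holds a list of parent links tagged with a hierarchy name and optional years. All identifiers are invented, and the Manchester and Coppenhall data are taken from examples elsewhere in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ParentLink:
    parent_id: str
    hierarchy: str                    # free text, e.g. "government", "church"
    first_year: Optional[int] = None  # None = open-ended
    last_year: Optional[int] = None

    def covers(self, year: int) -> bool:
        return ((self.first_year is None or self.first_year <= year)
                and (self.last_year is None or year <= self.last_year))


@dataclass
class Place:
    place_id: str
    name: str
    place_type: str = ""              # free text; may be left empty
    parents: List[ParentLink] = field(default_factory=list)


def chain(places: Dict[str, Place], place_id: str, hierarchy: str, year: int) -> str:
    """Walk upwards through one hierarchy as it stood in the given year."""
    names, seen = [], set()
    current = places.get(place_id)
    while current is not None and current.place_id not in seen:
        seen.add(current.place_id)
        names.append(current.name)
        link = next((l for l in current.parents
                     if l.hierarchy == hierarchy and l.covers(year)), None)
        current = places.get(link.parent_id) if link else None
    return ", ".join(names)


places = {p.place_id: p for p in [
    Place("eng", "England", "country"),
    Place("lancs", "Lancashire", "county", [ParentLink("eng", "government")]),
    Place("gm", "Greater Manchester", "county", [ParentLink("eng", "government")]),
    Place("ches", "Cheshire", "county", [ParentLink("eng", "government")]),
    Place("chester", "Diocese of Chester", "diocese", [ParentLink("eng", "church")]),
    Place("man", "Manchester", "", [
        ParentLink("lancs", "government", None, 1973),
        ParentLink("gm", "government", 1974, None),
    ]),
    Place("cop", "Coppenhall parish", "parish", [
        ParentLink("ches", "government"),
        ParentLink("chester", "church"),
    ]),
]}

print(chain(places, "man", "government", 1900))
print(chain(places, "man", "government", 1980))
print(chain(places, "cop", "church", 1900))
```

A user-added level such as a community name would simply be another Place with its own parent link, so nothing in this shape limits the depth of a hierarchy.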
AdrianB38 2012-05-30T07:09:31-07:00
Tom asked "Do you think it would be possible to invent tags ... to account for all the types that can get related together in place hierarchies?"
In practice, I believe not. It might well be possible to obtain the current hierarchies for governmental administration, but (a) historic hierarchies are more likely to be a problem and (b) the types of hierarchy are probably infinite.

To take (b) - I might very well have described someone's occupation as "motive power superintendent", organisation = "LMS Railway", place = "North Wales Division". I'm not saying I would, nor that it's the only way, never mind the best, but the point is that I might have done something like that and so this data would need to be converted from my GEDCOM-style business-division-places to BG. Chances of anyone thinking up all such hierarchies are slim and therefore having only a controlled vocabulary seems implausible.

To take (a) - similar issues apply. If I'm lucky enough to trace my ancestry back to Anglo-Saxon times, are we sure that we could define all the Anglo-Saxon place-type hierarchies? (And every time you say 'yes', I'll just find a more remote country!) Again we have the case that if I am wanting to define a hierarchy unique to the Kingdom of Mercia, say, and it's not in the controlled vocabulary, who's going to validate my request to add it? How many experts in Anglo-Saxon genealogy are there? Will they be in the BG administration?

Seems to me that while we might want someone to provide "best practice" hierarchies (e.g. building-address, suburb, settlement, county, country), we're on a loser if we expect those to be the only choices.

And while Neil and Tom are right to highlight the temporal element, it's also right to note that UK family history at least is consistently inconsistent in whether or not it expects to use a contemporary name - we have a tendency to stick to the pre-1974 county names even for post-1974 events. To some extent this is possibly because the software doesn't search well across time - "Manchester, Lancashire, England" is different from "Manchester, Greater Manchester, England" in many systems, whereas if there were an underlying entity of "Manchester" with timed-hierarchies and timed-names, then there'd be less of an issue in searching for events in Manchester. But even then, I suspect that "Manchester, Greater Manchester, England" is such an unlovely name that most of us would prefer the old shire county version, regardless of any authority. I think Tom at least would also prefer to choose his own names...
AdrianB38 2012-05-30T07:19:58-07:00
I think it may well be useful to remind ourselves why we might want hierarchies, particularly changing ones.

I suggest it's
(a) to encourage best-practice designation of locations;
(b) to enable locations that match across people's databases;
(c) to record what happened as a matter of local interest - e.g. "lived in 3 different countries and never moved a mile";

Just because we can do complex stuff, doesn't mean we should. Which I think is what Tom suggests.
ttwetmore 2012-05-30T09:23:51-07:00
I have no problems allowing great complexity in the places area. Adding temporal information to place names makes sense to me. Having official hierarchies available to check against makes sense. Having software do additional sanity checks using places and dates also makes sense to me. Having software suggest corrections to places makes great sense, and I use that feature in existing software. Having multiple hierarchies categorized into different types makes sense.

But (and there is always a but) I don't believe in requiring that complexity. If I want to enter a location of "New London, Connecticut", with no additional information about what the entities are, or what historical time frame is involved, I insist on being able to do it.

Genealogical data is inherently messy and ambiguous and confusing and unclear and erroneous with room for conjectures and speculation. The places area must be designed to handle this kind of data. The world of genealogical data is simply not arranged in neat little bundles of cities, counties, states and countries. Maybe you had a relative who died "out west". I'd want to record that as <place> out west </place> with no rule-bound software monitoring me. I'd be happy if this showed up on a warning log, because I would like to pin it down a bit better later, but I'd insist that it be allowed.

Tom
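A small sketch of "allow it, but put it on a warning log", assuming the software has some set of names it already recognises. The function name and the known_names set are hypothetical.

```python
from typing import List, Set, Tuple


def review_place(text: str, known_names: Set[str]) -> Tuple[str, List[str]]:
    """Accept any place text; return it with advisory warnings, never an error."""
    stored = text.strip()
    warnings: List[str] = []
    if not stored:
        warnings.append("place is empty")
    elif stored.casefold() not in known_names:
        warnings.append(f"place {stored!r} is not matched to a known place")
    return stored, warnings


# Hypothetical list of names the software already recognises (casefolded).
known_names = {"new london, connecticut"}

for entry in ["New London, Connecticut", " out west "]:
    stored, warnings = review_place(entry, known_names)
    print(stored, warnings)
```

The value is stored in both cases; the only difference is whether something lands on the log for later follow-up.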
AdrianB38 2012-05-30T09:45:19-07:00
Tom - I agree. I think.

I need flexibility when I get references to "Manchester", e.g. Is that Manchester the parish, the township (within the parish), the settlement? It may not be clear. But what is clear is that it's "Manchester, Lancashire, England" since all 3 concepts sit inside the county in question. That hierarchy either doesn't exist or is "Unknown, county, country".

I don't _think_ anyone here is seriously advocating validating such stuff out but it does need a bit of thought as it may be the rule not the exception for many similar places.

(Don't get me started on what "London" means...)
ACProctor 2012-05-30T10:51:55-07:00
Regarding user-defined tags, here's a line of thinking I'm currently in the middle of...

STEMMA tries to define a "controlled vocabulary" for the Place types - all the way from country down to building. This effectively means a closed set of terms. Although I don't yet have a sufficiently complete set of terms, the reasoning was in order to support a Place Authority. Such an authority - especially if federated as recommended on my Web page - must have a controlled vocabulary in order for the parts to work as a single resource. That vocabulary, in turn, must be a super set of ISO 3166-2 and the European NUTS which are only relevant to present-day entities.

However, I'm aware that there is still a need for user-defined terms. Rather than in the elements of a geographical/administrative Place Hierarchy, I think these might be for the "nature" of the Place, e.g. school, household, hospital, cemetery, church, etc.

What do you think?

Tony
gthorud 2011-11-15T17:33:57-08:00
Ages ago I proposed 5 definitions related to places. See under P in this list

http://bettergedcom.wikispaces.com/Glossary+Of+Terms

In some cases the authority is implied by what I have called the place type.

I think there are cases where you would want to know the authority, for example when a place is assigned an identifier (a type of name) in some identification system - the id may be needed since names are often not unique within the next higher level. It may be useful to record the authority even if you do not reflect it in the hierarchy, e.g. for looking up a name/id in a database operated by the authority.

Legacy does not cover the whole world at all times - far from it. The user must be able to tailor this to his own needs.
AdrianB38 2011-11-16T08:03:42-08:00
Rules for hierarchies are fine where they fit, but we need to design for both rule-based hierarchies and the sort of ad hoc muddle that the UK has. Some issues are:
- it may not be clear just which place-definition is referred to by a name in a source. For instance, when I see a residence of Barthomley, is that Barthomley the parish, Barthomley the township (a sub-division of the parish) or Barthomley the village? All 3 are centred on the church of St. Bertoline but it's seldom clear which is referred to;
- certain settlements overlap boundaries - in the UK, London is split between the historic counties of (at least) Middlesex and Surrey, depending on which side of the Thames you are. Hence I usually just write "London, England" even though I normally ensure that a county is 2nd element. (Similarly, does Kansas City regard itself as 1 city or 2?)
- it is usually easy to acquire current political hierarchies for some countries. But they may not be appropriate - to avoid changing hierarchies, UK genealogists usually use the "historical county" as the 2nd node of 3. The definitions of these are also not always clear, particularly as some cities were, for local government purposes, split out of their surrounding county many years ago - much earlier than simplistic traditionalists acknowledge.
- UK genealogists tend also to mix hierarchies - for instance, the "Coppenhall parish" that I know is usually qualified as "Coppenhall parish, Cheshire, England" - which is an ecclesiastical (CofE) / political / political hierarchy as the wholly ecclesiastical one would read "Coppenhall parish, Diocese of Chester, England(?)", which is less help, especially when referring to Dioceses whose geography is not clear - e.g. the Diocese of Lichfield.

So yes, take advantage of rules where you can, but allow the rest of us to fiddle and tweak as we always have!
ACProctor 2011-11-27T09:28:30-08:00
Another way of looking at this, which I feel might be a bit simpler and easier to manage, is to keep place name hierarchies as pure hierarchies. If a place changes from one parent to another then a separate hierarchy can be created and a weak link added between the leaf elements to indicate that they're effectively the same place.

Advantages include the ability to reference a Place directly since it only has one hierarchy above it. Applicable dates could be associated with the two variations of a place.

I know it's just as easy to split a Place definition to say, for instance, 'between x and y it is in this hierarchy, and between x2 and y2 it is in this one'. However, if a building changes its usage (e.g. a school becoming a factory) then the idea of an equivalence link works very well, even though no hierarchy has changed.

There's really not a lot of difference between the two approaches. I'm looking for a good argument to favour one over the other. Any suggestions? What about handling name changes, as opposed to variations of spelling?
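For comparison, the "pure hierarchy plus equivalence link" approach might look something like the sketch below, assuming every place record has exactly one parent and that equivalence links are stored separately. The class and method names are invented, and the Manchester data is only a convenient example from elsewhere in this thread.

```python
from collections import deque
from typing import Dict, List, Optional, Set


class PlaceGraph:
    def __init__(self) -> None:
        self.name: Dict[str, str] = {}
        self.parent: Dict[str, Optional[str]] = {}  # exactly one parent each
        self.same_as: Dict[str, Set[str]] = {}      # weak equivalence links

    def add(self, pid: str, name: str, parent: Optional[str] = None) -> None:
        self.name[pid] = name
        self.parent[pid] = parent
        self.same_as.setdefault(pid, set())

    def link_equivalent(self, a: str, b: str) -> None:
        self.same_as[a].add(b)
        self.same_as[b].add(a)

    def path(self, pid: str) -> str:
        """Follow the single parent chain from this place to the top."""
        parts: List[str] = []
        current: Optional[str] = pid
        while current is not None:
            parts.append(self.name[current])
            current = self.parent[current]
        return ", ".join(parts)

    def equivalents(self, pid: str) -> List[str]:
        """All places reachable through equivalence links, including pid."""
        seen, queue = {pid}, deque([pid])
        while queue:
            for other in self.same_as[queue.popleft()]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        return sorted(seen)


g = PlaceGraph()
g.add("eng", "England")
g.add("lancs", "Lancashire", "eng")
g.add("gm", "Greater Manchester", "eng")
g.add("man-lancs", "Manchester", "lancs")
g.add("man-gm", "Manchester", "gm")
g.link_equivalent("man-lancs", "man-gm")

print(g.path("man-gm"))
print(g.equivalents("man-lancs"))
```

Each record can be referenced directly because its path is unambiguous, while a search for events in "Manchester" can expand through equivalents() to cover both records.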
testuser42 2011-11-27T14:54:35-08:00
Hi all,
I've posted this a long time ago already, but it might be relevant here.

There is a great resource for places in Germany and Middle Europe called "GOV", translated as "The Historic Gazetteer". Most pages are in German only, but Google translator might be doing an OK job:
Overview:
http://wiki-de.genealogy.net/GOV
"As of Feb 2011, there are about 355000 objects in the database" (see the red dots on the map).
"GOV collects historic and current information about places, churches, regional structures, political and church affiliation, statistical information etc."
Use the search to see for yourself:
http://gov.genealogy.net/search
You'll get a list of results, and then a page for the object. Here's the city Straßburg/Strasbourg
http://gov.genealogy.net/item/show/STRURGJN38VN

The data model is straightforward, as far as I understand it:
http://wiki-de.genealogy.net/GOV/Datenmodell
A GOV-Object has many properties
http://wiki-de.genealogy.net/GOV/Quicktext#Eigenschaften_von_Objekten
is called x (with date range and language code)
is situated at position x (coordinates)
is object type x (date range)
has x inhabitants (d.r.)
has postcode x (d.r.)
has w-Number x (that's a special kind of postcode)
has external ID System:ID
has confession x
has URL x

...and relationships to other objects
http://wiki-de.genealogy.net/GOV/Quicktext#Beziehungen_zwischen_Objekten
belongs to y
is situated in y
represents y (e.g. a church building represents the parish)

Both properties and relationships have the possibility of a date-range and a source.
Object types are numerous:
http://wiki-de.genealogy.net/GOV/Objekttypen
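To show the general shape of such a model, here is a loose sketch based only on the list above: an object is a bag of statements, each with an optional date range and source. It is not the actual GOV schema, and the object, names and years below are entirely invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    """One property or relationship, with optional dates and source."""
    kind: str                         # e.g. "is called", "belongs to"
    value: str
    first_year: Optional[int] = None  # None = open-ended
    last_year: Optional[int] = None
    language: Optional[str] = None    # only meaningful for names
    source: Optional[str] = None

    def holds_in(self, year: int) -> bool:
        return ((self.first_year is None or self.first_year <= year)
                and (self.last_year is None or year <= self.last_year))


@dataclass
class GazetteerObject:
    object_id: str
    statements: List[Statement] = field(default_factory=list)

    def values(self, kind: str, year: int) -> List[str]:
        """All values of one kind of statement that hold in the given year."""
        return [s.value for s in self.statements
                if s.kind == kind and s.holds_in(year)]


# Entirely invented data, for illustration only.
village = GazetteerObject("EXAMPLE1", [
    Statement("is called", "Altdorf", None, 1899, language="deu"),
    Statement("is called", "Neudorf", 1900, None, language="deu"),
    Statement("is object type", "village"),
    Statement("belongs to", "EXAMPLE-AMT", None, 1949),
    Statement("belongs to", "EXAMPLE-KREIS", 1950, None),
])

print(village.values("is called", 1850))
print(village.values("belongs to", 1960))
```

Treating names, types and parent relationships uniformly as dated statements is what lets one object sit in different hierarchies at different times.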

To allow the transfer of such place hierarchies, a group of developers agreed on an extension to GEDCOM.
http://wiki-de.genealogy.net/Gedcom_5.5EL
http://wiki-en.genealogy.net/Gedcom_5.5EL

It seems that while they were at it, they also started to clarify and extend other GEDCOM tags, like MARR and NAME. This might have been the start of the ongoing effort in the "GEDCOM-L" mailing list.
The "GOV" has its own list, it seems:
http://list.genealogy.net/mm/listinfo/gov-develop
WesleyJohnston 2011-11-27T18:22:54-08:00
It certainly does give an idea of the way in which even a small village can be in multiple very different hierarchies. I plugged SPANTEKOW into http://gov.genealogy.net/search and the four hierarchies that result are radically different, even though they are all for the same village (there is only one Spantekow in Germany) -- not just in content but in complexity of the hierarchy.
AdrianB38 2011-11-28T09:35:48-08:00
"the way in which even a small village can be in multiple very different hierarchies"

I'm not sure that's the way I'd describe it. The way I read the first screen, there are 4 _different_ objects called Spantekow. Clearly all have the same name, all derive that name from the same village, but they are different objects:
- a Gutsbezirk (oh joy, Google Translate doesn't help, but it's defunct)
- an Amt, which seems to be an administrative division - also now defunct,
- a village (which appears to be in the municipality of the same name)
- a municipality

That's a bit of an over-simplification - the "village", for instance, is actually referred to as a village, Ortschaft or Ortsteil at various times, but seems to be the same object throughout.

So this, in my view (fools rush in, etc...), seems to represent 4 different objects with the same name, not 1 object in 4 different hierarchies.

Having said that, the municipality of Spantekow, when I look at the "Superordinate objects" diagram, does seem to have a pair of current "owners", i.e. it's in 2 hierarchies, viz: it's currently owned by both the Amt of Anklam-Land and the Landkreis (rural county?) of Vorpommern-Greifswald. Though since Anklam-Land comes under Vorpommern-Greifswald itself, I'm not sure why the direct relationship is there.

So in summary, I think we do see a place under 2 different hierarchies, but we also need not to confuse ourselves by equating different objects of the same name (in the same geographic area). That's a different topic.
ACProctor 2011-11-28T09:49:22-08:00
I struggled for a while on this subject, and I'm still not certain I "have it".

Geographical hierarchies are much easier to handle - even though they're not always unique. Names change, multiple spellings, boundaries moving, etc., can be handled fairly cleanly.

However, administrative, council, parish, electoral wards, etc., feel independent to me. Rather than creating hierarchies of different flavours, I decided to just keep the geographic one(s) and represent the other details as properties, or associated PFACT items :-)
AdrianB38 2011-11-28T12:12:18-08:00
I think in practice, Tony, you're close to practical reality.

Firstly, let me reiterate that I believe places _can_ be in multiple hierarchies at once - e.g. in the late 1800s, Haslington was in:
- the Church of England parish of Haslington;
- the geographic county of Cheshire;
- the parliamentary constituency of Crewe;
- the Poor Law Union of Nantwich
and no doubt others beside.

However (1) - How many of those place hierarchies are relevant to family history? Sure, we may mention the Poor Law Union and the parliamentary constituency, and we certainly need to understand where we might find Haslington's records filed for parliamentary elections or Poor Law purposes - but do we need a database structured to hold all that lot, or are the simple notes on the GENUKI web-site sufficient?

What we _do_ surely need is the hierarchy that locates the place on the map because we might want to query by village (Haslington) or county (Cheshire). And I'm also quite sure the ecclesiastical hierarchy will be similarly useful because so many of the records come from ecclesiastical sources that querying by parish is important.

However (2) - I wonder if some of these place-place hierarchies are not actually place-organisation hierarchies - or can be, in practice, treated as such. For instance, while Cheshire County Council can be represented on a map, and therefore has a nature as a place, nonetheless its only role in my files is as my first employer, and it is therefore an organisation in my view. Wilmslow, the place where I worked, was administered by CCC, so therefore I could happily record an independent place (Wilmslow) to organisation (CCC) hierarchy, much as Tony suggests.

I reckon that between ignoring the hierarchies of no direct relevance to the genealogy of people and turning other hierarchies into place-organisation ones, we probably - at least in the UK - slash the number of multiple hierarchies drastically.

The question then becomes whether the number of multiple place-place hierarchies across the world is such that it must be represented in BG or not?
ACProctor 2011-11-28T13:31:44-08:00
I think that from a date point of view, the answer is probably 'yes'. In other words, different hierarchies could be valid over different dates.

Would you treat that as subdivisions of a Place, or distinct Places with a soft link between them?
ttwetmore 2012-05-29T08:42:28-07:00
A place is primarily an attribute attached to an event or other attribute. In this context a place should be able to appear in two main contexts (a combined context to be explored also below).

First, the place can be encoded in situ, that is, self contained, with no external links, for example:

<person...>
  <birth>
    <place>
      <name> New London, New London, Connecticut, United States </name>
      <type> city, county, state, country </type>
    </place>
    ....
  </birth>
  ...
</person>

Because a place is an attribute, it can have any of the sub-attributes that any attribute can have. Here I decided to use the <name> attribute for the place’s name and <type> for the types of the name’s parts. No implication that <type> is required.

Second, the place can be encoded as a reference to a “first-class” place object/record, for example:

...
  <birth>
    <placeref id="p12345"/>
  </birth>
...
<place id="p12345">
  <name>New London, New London, Connecticut, United States</name>
  <type> city, county, state, country </type>
</place>

In this case there is a single place record for the city of New London which contains its full hierarchy up to the country level. But this can be expanded to using a full hierarchical approach, for example:

<place id="p12345">
  <name> New London </name>
  <type> city </type>
  <placeref id="p12346"/>
</place>
...
<place id="p12346">
  <name> New London </name>
  <type> county </type>
  <placeref id="p12347"/>
</place>
...
<place id="p12347">
  <name> Connecticut </name>
  <type> state </type>
  <placeref id="p12348"/>
</place>
...
<place id="p12348">
  <name> United States </name>
  <type> country </type>
</place>
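
A minimal Python sketch of how a reader of such a file might rebuild the full place name by following the placeref links upward. The wrapper element `<places>` and the function name `full_name` are assumptions made for illustration only; nothing here is part of DeadEnds or BG, and the sketch follows only the first placeref of each record.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the <places> wrapper is an assumption, added only
# so that the separate <place> records parse as one XML document.
DOC = """
<places>
  <place id="p12345"><name>New London</name><type>city</type><placeref id="p12346"/></place>
  <place id="p12346"><name>New London</name><type>county</type><placeref id="p12347"/></place>
  <place id="p12347"><name>Connecticut</name><type>state</type><placeref id="p12348"/></place>
  <place id="p12348"><name>United States</name><type>country</type></place>
</places>
"""

def full_name(places, place_id):
    """Follow the first <placeref> of each record up to the top level."""
    parts = []
    seen = set()  # guards against accidental reference loops
    while place_id is not None and place_id not in seen:
        seen.add(place_id)
        place = places[place_id]
        parts.append(place.findtext("name").strip())
        ref = place.find("placeref")
        place_id = ref.get("id") if ref is not None else None
    return ", ".join(parts)

# Index every place record by its id.
places = {p.get("id"): p for p in ET.fromstring(DOC).iter("place")}
print(full_name(places, "p12345"))
```

With multiple placerefs per record, the same walk would have to choose which hierarchy to follow.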

And hybrid approaches work just as well:

...
  <birth>
    <place>
      <name> New London, New London </name>
      <type> city, county </type>
      <placeref id="p12349"/>
    </place>
    ...
  </birth>
...
<place id="p12349">
  <name> Connecticut, United States </name>
  <type> state, country </type>
</place>

These examples show my preferred model for places. It allows parts to be fully combined or fully separated into hierarchies or any combination in between. It allows parts to be encoded entirely as a simple attribute or entirely as place objects/records or any combination in between.

Note that postal addresses can be composed of additional place parts if so desired. The model allows latitudes and longitudes to be added as properties at any level in the hierarchy.

Also note that this model allows multiple hierarchies, as in any context where a <placeref> can occur, multiple <placeref>’s can occur. So this model can handle pure geographical containment, pure political containment, containment based on historical boundaries, and any multiple combinations thereof.

And it does all these things while being itself nearly trivial in structure, definition and implementation. Note that the <placeref> elements could be replaced by <place> elements to simplify it even further.

This is the DeadEnds model of places. My top four design principles are simplicity, flexibility, simplicity and flexibility, in no particular order.

Tom

ps. This also works with external place authorities. DeadEnds assumes that all record level objects have unique IDs. Therefore any place authority that provides hierarchies of place records with unique IDs assigned by the third party authority organization integrates seamlessly. The id in a placeref (e.g., <placeref id="xxxxxxxxxxx"/>) would then point to a place object maintained by that authority.
AdrianB38 2012-05-29T10:12:17-07:00
"note that this model allows multiple hierarchies, as in any context where a <placeref> can occur, multiple <placeref>’s can occur."
In this case, you need to "type" the hierarchy _somewhere_ to show that this is the geographic higher place or this is the ecclesiastical higher place. Would that be best done at the 'lower' end or the 'higher'?

As it has to be present at the higher end anyway to show what sort of place it is (I assume that if you are having multiple hierarchies then you would indeed type all your places otherwise life would get too confusing), then it would seem sensible to "type" the hierarchy by saying - look at the 'higher' end to find out what sort of hierarchy it is. But that does imply that if we have X related to Y, then there is only one way that X can be related to Y, and that way is determined by the types of X and Y. I am slightly worried that this may not be so - that German site had some strange (to my eyes) relationships - was it true that X could be related to Y in two different ways? Or was it that X to Y existed at the same time as X to Z to Y, which was for a different purpose????? Dunno...
AdrianB38 2012-05-29T10:21:01-07:00
Flipping heck - why don't I read my own posts! The example is right above:

"the municipality of Spantekow, when I look at the "Superordinate objects" diagram, does seem to have a pair of current "owners", i.e. it's in 2 hierarchies, viz: it's currently owned by both the Amt of Anklam-Land and the Landkreis (rural county?) of Vorpommern-Greifswald. Though since Anklam-Land comes under Vorpommern-Greifswald itself, I'm not sure why the direct relationship is there."

Now I read it, this doesn't seem an issue after all. X is related to Y. X is also related to Z. Y and Z are different types, so that can be used to define what the type of relationship is that X is in. Y is also related to Z but this doesn't seem to be a problem.

So, we still have the query - if we have X related to Y, then is it true that there is only one way that X can be related to Y, and that way is determined by the types of X and Y?

(Bear in mind that name=Nantwich, type=Rural-District is not the same place as name=Nantwich, type=CofEParish)
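
A rough Python sketch of the idea above: a place is identified by name and type together, and the kind of hierarchy is read off the type of the 'higher' place. The class `Place`, the table `HIERARCHY_OF_PARENT_TYPE` and its labels are assumptions for illustration, using the Haslington example from earlier in this thread.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Place:
    name: str
    type: str  # e.g. "Village", "CofEParish", "County"

# Assumed mapping from the type of the enclosing place to a hierarchy kind.
HIERARCHY_OF_PARENT_TYPE = {
    "CofEParish": "ecclesiastical",
    "County": "geographic",
    "ParliamentaryConstituency": "electoral",
    "PoorLawUnion": "poor-law",
}

def hierarchy_kind(parent: Place) -> str:
    """Determine the relationship kind from the 'higher' end only."""
    return HIERARCHY_OF_PARENT_TYPE.get(parent.type, "unknown")

haslington = Place("Haslington", "Village")
parents = [
    Place("Haslington", "CofEParish"),
    Place("Cheshire", "County"),
    Place("Crewe", "ParliamentaryConstituency"),
    Place("Nantwich", "PoorLawUnion"),
]
for parent in parents:
    print(haslington.name, "is in", parent.name, "via", hierarchy_kind(parent))

# Same name, different type: these are two different places.
print(Place("Nantwich", "Rural-District") == Place("Nantwich", "CofEParish"))  # False
```

This only works if X can be related to Y in exactly one way, which is the open question above.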
ttwetmore 2011-11-15T09:12:05-08:00
I agree with Wesley on this point. The Place hierarchy is in actuality a directed acyclic graph and not a simple tree. Implementation is still relatively simple. Each place can refer to one or more places that enclose it, and simple graph algorithms can assure the DAG property after each new Place is added to the hierarchy. I was going to bring up this point in my recent post on the Place thread but thought it would add more confusion than not.
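
One way the DAG check could be sketched in Python, assuming each place id maps to the set of ids that enclose it. The function names are hypothetical and the data is the Spantekow example from the Place thread.

```python
def would_create_cycle(parents, child, new_parent):
    """Return True if making new_parent enclose child would create a cycle.

    parents maps each place id to the set of place ids that enclose it.
    A cycle appears exactly when child is already reachable upward from
    new_parent (or when the two are the same place).
    """
    stack = [new_parent]
    seen = set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, ()))
    return False

def add_enclosure(parents, child, new_parent):
    """Record that new_parent encloses child, keeping the graph acyclic."""
    if would_create_cycle(parents, child, new_parent):
        raise ValueError(f"{new_parent} cannot enclose {child}: cycle")
    parents.setdefault(child, set()).add(new_parent)

parents = {}
add_enclosure(parents, "Spantekow", "Anklam-Land")
add_enclosure(parents, "Spantekow", "Vorpommern-Greifswald")
add_enclosure(parents, "Anklam-Land", "Vorpommern-Greifswald")

try:
    add_enclosure(parents, "Vorpommern-Greifswald", "Spantekow")
except ValueError as err:
    print("rejected:", err)
```

The check runs once per new link, so a place may have several parents as long as no chain of enclosures leads back to itself.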
gthorud 2011-11-15T15:39:10-08:00
So a place record can have several parents. And there can be several paths from the top to a record.

Then you also need to be able to record a path through the graph, i.e. the path that is specified in e.g. a source for an event. You cannot simply refer to a place record in an event.
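
A small Python sketch of what "recording a path" could mean, assuming the graph is acyclic and each place lists its parents in a fixed order. The function `paths_to_top` and the `event` dictionary are hypothetical.

```python
def paths_to_top(parents, place):
    """List every path from place up to a top-level place.

    parents maps a place id to an ordered list of enclosing place ids.
    Assumes the graph is acyclic.
    """
    ups = parents.get(place, [])
    if not ups:
        return [[place]]
    return [[place] + rest for up in ups for rest in paths_to_top(parents, up)]

parents = {
    "Spantekow": ["Anklam-Land", "Vorpommern-Greifswald"],
    "Anklam-Land": ["Vorpommern-Greifswald"],
}

for path in paths_to_top(parents, "Spantekow"):
    print(" > ".join(path))

# An event would then store the whole path given by its source,
# not just the id of the lowest place record.
event = {"type": "birth", "place_path": ["Spantekow", "Anklam-Land", "Vorpommern-Greifswald"]}
```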
NeilJohnParker 2011-11-15T15:59:52-08:00
There may be some additional properties that a place hierarchy has. For example, does a place hierarchy require or imply that some authority has responsibility for determining what the place names and their boundaries are at any given time, and has the ability to change the boundaries at any time? This is certainly true when it comes to the traditional country, state (or province, territory...), county (or district...) and municipality (e.g. city, town, village, hamlet, township...). Furthermore, the boundaries of such a subdivision must usually be wholly contained in its parent and not overlap with its siblings. Furthermore, we are really talking about place name/place type duets, i.e. New York/City, New York/County, New York/State, US/Country. The jurisdiction (i.e. place hierarchy) should contain place type in its records, but it is usually not explicitly shown in listings. I suggest that each jurisdiction, be it political, administrative (e.g. utility company, census...), religious, etc., must be defined along with the authority for that jurisdiction, e.g. the Church of England parish place hierarchy. Also, I believe that this is a multiple hierarchical data structure (i.e. one distinct hierarchical structure for each jurisdiction), not a network structure. A place as used by one jurisdiction may or may not be coterminous with the place used by another jurisdiction, even though both use the same name.

NeilJohnParker@Telus.net
NeilJohnParker 2011-11-15T16:28:38-08:00
Follow on:
If the place hierarchy is to be represented as multiple place hierarchies, then each place hierarchy must be capable of being created and maintained by a user and/or some other jurisdiction, preferably the jurisdiction that owns it, e.g. the Church of England.

It is assumed that a good software package would at least contain the default political jurisdiction, i.e. country, state, county and municipality, for the world with its temporal data (as does Millenia's Legacy Deluxe Edition).
WesleyJohnston 2011-11-15T17:06:52-08:00
NeilJohnParker wrote " ... as does Millenia's Legacy Deluxe Edition"

I'm glad you mentioned that. I have been thinking about it a lot. Millenia/Legacy has done a great service by making that information about locations available. If I remember correctly from a 2009 presentation, Paul Rasmussen has also been supporting a software program that maps addresses within some cities, so that you can see where they were at different times as the ward boundaries changed.

There is a great need for a standardized historical database of political hierarchies, from the address level on up. And Millenia/Legacy has made a significant stride in that direction.
WesleyJohnston 2011-11-26T13:58:53-08:00
Data-Place06 - Location to include address
The 4-level limitation of the existing actual use of GEDCOM places has meant that the preparation of archival resources for online searching has been deemed sufficiently done when you can find the aggregated documents for a specific city or township or parish or whatever the lowest of the 4 levels is deemed to be for that jurisdiction.

I very much hope that, once BetterGEDCOM exists, Ancestry, FamilySearch and anyone else who publishes source images will go back to their records and add information such as address to the ways in which the records can be searched.

Here is an example that I have had to do far too often. I am searching for a family of a nationality/ethnicity that uses names that were not at all the norm for the census takers (or other record writers) of the area where they lived. This is further compounded by indexers who later tried to read the hand-writing of those record writers and bring their own cultural biases to bear. The result is often a butchery of the indexed name beyond even the most creative search variants (even using # and ?), so that the family that you know must have been there is not showing up in the indexes.

We need true search by place, down to the address level. When I have found this family living at 774 Calumet Street in the 1910 and the 1930 censuses but cannot find them in the 1920 census indexes, I want to see what the 1920 census showed for 774 Calumet Street.

This is of course an aspect of BetterGEDCOM that goes far beyond the narrow vision of a person sitting at their computer and using BetterGEDCOM to store and share their information. It goes to the heart of the massive online resources that are now being built up.

I hope that this aspect is not lost in the vision of how the ideal world of the future will look after BetterGEDCOM is a reality.
ACProctor 2012-05-27T00:50:29-07:00
Transferred from the 'Attribution' discussion to keep things on-track...


ACProctor 2010-05-26 09:53 am

Re: "I would be interested on your take on the difference between place and location."

I provided a link to this in the first post of this thread Tom: Places under section 3, 'Place Names'. I wanted to eliminate the confusion over the three concepts: Place, Location, and Postal Address since there are some big differences.

In those same research notes, section 3.1, 'Hierarchies', discusses Place Hierarchies, and in particular how the 'geographic' approach needs to be supplemented with 'administrative' divisions within the national borders of a country. It also tries to separate elements such as religious, jurisdictional, and political properties in the hierarchy. STEMMA may be unusual in taking the Place Hierarchy concept down to street and building level but it's generated some surprising insights in my experimentations here (using my own family history data).

Related to this is 5.1, 'Place Authority', which strongly recommends the provision of a federated Place Authority, and how some resources in England & Wales are almost there but are being squandered through lack of vision.


ACProctor 2010-05-26 10:00 am

Re: "In the United States we tend to think of places as being made up of three "standard" levels: city, county and state"

I have to say - as a foreigner - that it breaks down in the US too. Washington DC ("District of Columbia") is a federal district and not part of a state. :-))


AdrianB38 2010-05-26 4:58 pm

"I've seen people suggest that the PLAC tag be changed to a LOC tag. Why?"

Louis - as one of those who talks in terms of Location, instead of Place, I can say why _I_ wanted to replace Place by Location... To me, Place and Address (but not Postal Address) are on a spectrum where there is no real _qualitative_ difference between the concepts, only quantitative. Add to that my view that GEDCOM's differentiation between Place and Address is irredeemably compromised by definition and usage. So, I wanted to come up with a new entity that combined the two. Using one of the two existing names would not convey the difference, hence I suggested a new name, that of Location.

Caveats:
- Postal Address, as I said, is a different concept from Address. Furthermore, it's one I won't get excited about until the day I can use quantum entanglement to send my 3G grandfather a letter to ask him where in Ireland he came from...
- Difference between Location and Place as per Tony's link. It's an interesting concept but needs a degree of abstraction to comprehend. I'd prefer to use Location to denote the merged concepts of Address and Place, along with Co-ordinates for the more specific Location-as-per-that-link. Seems less abstract a usage.


Tony
ACProctor 2011-12-05T05:35:12-08:00
TestSuite01 - Test Data Format
I am interested in the way the suite of test data will be handled. Rather than delaying this until the project is nearly complete, I'm one of those people that believe it should be available as the project progresses.

However, that usually creates a chicken-and-egg situation. For instance, BG is not concrete enough to represent the test data, and GEDCOM would be inappropriate given the enhanced features expected in BG.

Does anyone have any strong feelings here? Should we simply have a semi-formalised "written" style of data? How would we represent desirable data items which BG has not yet evolved to support?
louiskessler 2011-12-05T17:03:44-08:00
No chicken and egg here.

Programmers can't program BetterGEDCOM until BetterGEDCOM has a draft ready.

Programmers won't need a test suite until they've started to program BetterGEDCOM. So the test suite is not needed until the BetterGEDCOM draft is delivered.

That said, I think it is worthwhile using examples to build up the BetterGEDCOM standard. Those examples could then be incorporated into the test suite so that ultimately the two can be produced together.

Louis
GeneJ 2011-12-05T18:03:42-08:00
Hi Tony and Louis,

Testuser set up a wiki page for us to gather test suite information. We tend to add examples in discussions and don't think to cross-reference those discussions with the test suite.
http://bettergedcom.wikispaces.com/BetterGEDCOM+test+suite

@ Tony. Last summer we wanted a test case, so I developed a series of materials on Sheriff William Preston of Defiance, Ohio. Sources and dialog are posted on my blog in about 10 articles. I haven't posted a series of his children's census. That case may or may not work for us; it is US centric.
http://theycamebefore.blogspot.com/2011/06/sheriff-william-prestons-identity.html
ACProctor 2011-12-06T06:18:20-08:00
Re: "Programmers won't need a test suite until they've started to program BetterGEDCOM. So the test suite is not needed until the BetterGEDCOM draft is delivered"

I disagree there Louis. I have seen this happen before when a specification is still very fluid.

If test cases (as opposed maybe to precise BG "test data") are available sooner then BG can be prototyped, and those test cases used to demonstrate how one approach is better or worse than another.

It would be folly to sail on regardless until we then construct some good "testy" test data and find that the BG design has flaws in it.

What I was going to suggest - and I have seen this done before - is to represent test cases using a simple XML form. This doesn't preclude or presume the final structure of BG. It is merely a way of representing test cases.
WesleyJohnston 2011-12-06T07:19:25-08:00
I agree with ACProctor. You need specific real-world cases at the time the specifications are being nailed down -- what I referred to in another post as benchmark cases.

The actual test data for an already-programmed application is different. Some of the test data may arise from the benchmark cases. But some of it will be for testing to assure that basic things -- the usual things that are taken for granted that every system should have -- are actually performing as desired.

We very much need to have a repository of specific cases at the conceptual level in order to assure that the specifications going into the model are going to produce the results out the back-end that we want to see.
ttwetmore 2011-12-06T07:32:59-08:00
I'm on the Tony bandwagon also. We definitely need complex examples, with sources, evidence, personas, relationships, events, places, dates, in order to be sure the evolving BG covers all the cases. Providing this data seems a daunting task.
GeneJ 2011-12-06T07:38:16-08:00
GeneJ 2011-12-06T08:08:57-08:00
Shh....

Adrian also posted a family history he had written.

See the wiki page
http://bettergedcom.wikispaces.com/personal+notes+on+Family+History+in+the+UK


...Which links to Wordpress ...
http://brucefuimus.wordpress.com/
ACProctor 2011-12-06T08:23:38-08:00
Re: "I keep posting the reference ....

http://theycamebefore.blogspot.com/2011/06/sheriff-william-prestons-identity.html"

I read that and it's quite wordy. The content may be very useful but the narrative style doesn't make it very accessible, if you know what I mean.

This is an interesting issue, and not specifically a BG one. How do you represent the test cases in a more formal way when the ideal representation is the one you're designing? :-)

An XML form may be a bridge too far for many people since it would be very programmatic. Even a semi-formalised written form would be better. A chart form, e.g. with boxes and written content in each, would also work.

Having those test cases available not only helps to prototype syntactic elements, and more complex structural issues, but it gives us all a "common currency" for debating pros & cons.

It's always difficult to conjure up a good example to make a point, but if we know this fictitious family intimately then we can see those points more easily.

Relating to something else GeneJ said, I don't think a US-centric source example is a problem at all. I already have US references in my own tree, and we're definitely a multinational group.
GeneJ 2011-12-06T09:20:42-08:00
Exactly, Tony. Myrt made the same comment.

If you intend to test the research process issues featured in BetterGEDCOM, isn't it better to avoid fictional cases?

I've embellished real world materials before to develop a genealogical article, but that wouldn't begin to test the features of "E & C" that are part of BetterGEDCOM.
ACProctor 2011-12-06T09:30:13-08:00
The cases would be real - The family would be fictitious.

I imagined a fictitious family (or set thereof) that embraced all the real-world situations and vagaries that we want to support.

Given that we may make the cases public (e.g. to show vendors something they cannot handle at present), or produce real test data later on that's based on the test cases, then I think the names and d.o.b. can be invented at least :-)

I think a good start is an informal list of all the things BG must support or handle properly. We can then expand on those to create associated test cases.
ACProctor 2011-12-06T09:32:19-08:00
... I have a think balloon above my head with a picture of the Munsters posing for a family portrait !!
AdrianB38 2012-05-22T04:11:49-07:00
Syntax05 - User Extensibility of events and characteristics
Description:
The list of events, properties, characteristics, etc, of individuals, etc, in the BetterGEDCOM file format must be capable of extension by users. Extensions must be kept permanently separate from any later definitions in BetterGEDCOM format.

Importance:
Mandatory

Why?:
1. GEDCOM can be extended so to remove the facility would be a step backwards.
2. Many GEDCOM files exist with user-defined events.
Source:
Original Goal 3
AdrianB38 2012-05-22T12:53:15-07:00
Tony - given that you agree with defining arbitrary events, I'm curious why you wouldn't include Events with Characteristics? Is that because you wouldn't agree with the creation of arbitrary characteristics (e.g. adding MilitaryRankSubstantive, MilitaryRankTemporary, MilitaryRankWarSubstantive, as different variations on a theme) or because you would agree but there are extra aspects to consider?
AdrianB38 2012-05-22T12:56:03-07:00
Note the following post on another thread:
"Syntax04 - Extensibility by software companies
"ttwetmore Today 6:09 pm
"This may be an obvious point, but there are two types of extensibility that can be contrasted.

"First there is the type of extensibility done by inventing new tags, possibly up to and including new record types.

"Then there is the type of extensibility done by attaching TYPE tags with values to a higher level generic tag. For example, new novel events could be handled by placing a TYPE tag that describes the event under the generic EVENT tag.

"It is my opinion that BG should forbid the former and promote the latter.

"Some people think the second approach should actually be the overall approach, that every event should be an EVENT tag with either a TYPE subtag or attribute (in the XML sense). Of course, if the vocabulary of the TYPE values is then highly prescribed, this is no longer an extensible solution. An argument for this position is that it minimizes the number of tags. I don't believe that this is an important goal at this level, but others disagree.

"I personally believe that we should have specific tags for all the important events of genealogy and family history, and then use the EVENT/TYPE approach for novel situations.

"Tom"

I believe that post to be relevant to this thread, as it seems to describe how User Extensibility of events could be done in BG and is done in GEDCOM.
Adrian
ACProctor 2012-05-22T13:16:54-07:00
Re "I'm curious why you wouldn't include Events with Characteristics?"

It's just that I think the mechanisms for supporting Event-types and user-defined properties would be very different Adrian. Hence, I was looking at customisation more from the 'appropriate mechanism' point of view. For instance:

  • Schema - This involves custom entities and tags/record-types. None of us seem to like this provision.
  • Properties - I think this can be handled by treating properties as name-value-type data that can be defined by the user. Hence, no new tags or record-types would be required.
  • Events - I think we can have a standard Event entity, with an open-ended set of types/categories, and simply pre-define the main important ones.
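
As a rough illustration of the 'Properties' item, here is a Python sketch of name-value-type data. The class `Property` and its field names are assumptions, not a proposed BG structure; the optional `unit` field reflects the idea of defining units for properties.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Property:
    """A user-definable property: no new tag or record type is needed."""
    name: str                   # chosen by the user, e.g. "MilitaryRankSubstantive"
    value: str
    datatype: str = "text"      # assumed vocabulary: "text", "number", "date", ...
    unit: Optional[str] = None  # answers "6 what?" for numeric values

# Hypothetical values, for illustration only.
rank = Property(name="MilitaryRankSubstantive", value="Sergeant")
height = Property(name="Height", value="180", datatype="number", unit="cm")

for prop in (rank, height):
    suffix = f" {prop.unit}" if prop.unit else ""
    print(f"{prop.name} = {prop.value}{suffix} ({prop.datatype})")
```

Adding a new kind of property then only means choosing a new `name` value; the record structure stays the same.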

In summary, the more of the data syntax that we can pin down at the start, the less we need to change later. I therefore prefer extensible values as opposed to extending tags/record-types.

Tony
AdrianB38 2012-05-22T14:54:20-07:00
"It's just that I think the mechanisms for supporting Event-types and user-defined properties would be very different"
OK - I can buy that. One way or another, we can see a way forward for both events and characteristics. Though I think I'd also prefer to see a pre-defined list of the more important characteristics. If not, can you imagine the ensuing dialog(ue) "Is it occupation, job or career you want us to use? Can't you even be bothered to tell us?"

Moving to your categories / types for events, I agree with the idea in principle, though would argue with the detail at whatever point it became important so to do. Probably only to be expected.

Firstly, as a minor point I'd prefer to call them something like type and sub-type because the names then convey which is the broader and which the more detailed. (This is an informed prejudice brought on by an inability to remember and explain which was which in BR's locomotive classes, diagrams, types, etc. when prefixes like operating / engineering, etc would have helped everyone understand. Me included.)

Secondly, with rare exceptions such as "Union" and "Dissolution", I'm not convinced that the Categories you list add much to proceedings. E.g. knowing that Probate and Will are both in the Legal category doesn't seem that useful if there's nothing for them to inherit from Legal. Whereas, being able to sub-type Will as Will-Codicil seems more useful. So, I'm agreeing with your principles but would focus differently.
AdrianB38 2012-05-22T14:59:13-07:00
PS
Tony, I like the concept of being able to define units for properties. I still remember being at junior school, saying that the answer was (say) 6 and being asked "6 what? Apples, oranges?" It made an impression on me...
ACProctor 2012-05-23T01:12:34-07:00
Thanks Adrian.

Re: "Moving to your categories / types for events".

I have a problem with this area anyway. A "controlled vocabulary" should be 'closed' (i.e. controlled) so that all accepted possibilities are predefined. From that POV, my categories/types (or types/sub-types) do not constitute a controlled vocabulary.

What I really wanted is a set of items that has a predefined subset, but allows user-defined additions without clashing. I tried to avoid having two distinct data values, i.e. a controlled type, plus some user type for when controlled_type="Other", say.

(I hope that makes sense)

Short of decorating the names, say with an underscore prefix (so that _Typexxx is a user-defined type and would automatically be categorised under the controlled-type of 'Other'), I couldn't think of a clean way of accommodating the two values in a single datum.
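
The underscore-prefix idea could be sketched in Python like this. The contents of `CONTROLLED_TYPES` are an illustrative assumption, not a proposed vocabulary, and `controlled_type` is a hypothetical helper.

```python
# Illustrative subset only; the real predefined list is not decided here.
CONTROLLED_TYPES = {"Birth", "Death", "Marriage", "Other"}

def controlled_type(event_type: str) -> str:
    """Map a single type value onto the controlled vocabulary.

    A name starting with "_" is user-defined and falls under "Other";
    any other name must be one of the predefined types.
    """
    if event_type.startswith("_"):
        return "Other"
    if event_type in CONTROLLED_TYPES:
        return event_type
    raise ValueError(f"unknown event type: {event_type!r}")

print(controlled_type("Birth"))     # Birth
print(controlled_type("_Typexxx"))  # Other
```

A single datum then carries both pieces of information, at the cost of the decorated names.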

Any suggestions?

Tony
nick-mat 2012-05-23T03:48:36-07:00
For my program, I classified events into 7 subtypes: Birth, Near Birth, Death, Near Death, Family Union (Marriage), Other Family, and Other. This is mainly so the program can make certain assumptions. If the program cannot find a birth date, then it can use the earliest Near Birth event. The same applies to deaths. The Family events are marked so they can be displayed with other family details.

There may be other ways we want to categorize. I am currently experimenting with extending the concept of Events to cover all attributes/characteristics, which will mean many more categories of events.
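
A minimal Python sketch of the birth-date fallback, assuming each event is a (category, date) pair. The function name and the sample dates are hypothetical, and taking the first recorded Birth event when several exist is an assumption.

```python
from datetime import date

def estimated_birth(events):
    """Return a birth date, falling back to the earliest Near Birth event.

    events is a list of (category, date) pairs for one person.
    """
    births = [when for category, when in events if category == "Birth"]
    if births:
        return births[0]  # assumption: take the first recorded birth
    near = [when for category, when in events if category == "Near Birth"]
    return min(near) if near else None

# Hypothetical person with no Birth event recorded.
events = [
    ("Near Birth", date(1850, 3, 10)),
    ("Near Birth", date(1850, 2, 1)),
    ("Death", date(1910, 1, 1)),
]
print(estimated_birth(events))  # 1850-02-01
```

The same pattern would apply to Death and Near Death, taking the latest date instead if that is what the program wants.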

Nick
ttwetmore 2012-05-23T04:24:45-07:00
From an RDF point of view (and others) an event and characteristic are the same kinds of things; in the RDF case they both consist of triples. In the case of a characteristic the triple generally links a "subject" (say the person with the characteristic) via a predicate (the name of the "tag") to an object (the value of the characteristic). In the case of an event the object part of the triple is a more complex thing, a subject in its own right with its own characteristics (the main being type, date and place).

The important idea here is that everything is a characteristic, but we choose to partition the most important ones into their own categories. For instance the name of a person is a characteristic, but most of us believe that a name should be treated with its own subset of special tags that make the name such a special characteristic that we prefer to call it a name instead. But things like height or weight or color of eyes, we don't consider genealogically highly significant, so a simple characteristic tag is good enough, if there is one at all.

Especially when the characteristic value is an object unto its own, with its own characteristics (e.g., name, place) we tend to not think of them as characteristics, which we reserve for "simple" things. Or we can say that we think of characteristics as things with atomic, indivisible values, and non-characteristics as things with structured values.

Extensibility in an RDF world primarily means adding new predicates, and new object (in the RDF sense) types and values.

Personally I think extensibility is the same in both the event and characteristic area, because I think we should generalize the two concepts in defining certain properties of the model. Just as I think there should be an EVENT:TYPE "predicate pair" for extending events, there should be a PFACT:TYPE pair for extending characteristics.

Note that none of this deals with the issue of whether events are first class citizens or not (whether they are "record level" entities). If an event is a high level entity, then from an RDF point of view the event is an anonymous entity (a bit of a misnomer in my view since it must have a unique ID, which is a characteristic that uniquely identifies it) and then it becomes (through the good offices of that unique ID) the object of a subject (say the object BIRTH), which is an object in a person (or in layman's terms, it gets "pointed to" by a non-anonymous subject).

An event only seems different from a characteristic because an event exists at two levels in an RDF type graph -- it is the object part of a triple from the point of view of a subject (e.g., a person [subject] has a birth event [predicate and object]), and it is the subject part of triples that define its own characteristics (e.g., a birth event is an anonymous but uniquely identifiable subject, with a type, date and place [predicates and objects]). And of course the date and place objects in the event triples would be subjects in their own right with their own third level objects.

The same situation exists in a computer data structure with multiple fields. Some fields contain primitive objects (integers, strings) and some fields contain references to other data structures. The same situation exists in a GEDCOM file where any line without sub-lines is a simple object of the parent line (which is the subject and the child line tag is the predicate), whereas the parent line is both an object of the line above it and the subject of the lines below it.
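To make the GEDCOM case concrete, here is a minimal sketch of reading level-numbered lines as triples. It is an illustration only, not part of any proposal: the function name `gedcom_to_triples`, the `nodeN` identifiers, and the `record-type` and `value` predicates are all invented for this example, and it assumes well-formed input where each level goes at most one deeper than the line before.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# A tiny hand-written sample in GEDCOM style (illustrative data only).
SAMPLE = [
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 1 JAN 1900",
    "2 PLAC New London, Connecticut",
]


def gedcom_to_triples(lines: List[str]) -> List[Triple]:
    """Read each line as (subject, predicate, object) relative to its parent line."""
    parsed = []
    for number, raw in enumerate(lines, start=1):
        parts = raw.strip().split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        parsed.append((level, tag, value, f"node{number}"))

    triples: List[Triple] = []
    stack: List[str] = []  # stack[i] is the node id of the current line at level i
    for index, (level, tag, value, node_id) in enumerate(parsed):
        has_children = index + 1 < len(parsed) and parsed[index + 1][0] > level
        del stack[level:]  # drop ids of lines at this level or deeper
        if level == 0:
            # Record line such as "0 @I1@ INDI": the cross-reference id is the subject.
            node_id = tag
            triples.append((node_id, "record-type", value))
        elif has_children:
            # A line with sub-lines is the object of its parent and a subject itself.
            triples.append((stack[level - 1], tag, node_id))
            if value:
                triples.append((node_id, "value", value))
        else:
            # A line without sub-lines is a simple object of its parent.
            triples.append((stack[level - 1], tag, value))
        stack.append(node_id)
    return triples


for triple in gedcom_to_triples(SAMPLE):
    print(triple)
```

In this sketch the BIRT line shows the two-level behaviour described above: it appears once as the object of the person and again as the subject of its DATE and PLAC lines.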

The same situation exists in JSON, XML, etc., yadda, yadda, yadda.

I think it is important to first grok the full semantics and relationships of all the concepts, and then divvy them up into terms that are consistent with one another and go along with the pattern of words we choose to describe them. Seeing how every concept in a potential genealogical data model fits in with the RDF triples model of knowledge is a great organizing principle for doing this. Once the organization and partitioning of triple types is done, the term RDF never need be mentioned again. I see the terms events and characteristics as concepts that we can first unify with an RDF point of view, but then choose to give separate names to as we distinguish pragmatic differences between how the concepts are used and how we believe the concepts should best be placed into a data model.

The very fact that we are comfortable, more or less, with the terms event and characteristic, is an indication that we all recognize that there are useful properties of event-characteristics that make us want to carve them out of the larger universe of all characteristics, and give them a special sub-category and treat them somewhat differently (for example define them as top level entities in a data model).

But, when considering extensibility then the fact that an event is really "derived" from a characteristic can be the principle that dictates that they both be extensible in the same way.

Tom
ACProctor 2012-05-23T04:49:15-07:00
It sounds to me Tom like you might be putting undue emphasis on the relationships rather than the entities themselves.

The definition of a Person, Place, or Event is crucially important, but I feel the entity relationships are a natural consequence of their individual [entity] definitions.

For example, we all accept that a Person may be linked to other Persons for genealogical lineage (i.e. a biological hierarchy), and most of us would accept that every Place has a parent Place (i.e. a Place hierarchy). STEMMA's Events also constitute an Event hierarchy in order to add fine structure to the definition of a non-trivial Event. The point I'm getting at here is that the triples linking one of these entity types to another of the same type are a consequence of their definitions rather than the other way around. [yeah, I might be splitting hairs but it feels right to me]

Also, STEMMA has an inheritance mechanism that involves Events, Resources (i.e. supporting files), and Citations (or "sources" to everyone else). I don't believe RDF has any mechanism that can model this type of inheritance, which in turn is directly analogous to subclass/superclass in OOP.

Tony
ttwetmore 2012-05-23T06:41:41-07:00
Tony,

Thanks. I didn't intend to stress relationships over entities.

In RDF, as I'm sure you know, a Place being a sub-Place of another Place is a simple RDF triple, (place1465, is-subplace-of, place2638). Of course, there would be other triples like (place1465, has-name, "New London"), (place1465, has-type, city), (place2638, has-name, "Connecticut"), and so on.

For Person links I believe we need to distinguish direct person-to-person links, and linking mediated by relationship objects. Both seem to me to have their proper places in a genealogical data model, and both easily represented in RDF. For example the multi-level inter-persona relationship that I believe is critical to the next generation of genealogical software, in order to fully support the research and evidence process, might have triples like:

(persona2435, is-evidence-for, persona6481)
(persona6481, is-concluded-to-be-an-individual-because-of, "... proof statement ...")

Contrary to your comment in your last paragraph, class membership and class inheritance (i.e., OO concepts) are handled by RDF, as in the following triples:

(person24354, is-an-instance-of, Person)
(Person, is-a-subclass-of, GenealogicalEntity)

In most contexts the "is-an-instance-of" predicate is simplified to "isa" or "is-a".
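The triples quoted above can be tried out in a few lines of code. This is a minimal sketch that stores triples as plain tuples and follows the subclass predicate to find every class an instance belongs to; the helper names `objects` and `classes_of` are invented here, and no RDF library is involved. (In RDF Schema the corresponding predicates are rdf:type and rdfs:subClassOf.)

```python
# Triples taken from the examples in this discussion, stored as plain tuples.
TRIPLES = {
    ("place1465", "is-subplace-of", "place2638"),
    ("place1465", "has-name", "New London"),
    ("place2638", "has-name", "Connecticut"),
    ("person24354", "is-an-instance-of", "Person"),
    ("Person", "is-a-subclass-of", "GenealogicalEntity"),
}


def objects(subject, predicate, triples=TRIPLES):
    """Return every object that appears with the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def classes_of(instance, triples=TRIPLES):
    """Return the direct classes of an instance plus all of their superclasses."""
    found = set()
    pending = list(objects(instance, "is-an-instance-of", triples))
    while pending:
        cls = pending.pop()
        if cls not in found:
            found.add(cls)
            pending.extend(objects(cls, "is-a-subclass-of", triples))
    return found


print(objects("place1465", "has-name"))
print(sorted(classes_of("person24354")))
```

The inheritance is not stored anywhere as a special mechanism; it falls out of walking two ordinary predicates, which is the point being made about class membership and subclassing.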

One could also define schemas using RDF, as in things like:

(is-a-characteristic-of, is-a, Predicate) << These three go all the way down to the "assembly" language level of RDF,
(Name, can-be-a, Subject) << as a Predicate, Subject and Object are the three basic concepts of RDF.
(Name, can-be-a, Object)
...
(Name, is-a-characteristic-of, Person)
(Birth, is-an-event-of, Person)
(Birth, is-a, Event)
(Date, is-an-optional-characteristic-of, Event)

It must be able to, as a schema is a specification of a form of knowledge.

You can use RDF to define your data storage or your external file formats, though this might be nothing more than a theoretical exercise.

(Person, is-represented-as, Person-Relational-Table)
(Person-Relational-Table, has-column, Name-Column)
...

I'm not suggesting we do any of these things with RDF. I'm just pointing out that RDF provides an excellent conceptual framework for casting nearly everything we need to discuss into a uniform vernacular.

Tom
ACProctor 2012-05-23T07:03:22-07:00
OK, I see what you're saying now Tom.

However, how would RDF distinguish the OOP concepts of "has-a" and "is-a"? If one entity is embedded within another, rather than being linked to it, then does that cause a problem for these triples?

Tony
ttwetmore 2012-05-23T08:16:54-07:00
Tony,

To answer you directly:

(Person, is-a, GenealogicalEntity) << inheritance relationship
(Person, has-a, Name) << containment/component-of relationship

Here Person, GenealogicalEntity and Name are Classes, not Objects, from the OO point of view. From the RDF point of view, however, Person is a Subject, and GenealogicalEntity and Name are Objects. But these uses of Subject and Object are linguistic ones, so the two kinds of Objects are entirely different things. It is the schema interpretation that would specify that all three are Classes. RDF per se doesn't care.

Note that is-a and has-a, from the OO point of view, are relationships between Classes, at least the way I am defining them here.

Tom
AdrianB38 2012-05-22T04:18:55-07:00
In Discussion on "Goal 2 -- BG container formats" ( http://bettergedcom.wikispaces.com/message/view/GOALS/30141635#54417872 )
the following posts seem relevant:

Tom Wetmore 21/5/2012 (extract only)
"If vendors unofficially extend BG for their own use they are being very bad and should be shunned. If the bad boy is someone like Ancestry.com, good luck; it will be like the kind of standards usurping that Microsoft has long been famous for. The giants can play fast and loose with standards, and the rest of us can whine all we want, but at the end of the day, must go along.
"I personally believe that BG should NOT be extensible, but that changes and additions can be proposed and acted upon by an official process. Taking this position is tantamount to the claim that the BG designers can do an excellent job of anticipating the needs of the industry. I personally believe this to be the case in theory; however the technical management of the BG process must undergo a radical improvement before this would be possible. You can [substitute] FHISO for BG in the preceding sentence if you believe that FHISO will end up in charge."

Louis Kessler 21/5/2012 (extract only)
"I 100% agree with Tom that BetterGEDCOM should NOT be extensible, for the exact reasons he states.
"That makes every decision, such as the BetterGEDCOM container decision, a tough one. It will be hard to U turn once all developers have implemented it one particular way."
AdrianB38 2012-05-22T04:28:44-07:00
I believe that it is important to distinguish user defined events and characteristics (a.k.a. attributes, etc, etc.) from the extensions made to GEDCOM by the software suppliers, whose tags are (theoretically) distinguished by the underscore prefix. While both extend the meaning of the GEDCOM standard (and both are _within_ the standard if implemented correctly), user defined events & characteristics have, I believe, less of an impact on the structure of the GEDCOM file in that they are firmly localised in where they occur, whereas software-supplier extensions can be anywhere, at any level. In addition, the control of the two types of extension is theoretically different - software suppliers could, in theory, tell us the meaning of their extensions whereas only individual users can explain their own "new" events, etc.
AdrianB38 2012-05-22T04:30:39-07:00
Note that software suppliers could very well extend the range of events by the same mechanism that users do.
AdrianB38 2012-05-22T04:46:25-07:00
On a personal level, I find it very difficult to believe that BG (or whoever) will ever be able to come up with a list that defines all necessary events and characteristics. New ones could appear for several reasons:
- events in a previously unknown (to BG) culture;
- in a culture known to BG, we might, nevertheless, find events not previously known;
- events might have been considered but rejected as not requiring a separate BG event "code", but the user disagrees;
- events might have been considered, allocated a BG event "code" but this code is the same as another, similar event in possibly another culture - the user dislikes the way that the application software processes the 2 the same, so wants to separate them;

It is, I personally believe, not credible that BG's controlling authority could react fast enough to approve new events. Quite apart from anything, approval means occasional rejection but if the BG controlling authority is not the leading authority in the genealogy of the appropriate culture, how can it make an informed decision that does not upset someone who is the leading authority?

Even if the controlling authority reacts fast enough to approve new events - what then? What value do they provide? How do the software suppliers introduce the new events into their software - they cannot - so how does the user get the new, approved, event in without using a user-definable event code?
ACProctor 2012-05-22T09:00:48-07:00
I wouldn't include Events with Characteristics here Adrian. It will be such a fundamental requirement to be able to define arbitrary Events that I would turn this part on its head.

STEMMA has a general entity for describing Events (protracted as well as simple ones). However, there is a controlled vocabulary of Event-categories and Event-types that can be used to locate the well-defined ones such as marriage, and variations thereof.

The above link also shows how roles are interpreted relative to each category+type combination.

On the subject of extensions, STEMMA only really acknowledges extensions to the set of properties (incl. Person, Place, and Event properties), and extensions to the schema itself.

Extensions to the schema will be controversial. I only added the topic to show how it should be done if it was found necessary. Extensions to properties, however, are fundamental.

The points to note in STEMMA are that:

  • The properties have a data-type
  • The names of the properties have a scheme to prevent clashes
  • The properties may have units, e.g. for height & weight
  • The properties may actually reference another entity such as a Person.
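As a rough illustration of the four points above, a property could be modelled along the following lines. This is a sketch only and does not reproduce STEMMA's actual syntax: the class name `Property`, its field names, the `example.org` naming scheme and the data-type labels are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Property:
    """Hypothetical extended property; not STEMMA's real representation."""

    name: str                    # scheme-qualified to prevent clashes, e.g. "example.org:Height"
    data_type: str               # e.g. "Decimal", "Text", "PersonRef" (labels invented here)
    value: str                   # literal value, or the id of a referenced entity
    units: Optional[str] = None  # only meaningful for measured values

    @property
    def namespace(self) -> str:
        """Return the naming-scheme part of the name, or '' if unqualified."""
        return self.name.rsplit(":", 1)[0] if ":" in self.name else ""

    @property
    def is_reference(self) -> bool:
        """True when the value points at another entity rather than holding a literal."""
        return self.data_type.endswith("Ref")


height = Property("example.org:Height", "Decimal", "1.80", units="m")
godparent = Property("example.org:Godparent", "PersonRef", "person24354")

print(height.namespace, height.units, height.is_reference)
print(godparent.namespace, godparent.is_reference)
```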

Again, I'm not suggesting this is the way to go but I hope that the approach is sufficiently novel that it prevents us from simply copying the GEDCOM approach.

Tony
AdrianB38 2012-05-22T04:13:15-07:00
Syntax04 - Extensibility by software companies
The BetterGEDCOM file format must be capable of extension by software companies. Extensions must be kept permanently separate from any later definitions in BetterGEDCOM format.

Importance:
Mandatory

Why?:
1. GEDCOM can be extended so to remove the facility would be a step backwards.
2. Many GEDCOM files exist with extensions.

Source:
Original Goal 3

Way forward?:
Note that extensions in GEDCOM are identified by an underscore, which applies only to extensions. Any new GEDCOM tags will not have the underscore so will not be confused with extensions. An equivalent mechanism needs to be used for BetterGEDCOM.
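A minimal sketch of why the underscore convention is useful to a reader of the file: extension tags can be separated from standard ones without knowing what any of them mean. The function name `split_tags` and the sample tags are chosen for illustration only; `_MILT` and `_UID` stand in for arbitrary vendor tags.

```python
from typing import Iterable, List, Tuple


def split_tags(tags: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate standard tags from extension tags using the underscore prefix."""
    standard: List[str] = []
    extensions: List[str] = []
    for tag in tags:
        # An underscore prefix marks an extension; everything else is treated as standard.
        (extensions if tag.startswith("_") else standard).append(tag)
    return standard, extensions


standard, extensions = split_tags(["NAME", "_MILT", "BIRT", "_UID", "PLAC"])
print(standard)
print(extensions)
```

Because the test is purely syntactic, a tag added to the standard later can never collide with an existing extension, which is the property the requirement asks BetterGEDCOM to preserve.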
AdrianB38 2012-05-22T04:18:33-07:00
In Discussion on "Goal 2 -- BG container formats" ( http://bettergedcom.wikispaces.com/message/view/GOALS/30141635#54417872 )
the following posts seem relevant:

Tom Wetmore 21/5/2012 (extract only)
"If vendors unofficially extend BG for their own use they are being very bad and should be shunned. If the bad boy is someone like Ancestry.com, good luck; it will be like the kind of standards usurping that Microsoft has long been famous for. The giants can play fast and loose with standards, and the rest of us can whine all we want, but at the end of the day, must go along.
"I personally believe that BG should NOT be extensible, but that changes and additions can be proposed and acted upon by an official process. Taking this position is tantamount to the claim that the BG designers can do an excellent job of anticipating the needs of the industry. I personally believe this to be the case in theory; however the technical management of the BG process must undergo a radical improvement before this would be possible. You can [substitute] FHISO for BG in the preceding sentence if you believe that FHISO will end up in charge."

Louis Kessler 21/5/2012 (extract only)
"I 100% agree with Tom that BetterGEDCOM should NOT be extensible, for the exact reasons he states.
"That makes every decision, such as the BetterGEDCOM container decision, a tough one. It will be hard to U turn once all developers have implemented it one particular way."
AdrianB38 2012-05-22T04:29:14-07:00
I believe that it is important to distinguish user defined events and characteristics (a.k.a. attributes, etc, etc.) from the extensions made to GEDCOM by the software suppliers, whose tags are (theoretically) distinguished by the underscore prefix. While both extend the meaning of the GEDCOM standard (and both are _within_ the standard if implemented correctly), user defined events & characteristics have, I believe, less of an impact on the structure of the GEDCOM file in that they are firmly localised in where they occur, whereas software-supplier extensions can be anywhere, at any level. In addition, the control of the two types of extension is theoretically different - software suppliers could, in theory, tell us the meaning of their extensions whereas only individual users can explain their own "new" events, etc.
AdrianB38 2012-05-22T04:51:46-07:00
On a personal level, I am prepared to be convinced that extensibility by software companies in providing new "tags" should not be allowed. Though there is a psychological issue there that it was possible in GEDCOM.
ACProctor 2012-05-22T09:11:27-07:00
Correct me if I'm wrong here Adrian but I'm assuming this is basically changes to the schema, and hence involves new entities and new tags. By contrast, extensions to properties should not require a schema change, or a revised model or format.

If so then I mentioned this in passing here.

Although STEMMA mentions extending its schema using standard XML approaches, I'm less than convinced this is going to be useful. It also has a strong dependency on the serialisation format we use (e.g. XML, RDF) - e.g. you wouldn't prefix tag names with an underscore in XML.

On the whole, I side with Tom and Louis until such a time as a strong case is made for schema extension - one that cannot be accommodated by an official revision of the data reference model and its serialisation formats.

Tony
ttwetmore 2012-05-22T10:09:56-07:00
This may be an obvious point, but there are two types of extensibility that can be contrasted.

First there is the type of extensibility done by inventing new tags, possibly up to and including new record types.

Then there is the type of extensibility done by attaching TYPE tags with values to a higher level generic tag. For example, new novel events could be handled by placing a TYPE tag that describes the event under the generic EVENT tag.

It is my opinion that BG should forbid the former and promote the latter.

Some people think the second approach should actually be the overall approach, that every event should be an EVENT tag with either a TYPE subtag or attribute (in the XML sense). Of course, if the vocabulary of the TYPE values is then highly prescribed, this is no longer an extensible solution. An argument for this position is that it minimizes the number of tags. I don't believe that this is an important goal at this level, but others disagree.

I personally believe that we should have specific tags for all the important events of genealogy and family history, and then use the EVENT/TYPE approach for novel situations.
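The position described above, specific tags for the important events plus a generic EVENT with a TYPE value for novel ones and no invented tags, can be sketched as a small validation routine. Everything named here is hypothetical: the function `describe_event`, the set `KNOWN_EVENT_TAGS` and its contents are illustrative and do not come from any BetterGEDCOM draft.

```python
from typing import Optional

# Illustrative subset of "important" events that would get their own tags.
KNOWN_EVENT_TAGS = {"BIRT", "DEAT", "MARR"}


def describe_event(tag: str, event_type: Optional[str] = None) -> str:
    """Accept a specific event tag, or the generic EVENT tag with a TYPE value."""
    if tag in KNOWN_EVENT_TAGS:
        return tag
    if tag == "EVENT" and event_type:
        # Novel events are extended through the TYPE value, not through a new tag.
        return f"EVENT:{event_type}"
    raise ValueError(f"unrecognised tag {tag!r}: new tags may not be invented")


print(describe_event("BIRT"))
print(describe_event("EVENT", "Barn raising"))
```

Under this scheme a reader never meets an unknown tag; at worst it meets an unknown TYPE value under a tag whose structure it already understands.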

Tom
ACProctor 2012-05-22T10:23:12-07:00
I think that's basically what I said in the other extensibility post Tom, although I keep re-reading your post because I'm not 100% certain ;-)

I was pointing out that STEMMA's generic Event entity can be used to model any event. However, the controlled vocabulary for its Event-categories and Event-types only covers the well-defined ones that software will want to identify, such as Unions.

Other Events might have Event-types that fall outside the predefined ones, or sit under the "Other" Event-category, but it sounds like the main difference is that I'm using distinct types rather than distinct tags/record-types for the important events.

Does that sound like a fair description of the difference Tom?

Tony
AdrianB38 2012-05-22T12:44:08-07:00
"I'm assuming this is basically changes to the schema, and hence involves new entities and new tags"
I believe that's what I'm talking about. If that sounds less than positive, I never got that close to dealing in schemas so I tend to be careful when dealing with that terminology. In fact, if you put extensibility in those terms (extending the schema) then my relatively neutral stance towards extensibility swings to agreeing with you, Tom and Louis.

Tom - TYPE under a generic EVENT is, I believe, covered by Syntax05 User Extensibility of events and characteristics so I'm glad to see you very much agree with it.
nick-mat 2012-05-23T04:26:18-07:00
I think we need to be careful about putting a ban on any type of extension, if only for the pragmatic reason that we can't actually stop people extending it, so it's better to control it.

But there may well be areas that some groups want covered that aren't appropriate for a standards body to be involved in. I'm thinking of the GEDCOM SUBN submission record which is used by the Mormon Church but I'm sure there must be other examples.

I've made use of GEDCOM's extensibility myself when exporting data to be read by one of my own programs.

Nick
ACProctor 2012-05-24T11:17:42-07:00
Attribution
This subject seems to be absent from the discussions here, unless I've missed something.

Attribution is basically attributing a piece of information to a particular individual. In principle, it should be distinct from a citation since the material provided by that individual would hopefully have its own citations. What do people think? Should the data format have an element for attribution?

It's interesting because it's the one situation where I consider genealogy needs a Postal Address, as distinct from a Place or a Location. I know you may be thinking that these are all very close concepts, but I tried hard to distinguish them here under section 3, 'Place Names'.

Postal address variations around the world may be found at: international-address-formats. There is no standard yet, although the topic has been discussed many times (see International Address Standardisation).

I want to raise a small issue here since the W3C RDF Contact description doesn't seem to follow any international guidelines that I can see. As an example of an internationalisation issue, one of the fields is called stateOrProvince. Province was obviously added as a token nod to the rest of the world. Well, Ireland has provinces but they're never put in postal addresses. Ireland uses counties, just like the UK. OK, so you're probably going to say that it's only a field name and that it can hold a county. Well, I used to have enormous problems buying PPV credits from Ancestry because their UI forced me to select a province from a list entitled "county". This then meant the address didn't match the one registered on my credit card. It took a couple of years for that to be fixed but it shows how these things can become an issue.

Tony
ttwetmore 2012-05-25T10:23:59-07:00
Tony,

Thanks!

My hope is that we can collapse the vocabularies as we collect terms. That was the reason I coined the term PFACT, as an attempt to humorously claim that property, fact, attribute, characteristic and trait are mostly synonymous for our purposes. I'm wondering whether source, attribution (and even citation in the context I am not comfortable with) form another cluster of terms that really mean the same thing. If so it's another area where collapsing will simplify things. Maybe quality and confidence are the same; probably not.

Tom
ACProctor 2012-05-25T11:28:18-07:00
That's a neat acronym, although the converse approach to collapsing the similar terms is to differentiate them and pin them on something distinct.

Sometimes, there are concepts that are similar but they need to be differentiated, e.g. Places, Locations, and Postal Addresses. If we collapse the obvious terms then we may be forced to invent longer, less intuitive terms in their place.

Tony
ttwetmore 2012-05-25T11:54:34-07:00
Tony,

I agree completely. Remove redundancy only. Keep anything unique that is important to the model.

I would be interested on your take on the difference between place and location.

Tom
louiskessler 2012-05-25T19:44:44-07:00

Tom,

I've seen people suggest that the PLAC tag be changed to a LOC tag. Why?

I've seen people suggest a 0 LOC record. Why can't that be a 0 PLAC record?

To me, place and location are the same thing.

I like the "rules" you and Tony are establishing.

Louis
Alex-Anders 2012-05-25T19:48:47-07:00
In some countries there are small sections called location. They are sometimes not listed on a map, but only have a 'Location' sign, so need to be drilled down further than Place, as the location is usually within a defined place/town etc.
A second use for location may be the section within a cemetery, shopping centre etc where an event has occurred??

Alex
ttwetmore 2012-05-25T21:32:49-07:00
Louis,

I agree with you.

Maybe the issue has to do with hierarchy. I believe that we should view places as a hierarchy. That is, Winnipeg is a place; Manitoba is a place that includes Winnipeg; and Canada is a place that contains Manitoba; and all three are related hierarchically. The important thing is that they are all places. You could know that someone was born in Canada with nothing more specific available. You could know that someone was born in Manitoba, with nothing more specific. And one could know that someone was born in Winnipeg.

Possibly others don't see it that way. Maybe they see "Winnipeg, Manitoba, Canada" as a single place, so that a neighborhood or other named sub-part of Winnipeg would have to be something other than a place. Maybe that's what a location might be? That seems to agree with Alex's point. Though I don't mind calling that next place deeper in the hierarchy a place also, while Alex points out that others might prefer to call these deeper levels locations. I can see Alex's point, though I can go either way.

In censuses in the United States, at least in cities, the city ward or the census enumeration district is often mentioned. When I know this information I put it into the place hierarchy also. For example, Ward 4, New London [a city], New London [a county], Connecticut, United States. I don't mind thinking about Ward 4 as just another, more granular, level in a place hierarchy.

In the United States we tend to think of places as being made up of three "standard" levels: city, county and state. But we all know this breaks down all over the place and tends to be very US-centric. I think of a place as a hierarchy of geopolitical areas, that can be as complete or as sparse as seems to fit the evidence, and that all are just as good as any other. I think it is good to use city, county, state, in the US of course, when the data supports it, but that there should be no penalty on using some other system of levels, when it is more appropriate. I don't even think you have to define what that hierarchy is. Just use it.
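A minimal sketch of a place hierarchy that does not fix the number or kind of levels, using the New London example from this post. The class name `Place` and its methods are invented for illustration; each place simply points at the place that contains it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Place:
    """A place that optionally sits inside a larger place; no level types are defined."""

    name: str
    parent: Optional["Place"] = None

    def chain(self) -> List["Place"]:
        """Return this place followed by each enclosing place, most specific first."""
        node: Optional["Place"] = self
        result: List["Place"] = []
        while node is not None:
            result.append(node)
            node = node.parent
        return result

    def full_name(self) -> str:
        """Join the names in the chain into a conventional comma-separated form."""
        return ", ".join(place.name for place in self.chain())


usa = Place("United States")
connecticut = Place("Connecticut", usa)
county = Place("New London", connecticut)
city = Place("New London", county)
ward = Place("Ward 4", city)

print(ward.full_name())
print(connecticut.full_name())
```

An event can refer to whichever level the evidence supports (the ward, the state, or only the country) and the same code works, which is the "just use it" idea: the hierarchy is as complete or as sparse as the data allows.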

And of course then there is the street address used in some countries. I guess those are usually kept separate as a different kind of entity, but couldn't one also think about the address as another level in a place hierarchy: 73 Lower Blvd, Ward 4, New London, New London, Connecticut, United States. Could that be a place hierarchy of six levels? Maybe it's more complicated than my simple mind can straighten out.

But then there is the issue of zip codes (US) and postal codes (rest of world). They really are places too, as you can look at their boundaries on a map. In the US I bet it's the case that you could find examples of both more than one zip code in an enumeration district, and more than one enumeration district in a zip code. So in some cases, to keep things hierarchical you might have to write something like:

73 Lower Blvd, 06320, Ward 4, New London, New London, Connecticut, United States

while in others you'd have to do it:

73 Lower Blvd, Ward 4, 06320, New London, New London, Connecticut, United States

Or maybe I'm just being an idiot with this example and no one would ever want to see places this way.

Maybe on the 0 LOC idea the point was not to have a tag name conflict with the 2 PLAC tag found in events. I'm with you, however. I don't see anything wrong with 0 PLAC records AND 2 PLAC tags in the same specification.

Tom W.
Alex-Anders 2012-05-25T21:41:21-07:00
Tom
I am open to a location or further refinement to be included within the Place, but how many becomes too many?

I also tend towards the address as being in place, as it is actually a location within a town etc.

So 73 Lower Blvd, 06320, Ward 4, New London, New London, Connecticut, United States would not be a problem for me as being a place.

Do we also then add a location within the 73 Lower Blvd into Place eg kitchen etc, as location of event?

Alex
ACProctor 2012-05-26T01:53:13-07:00
Re: "I would be interested on your take on the difference between place and location."

I provided a link to this in the first post of this thread Tom: Places under section 3, 'Place Names'. I wanted to eliminate the confusion over the three concepts: Place, Location, and Postal Address since there are some big differences.

(I apologise for taking this Attribution thread OT but we've now included Places)

In those same research notes, section 3.1, 'Hierarchies', discusses Place Hierarchies, and in particular how the 'geographic' approach needs to be supplemented with 'administrative' divisions within the national borders of a country. It also tries to separate elements such as religious, jurisdictional, and political properties in the hierarchy. STEMMA may be unusual in taking the Place Hierarchy concept down to street and building level but it's generated some surprising insights in my experimentations here (using my own family history data).

Related to this is 5.1, 'Place Authority', which strongly recommends the provision of a federated Place Authority, and how some resources in England & Wales are almost there but are being squandered through lack of vision.


Tony
ACProctor 2012-05-26T02:00:08-07:00
Re: "In the United States we tend to think of places as being made up of three "standard" levels: city, county and state"

I have to say - as a foreigner - that it breaks down in the US too. Washington DC ("District of Columbia") is a federal district and not part of a state. :-))

Tony
AdrianB38 2012-05-26T08:58:01-07:00
"I've seen people suggest that the PLAC tag be changed to a LOC tag. Why?"

Louis - as one of those who talks in terms of Location, instead of Place, I can say why _I_ wanted to replace Place by Location... To me, Place and Address (but not Postal Address) are on a spectrum where there is no real _qualitative_ difference between the concepts, only quantitative. Add to that my view that GEDCOM's differentiation between Place and Address is irredeemably compromised by definition and usage. So, I wanted to come up with a new entity that combined the two. Using one of the two existing names would not convey the difference, hence I suggested a new name, that of Location.

Caveats:
- Postal Address, as I said, is a different concept from Address. Furthermore, it's one I won't get excited about until the day I can use quantum entanglement to send my 3G grandfather a letter to ask him where in Ireland he came from...
- Difference between Location and Place as per Tony's link. It's an interesting concept but needs a degree of abstraction to comprehend. I'd prefer to use Location to denote the merged concepts of Address and Place, along with Co-ordinates for the more specific Location-as-per-that-link. Seems less abstract a usage.
louiskessler 2012-05-26T12:06:55-07:00

Yikes! This wiki is messed up and it is frustrating to me and everybody.

Tony: You opened a topic called "Attribution" which will get lost because it should have had a Requirements prefix before it.

Then the discussion veered off into Location/Places (a veering I inadvertently contributed to). That should have been opened or added to under a Data-Placexx subject.

See: Valuable Discussions Are Getting Lost

Louis
ACProctor 2012-05-27T00:53:59-07:00
Thanks Louis. I did apologise for having to deviate the thread slightly in order to differentiate the terms (Postal Address being relevant to Attribution as part of the Contact Details).

I didn't add a prefix to 'Attribution' because it's not yet in the requirements catalog(ue).

The reason for this thread was really to see if anyone felt it should be there. I had similar views to Tom's initial ones where Attribution and Citation certainly have some overlap.

I've transferred these last few posts to a more relevant prior discussion on Location and Address.

Tony
ttwetmore 2012-05-24T11:39:23-07:00
Tony,

It sure would be great if we could all agree that a citation is a formatted string found in a report that cites where evidence was found. For some reason the word citation has become synonymous with the term source in many people's minds.

That being said, I have a hard time understanding how an attribution is different from a source. Is it because an attribution comes from a person and not from something on paper?

Sources are hierarchical, so if we use a person as a source, can't we then have that person, AS A SOURCE, refer to a more typical source at the next level, and so on?

I am questioning the idea that an attribution and a source are fundamentally different things. I don't see it. Could someone who sees the difference please try to explain it.

Tom
ttwetmore 2012-05-24T11:50:45-07:00
Tony,

A little more.

If a person gives us information with no justification, is this where an attribution comes in? Couldn't we then call that person a source and forget the word attribution?

If a person lets us know of the existence of evidence, and we go get that evidence, do we attribute the person as a kind of thank-you acknowledgment, but use a normal source trail for the actual evidence?

So, do we want to record not only the evidence we have and the sources of the evidence, but also how we came to know about the evidence, and who helped us learn about it as well? Up until now I would have said we are only interested in recording the evidence and sources themselves, not the history of how we found them.

But, some people want to add searches, and research problems, and todo lists to the genealogical data model, and if this were the case, is this the area where we would mention the people that helped us find information?

What kind of thing is an attribution, then, a source that is a person, or a tip of the hat to a person who helped us find evidence?

Tom
ACProctor 2012-05-25T06:16:16-07:00
I agree that for a 'personal collection' the distinction is small, and not worth avoiding the natural generalisation of 'source'. If sources form a chain - which I believe they should - then it fits quite easily.

For instance, if another researcher informs you that "blah...blah...", and provides some reasoning as well as citations, then there is no difference to picking up a book in a library and reading the conclusions or recollections of someone who has published, and just happens to have a book that can be cited.

I think the main difference, if there is one, comes in 'collaborative trees'. The topic is discussed in several other places at the moment. That area would be relevant to any attempt to define a data model, but isn't really covered by the old "exchange and long-term storage" tag line.

Tony

P.S. Apologies for using the term "citation" when I meant "source". This is an unfortunate hangover from the STEMMA terminology, which in turn was deliberate to reinforce the fact that the source is external to the data (e.g. in a book, library, etc) and the data merely cites it. It's a wonder I have any hairs left to split. ;-)
louiskessler 2012-06-01T20:13:39-07:00
Data-Place02 - Recording of Structured Data About Locations

Tony said in the Data-Place03 discussion: "my own data includes some historical narrative on a few places because they were so important to the family ..."

I've always been saying that: "Places are people, too."

Two big things GEDCOM doesn't have and should have are:

1. PLAC records at level 0, which can contain facts and multimedia about the place (just like an INDI).

2. Allow events to be associated with a place. e.g. Fire on Main Street. These events can then be used in timelines, or to document the history of the place.

Louis
ACProctor 2012-06-02T02:37:05-07:00
I totally agree with you Louis.

The only comment I'd make is in relation to "Allow events to be associated with a place". STEMMA Events are top-level (i.e. level-0) entities, and can be referenced by Person or Place entities. Hence, in your "Fire on Main Street" example, the Place [Main St] would have a link to that Event, but so would the Person entities for the people whose lives were affected by it.
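A rough sketch of that arrangement, assuming hypothetical class names (`Event`, `Place`, `Person`), made-up IDs, and invented dates. It is not STEMMA or GEDCOM syntax, only an illustration of one top-level Event being referenced from both a Place and a Person and then reused for a timeline.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Event:
    id: str
    title: str
    date: str  # ISO-style string so plain string sorting orders by date


@dataclass
class Place:
    id: str
    name: str
    event_ids: List[str] = field(default_factory=list)


@dataclass
class Person:
    id: str
    name: str
    event_ids: List[str] = field(default_factory=list)


def timeline(entity: Union[Place, Person], events: Dict[str, Event]) -> List[str]:
    """Return the titles of an entity's events in date order."""
    linked = [events[event_id] for event_id in entity.event_ids]
    return [event.title for event in sorted(linked, key=lambda e: e.date)]


# Invented sample data; the dates are placeholders, not real history.
events = {
    "E1": Event("E1", "Fire on Main Street", "1905-03-02"),
    "E2": Event("E2", "Main Street paved", "1898-06-15"),
}
main_st = Place("P1", "Main St", ["E1", "E2"])
resident = Person("I1", "Resident affected by the fire", ["E1"])

print(timeline(main_st, events))
print(timeline(resident, events))
```

Because the Event is stored once and only referenced, the same record serves both the history of the place and the person's own timeline.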

From the way you nailed the requirement, I suspect you anticipated this already but I thought I would mention it for clarity.

Tony
ttwetmore 2012-06-02T02:57:59-07:00
Louis, Tony,

I agree that places can (and often should) be level one entities, which is what I have put in all my examples as top level places with unique IDs. However, as my examples have also shown, I believe that places can also be non-top level sub-elements contained directly in the event elements that need them. This is where you would use places that you might not yet have linked into a known hierarchy, or places with unusual values (e.g., "out west", "on an Atlantic crossing from Hamburg to Philadelphia"). Places should be allowed to exist in both contexts.
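One way to picture places existing in both contexts is sketched below. All names here (`PlaceRecord`, `PlaceRef`, `InlinePlace`, `describe_place`) and the IDs are assumptions invented for illustration, not part of DeadEnds or any agreed model. An event holds either a reference to a top-level place with a unique ID or a free-text place embedded directly in the event.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class PlaceRecord:
    """Top-level place with a unique ID, optionally linked into a hierarchy."""
    id: str
    name: str
    parent_id: Optional[str] = None


@dataclass
class PlaceRef:
    """Pointer from an event to a top-level place."""
    place_id: str


@dataclass
class InlinePlace:
    """Place kept inside the event, not (yet) linked to any hierarchy."""
    text: str


@dataclass
class Event:
    title: str
    place: Union[PlaceRef, InlinePlace]


def describe_place(event: Event, places: Dict[str, PlaceRecord]) -> str:
    """Render an event's place, following the hierarchy for references."""
    if isinstance(event.place, InlinePlace):
        return event.place.text
    names = []
    current = places.get(event.place.place_id)
    while current is not None:
        names.append(current.name)
        current = places.get(current.parent_id) if current.parent_id else None
    return ", ".join(names)


# Invented sample data for illustration.
places = {
    "PL1": PlaceRecord("PL1", "Pennsylvania"),
    "PL2": PlaceRecord("PL2", "Philadelphia", parent_id="PL1"),
}
arrival = Event("Arrival", PlaceRef("PL2"))
voyage = Event("Voyage", InlinePlace("on an Atlantic crossing from Hamburg to Philadelphia"))

print(describe_place(arrival, places))
print(describe_place(voyage, places))
```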

I believe the same about events. They can be level one entities also. Though the only time it really makes sense to have them at the top level is if there are many role players in the event so that there are many event references pointing to the event.

You have to use a little common sense here. Places are not tied directly to people; they are completely independent of people. So it makes great sense to think of them as making up their own top level universe of values.

The same is not true of events. The events of genealogical significance are all tied directly to their role players. The events would not exist in the absence of their human participants. They do not form an independent universe of values, so they do not scream out for level one-hood. Placing genealogical events at the top level is, in my opinion, nothing more than a performance enhancement so that the same data does not appear in many places and cause update headaches.

I think you can almost make the same argument about non-personal events that you are tying to places (e.g., the San Francisco earthquake, the great hurricane of 1938). If the event only applied to one place, there is no advantage to making it a top level entity. If it applies to many locations then as a practical matter it can be useful to make it an independent entity.

I'm not going to complain loudly if someone wants to allow places to refer to events, though I would point out that it's a little backwards, since events must have places. But if you really do want to include events that don't involve any person role players this would be necessary. Again, I won't complain too loudly if one wants to include non-person events, though I would also ask whether we are talking about family history software or generic historic software.

Tom
ACProctor 2012-06-02T04:01:52-07:00
I expect everyone has heard the term Microhistory but it's worth reading the description at this link. The difference between family-history and microhistory is very small, and I have used the terms as synonyms when talking about newspaper archives to newspapers who want to charge the earth for research using them.

Tony
ttwetmore 2012-06-02T08:30:58-07:00
Tony,

I had not heard the term microhistory before. Thanks for introducing it here.

I have little objection to BG evolving in the direction of supporting more of general history than just genealogy and family history. Frankly I think our models can easily accommodate this expanded view. However, other BG'ers might think of this as just too much fiddling. I can imagine much opposition to widening the BG scope that far.

I have wondered a number of times whether there is any software designed explicitly for historians, so they can catalogue their sources, create evidence records, record their conclusions, write their papers and so on. I would certainly think it ironic if genealogists, whom I imagine most historians think of in somewhat disparaging terms, generated software systems that could truly help them in their research!

In my user interface designs for DeadEnds, I use the 3x5 index card as one of the metaphors for holding evidence. I imagine each persona displayed as a single 3x5 card. (I have a nice "typewriter" font for the index cards that makes them look wonderfully authentic).

I like the idea of the DeadEnds user "pushing" index cards around on the desktop to gather those together that they believe represent the same person and then being able to "clip" them together into packets that represent that conclusion. I imagine that the index card metaphor would be a natural for software designed for historians.

Tom
ACProctor 2012-06-02T08:43:59-07:00
Re: "I have wondered a number of times whether there is any software designed explicitly for historians"

This is a topic I find really interesting Tom. I'm glad I'm not the only one that has thought about it.

When I was originally gathering requirements for STEMMA, I had this moment of clarity where I thought 'this could be a general format for historical research'.

... then I came to my senses and decided it was completely insane :-)

Tony