The Pajek file is for testing and visualizing alternatives in the automated
procedure with the Alberti data. Here, all the possible identifications
of parents who fit within the current parameters are
shown (the same can be done for the entire dataset).
I have not yet adjusted all the parameters optimally nor
ranked the choices. Eventually you will have the following:
1 best choice for parents of husband
2 ditto for wife
3 2nd best choice for parents of husband, if different
4 ditto for wife
5 3rd best choice for parents of husband, if different
6 ditto for wife
When entered back into the datafile, these will be six separate
variables plus one or more variables for the probable reliability
of the estimate.
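
As a rough sketch of how the six variables might be laid out (this is not
the actual program): assume each candidate parental couple is a pair of
(master-file line number, score). The score, the function names and the
use of None for a missing choice are all hypothetical, and the reliability
variable is left out because its form is not yet decided.

```python
def top_parent_choices(candidates, n=3):
    """Return up to n distinct couple line numbers, best score first.

    candidates: list of (line_number, score) pairs (hypothetical layout).
    Missing choices are padded with None.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    picks = []
    for line_no, _score in ranked:
        if line_no not in picks:  # keep a lower choice only "if different"
            picks.append(line_no)
        if len(picks) == n:
            break
    return picks + [None] * (n - len(picks))


def six_variables(husband_candidates, wife_candidates):
    """Interleave the choices in the order 1..6 of the list above."""
    h = top_parent_choices(husband_candidates)
    w = top_parent_choices(wife_candidates)
    return [value for pair in zip(h, w) for value in pair]


# Made-up example: the husband has two distinct candidates, the wife has one.
row = six_variables([(120, 0.9), (87, 0.6), (120, 0.5)], [(45, 0.8)])
print(row)  # [120, 45, 87, None, None, None]
```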
The identification of parental couples will correspond to their
unique line number in the master file, which will also become a
variable so the file can subsequently be changed.
Within Pajek, using Options/ReadWrite Threshold (set to 3, 5, etc.),
you can strip off all but the top choices to get the
best estimate of the parents, or keep the multiple relations to see
what the alternatives are (be sure to set the threshold back to 0 when done!).
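
The effect of the threshold can be pictured with a small sketch. This is
not Pajek's own code. It assumes each link carries a numeric value, that
stronger choices have higher values, and that links below the threshold are
dropped; the vertex labels and values are made up.

```python
def apply_threshold(links, threshold=0):
    """Keep only links whose value is at least the threshold.

    links: list of (child, parents, value) triples (hypothetical layout).
    A threshold of 0 keeps every link with a non-negative value.
    """
    return [(c, p, value) for (c, p, value) in links if value >= threshold]


links = [("couple_10", "couple_3", 5),
         ("couple_10", "couple_7", 3),
         ("couple_10", "couple_9", 1)]

print(apply_threshold(links, 5))       # only the strongest link remains
print(len(apply_threshold(links, 0)))  # all three alternatives kept
```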
There may still be a few cases where the
links shown on the genealogies are not among the alternatives
because of huge discrepancies between the marriage dates of parents and
children. In some cases, the genealogies themselves may be wrong.
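
To illustrate how a date parameter can exclude a true link, here is a
minimal sketch of a marriage-date plausibility check. The bounds of 15 and
60 years are hypothetical, chosen only for the example; they are not the
parameters currently in use.

```python
def plausible_parents(parent_marriage_year, child_marriage_year,
                      min_gap=15, max_gap=60):
    """True if the child's marriage follows the parents' marriage by a
    number of years inside the allowed window (hypothetical bounds)."""
    gap = child_marriage_year - parent_marriage_year
    return min_gap <= gap <= max_gap


print(plausible_parents(1400, 1425))  # True: 25 years apart
print(plausible_parents(1400, 1405))  # False: 5 years is too short
print(plausible_parents(1400, 1490))  # False: 90 years is too long
```

A genealogical link with dates outside the window would never appear among
the alternatives, however good the name match.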
There is an incredibly good fit at this point between the
automated procedure and the Alberti genealogy. Part of this is
because you have done an amazing job with accuracy and uniqueness
of the spelling of the names. 90% or more of my best estimates
by the automated procedure will correspond with what is in the
genealogies. What is amazing to me is that nearly all of the
ancestors who are branching points in the tree (i.e., who leave
descendants) are recovered by the automated procedure, at least
within the timeframe covered. Lots of those who do not leave descendants,
of course, died early, did not marry, or migrated out so they do not enter
This job took three days of programming, but I am quite happy
with the results. I now need to tweak the parameters a bit to
get optimal predictions, and then to program the ranking and
reliability estimation procedure.
I have an option to feed the program a family name or names, as I
have done here with Alberti, so that only one or more selected segments
of the genealogy are reconstructed.
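
The family-name option amounts to a filter on the master file before the
matching is run. A minimal sketch, assuming each record is a dictionary
with hypothetical "line" and "family" fields (the real file layout may
differ):

```python
def select_families(records, family_names):
    """Keep only records whose family name is in the requested set,
    ignoring case and surrounding spaces."""
    wanted = {name.strip().lower() for name in family_names}
    return [r for r in records if r["family"].strip().lower() in wanted]


# Made-up records for illustration only.
records = [
    {"line": 1, "family": "Alberti"},
    {"line": 2, "family": "Medici"},
    {"line": 3, "family": "alberti "},
]

subset = select_families(records, ["Alberti"])
print([r["line"] for r in subset])  # [1, 3]
```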