Class MatchStarTables


  • public class MatchStarTables
    extends java.lang.Object
    Provides factory methods for producing tables which represent the result of row matching.
    Author:
    Mark Taylor (Starlink)
    • Field Detail

      • GRP_ID_INFO

        public static final ValueInfo GRP_ID_INFO
        Defines the characteristics of a table column which represents the ID of a group of matched row objects.
      • GRP_SIZE_INFO

        public static final ValueInfo GRP_SIZE_INFO
        Defines the characteristics of a table column which represents the number of matched row objects in a given group (with the same group ID).
    • Constructor Detail

      • MatchStarTables

        public MatchStarTables()
    • Method Detail

      • makeJoinTable

        public static StarTable makeJoinTable(StarTable table1,
                                              StarTable table2,
                                              LinkSet pairs,
                                              JoinType joinType,
                                              boolean addGroups,
                                              JoinFixAction[] fixActs,
                                              ValueInfo matchScoreInfo)
        Constructs a table made out of two constituent tables joined together according to a LinkSet describing row matches and a flag determining what conditions on a RowLink give you an output row. The columns of the resulting table are made by appending the columns of the constituent tables side by side.

        The tables array determines which tables' columns appear in the output table. It must have at least as many elements as the highest table index in the RowLink set. Table data will be picked from the nth table in this array for RowRef elements with a tableIndex of n. If the nth element is null, the corresponding columns will not appear in the output table.

        The matchScoreInfo parameter is optional. If it is non-null, then an additional column, described by matchScoreInfo, will be added to the table containing the score values from any RowLink2s in pairs. The content class of matchScoreInfo should be Number or one of its subclasses.

        This is a convenience method which calls the other makeJoinTable method.

        Parameters:
        table1 - first input table
        table2 - second input table
        pairs - set of links each representing a matched pair of rows between table1 and table2. Contents of this set may be modified by this routine
        joinType - describes how the input list of matched pairs is used to generate an output sequence of rows
        addGroups - flag which indicates whether the output table should, if appropriate, include GRP_ID_INFO and GRP_SIZE_INFO columns
        fixActs - actions to take for deduplicating column names (array of the same length as tables)
        matchScoreInfo - may supply information about the meaning of the match scores
        Returns:
        table representing the join
      • makeJoinTable

        public static StarTable makeJoinTable(StarTable[] tables,
                                              LinkSet rowLinks,
                                              boolean addGroups,
                                              JoinFixAction[] fixActs,
                                              ValueInfo matchScoreInfo)
        Constructs a table made out of a set of constituent tables joined together according to a LinkSet describing row matches. The columns of the resulting table are made by appending the columns of the constituent tables side by side. Each row in the resulting table corresponds to one RowLink entry in the set rowLinks; if that RowLink contains a row from one of the tables being joined here, the columns corresponding to that table are filled in. If it contains multiple rows from that table, an arbitrary one of them is filled in.

        The tables array determines which tables' columns appear in the output table. It must have at least as many elements as the highest table index in the RowLink set. Table data will be picked from the nth table in this array for RowRef elements with a tableIndex of n. If the nth element is null, the corresponding columns will not appear in the output table.

        The matchScoreInfo parameter is optional. If it is non-null, then an additional column, described by matchScoreInfo, will be added to the table containing the score values from the RowLinks in rowLinks. The content class of matchScoreInfo should be Number or one of its subclasses.

        Parameters:
        tables - array of constituent tables
        rowLinks - set of RowLink objects which define which rows in one table are associated with which rows in the others
        addGroups - flag which indicates whether the output table should, if appropriate, include GRP_ID_INFO and GRP_SIZE_INFO columns
        fixActs - actions to take for deduplicating column names (array of the same length as tables)
        matchScoreInfo - may supply information about the meaning of the link scores
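        The row-assembly rule described for this method can be illustrated without the library. What follows is a minimal sketch, not the STIL implementation: tables are modelled as plain Object[][] arrays, each link as an array of {tableIndex, rowIndex} pairs, and the class and method names (SideBySideJoinSketch, join) are invented for illustration. Where a link holds several rows from one table, the sketch simply takes the first, which is one possible reading of "an arbitrary one".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SideBySideJoinSketch {

    /**
     * Builds one output row per link.
     * tables[n] holds the rows of table n, or is null to omit its columns;
     * widths[n] is the column count of table n;
     * each link is an array of {tableIndex, rowIndex} references.
     * (All of these representations are assumptions made for this sketch.)
     */
    static List<Object[]> join(Object[][][] tables, int[] widths, List<int[][]> links) {
        // Work out where each table's columns start in the output row.
        int[] offsets = new int[tables.length];
        int total = 0;
        for (int t = 0; t < tables.length; t++) {
            offsets[t] = total;
            if (tables[t] != null) {
                total += widths[t];
            }
        }
        List<Object[]> out = new ArrayList<>();
        for (int[][] link : links) {
            Object[] row = new Object[total];          // unfilled cells stay null
            boolean[] filled = new boolean[tables.length];
            for (int[] ref : link) {
                int t = ref[0];
                if (tables[t] == null || filled[t]) {
                    continue;  // table omitted, or a row from it was already used
                }
                System.arraycopy(tables[t][ref[1]], 0, row, offsets[t], widths[t]);
                filled[t] = true;
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        Object[][] t0 = { {"a0", 1}, {"a1", 2} };
        Object[][] t1 = { {"b0"}, {"b1"} };
        List<int[][]> links = new ArrayList<>();
        links.add(new int[][] { {0, 0}, {1, 1} });  // row 0 of table 0 with row 1 of table 1
        links.add(new int[][] { {0, 1} });          // row 1 of table 0, no partner
        for (Object[] row : join(new Object[][][] { t0, t1 }, new int[] {2, 1}, links)) {
            System.out.println(Arrays.toString(row));
        }
        // prints "[a0, 1, b1]" then "[a1, 2, null]"
    }
}
```

        Passing null in place of t1 would drop its column from every output row, mirroring the null-element behaviour described for the tables array.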
      • makeSequentialJoinTable

        public static StarTable makeSequentialJoinTable(StarTable[] tables,
                                                        LinkSet rowLinks,
                                                        JoinFixAction[] fixActs,
                                                        ValueInfo matchScoreInfo)
        Constructs a non-random table made out of a set of possibly non-random constituent tables joined together according to a LinkSet. Any input tables which do not have random access must have row ordering consistent with (that is, monotonically increasing for) the ordering of the links in the LinkSet. In practice, this is only likely to be the case if all the input tables are random access except for (at most) one, and the links are ordered with reference to that one. If this requirement is not met, sequential access to the resulting table is likely to fail at some point.
        Parameters:
        tables - array of constituent tables
        rowLinks - link set defining the match
        fixActs - actions to take for deduplicating column names (array of the same size as tables)
        matchScoreInfo - may supply information about the meaning of the match scores, if present
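        The ordering requirement on non-random-access tables can be checked before building the join. The sketch below is illustrative only and does not use the library: links are modelled as arrays of {tableIndex, rowIndex} pairs in iteration order, and the names SequentialOrderSketch and isForwardOnly are hypothetical. It treats a repeated row index as acceptable, which is an assumption about what "monotonically increasing" permits.

```java
import java.util.List;

class SequentialOrderSketch {

    /**
     * Returns true if the rows of table iTable are referenced in
     * non-decreasing order as the links are traversed, so that a
     * forward-only reader of that table never has to step backwards.
     */
    static boolean isForwardOnly(List<int[][]> links, int iTable) {
        long last = -1;
        for (int[][] link : links) {
            for (int[] ref : link) {
                if (ref[0] != iTable) {
                    continue;          // reference to some other table
                }
                if (ref[1] < last) {
                    return false;      // would need to rewind the table
                }
                last = ref[1];
            }
        }
        return true;
    }
}
```

        A table for which this check fails would need to be random access (or be made so) before being used in such a join.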
      • makeInternalMatchTable

        public static StarTable makeInternalMatchTable(int iTable,
                                                       LinkSet rowLinks,
                                                       long rowCount)
        Analyses a set of RowLinks to identify which rows of a given table are linked to one another. The result of this method is a two-column table whose rows correspond one-to-one with the rows of the table referenced in the link set. The output columns are defined by the constants GRP_ID_INFO and GRP_SIZE_INFO. Rows of the table linked together by rowLinks are assigned the same integer value in the new GRP_ID_INFO column, and the GRP_SIZE_INFO column indicates how many rows are linked together in this way. Each group corresponds to a single RowLink; if a row is part of more than one RowLink then only one of them will be recorded in the new columns. Any rows linked in rowLinks which do not refer to table have null entries in these columns.
        Parameters:
        iTable - the index of the table in which internal matches are to be sought
        rowLinks - a collection of RowLink objects linking groups of rows together
        rowCount - number of rows in the returned table (must be large enough to accommodate the indices in rowLinks)
        Returns:
        a new two-column table describing internal row matches, with a one-to-one row correspondence with the table
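        How the two output columns relate to the links can be shown with a small stand-alone sketch. It is not the library's code: links are modelled as arrays of {tableIndex, rowIndex} pairs, the row count is an int rather than a long, group IDs start at 1, and when a row appears in several links the last one wins. All of these are assumptions made for illustration, as are the names InternalMatchSketch and groupColumns.

```java
import java.util.List;

class InternalMatchSketch {

    /**
     * Returns a rowCount x 2 array holding {groupId, groupSize} for each
     * row of table iTable; both cells are null for rows in no link.
     */
    static Integer[][] groupColumns(int iTable, List<int[][]> links, int rowCount) {
        Integer[][] cols = new Integer[rowCount][2];   // cells start out null
        int groupId = 0;
        for (int[][] link : links) {
            // Count the references in this link that point at table iTable.
            int size = 0;
            for (int[] ref : link) {
                if (ref[0] == iTable) {
                    size++;
                }
            }
            if (size == 0) {
                continue;              // link does not touch this table
            }
            groupId++;                 // one group per contributing link
            for (int[] ref : link) {
                if (ref[0] == iTable) {
                    cols[ref[1]][0] = groupId;  // a later link overwrites an earlier one
                    cols[ref[1]][1] = size;
                }
            }
        }
        return cols;
    }
}
```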
      • makeParallelMatchTable

        public static StarTable makeParallelMatchTable(StarTable table,
                                                       int iTable,
                                                       LinkSet links,
                                                       int width,
                                                       int minSize,
                                                       int maxSize,
                                                       JoinFixAction[] fixActs)
        Constructs a new wide table from a single given base table and a set of RowLinks. The resulting table consists of a number of sections of the original table placed side by side, so it has width times the number of columns that table does. Each row is constructed from one or more rows of the original table; each output row corresponds to a single RowLink. Only row links which have at least minSize entries and no more than maxSize entries are converted into output rows; if there are more entries than the width of the table the extras are just discarded. Any row references in a RowLink not corresponding to table index iTable are ignored.
        Parameters:
        table - input table
        iTable - index corresponding to this table in the links set
        links - collection of RowLink objects describing the matches. This collection is modified on exit
        width - width of the output table as a multiple of the width of the input table
        minSize - minimum number of entries in a RowLink to count as an output row
        maxSize - maximum number of entries in a RowLink to count as an output row; also the width of the output table (as a multiple of the width of the input table)
        fixActs - actions to take for deduplicating column names (width-element array, or null)
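        The layout of the wide table can be sketched in plain Java. This is an illustration under stated assumptions rather than the library's implementation: the table is an Object[][] with nCol columns, links are arrays of {tableIndex, rowIndex} pairs, the size test counts every entry in a link (the text does not say whether entries for other tables count), and the names ParallelMatchSketch and widen are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ParallelMatchSketch {

    /**
     * Produces one output row per qualifying link, laying up to 'width'
     * rows of the base table side by side.
     */
    static List<Object[]> widen(Object[][] table, int nCol, int iTable,
                                List<int[][]> links,
                                int width, int minSize, int maxSize) {
        List<Object[]> out = new ArrayList<>();
        for (int[][] link : links) {
            if (link.length < minSize || link.length > maxSize) {
                continue;                       // link is too small or too large
            }
            Object[] row = new Object[width * nCol];
            int slot = 0;
            for (int[] ref : link) {
                if (ref[0] != iTable) {
                    continue;                   // reference to another table: ignored
                }
                if (slot >= width) {
                    break;                      // extras beyond the width are discarded
                }
                System.arraycopy(table[ref[1]], 0, row, slot * nCol, nCol);
                slot++;
            }
            out.add(row);
        }
        return out;
    }
}
```

        With width 2, a link naming four rows of the base table yields an output row holding only the first two of them.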
      • findGroups

        public static java.util.Map findGroups(LinkSet links)
        Returns a mapping from RowLinks to LinkGroups which describes connected groups of links in the input LinkSet. A connected group is one in which the RowRefs of its constituent RowLinks form a connected graph in which RowRefs are the nodes and RowLinks are the edges. A LinkGroup with a link count of more than one therefore represents an ambiguous match, that is, one in which one or more of its RowRefs is contained in more than one RowLink in the original LinkSet.

        The returned map contains entries only for non-trivial LinkGroups, that is ones which contain more than one link.

        Parameters:
        links - link set representing a set of matches
        Returns:
        RowLink -> LinkGroup mapping describing connected groups in links
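        The notion of a connected group can be made concrete with a union-find pass over the links. The sketch below is one way to compute such groups and is not claimed to be how the library does it: links are arrays of {tableIndex, rowIndex} pairs, the result is a list of groups of link indices rather than a RowLink -> LinkGroup map, and the names LinkGroupSketch and connectedGroups are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LinkGroupSketch {

    /**
     * Returns the non-trivial connected groups: each inner list holds the
     * indices of links that share row references, directly or via a chain.
     */
    static List<List<Integer>> connectedGroups(List<int[][]> links) {
        int n = links.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;                       // every link starts on its own
        }
        // Remember the first link seen for each row reference; any later
        // link containing the same reference is merged with it.
        Map<String, Integer> firstLinkForRef = new HashMap<>();
        for (int i = 0; i < n; i++) {
            for (int[] ref : links.get(i)) {
                String key = ref[0] + ":" + ref[1];
                Integer other = firstLinkForRef.putIfAbsent(key, i);
                if (other != null) {
                    union(parent, i, other);
                }
            }
        }
        // Collect links by root, keeping only groups with more than one link.
        Map<Integer, List<Integer>> byRoot = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            byRoot.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
        }
        List<List<Integer>> groups = new ArrayList<>();
        for (List<Integer> group : byRoot.values()) {
            if (group.size() > 1) {
                groups.add(group);
            }
        }
        return groups;
    }

    static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];       // path halving
            i = parent[i];
        }
        return i;
    }

    static void union(int[] parent, int a, int b) {
        parent[find(parent, a)] = find(parent, b);
    }
}
```

        Links that share no row reference with any other link form trivial groups and are left out of the result, matching the "non-trivial" restriction described above.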