Class MatchStarTables
java.lang.Object
    uk.ac.starlink.table.join.MatchStarTables
public class MatchStarTables extends java.lang.Object
Provides factory methods for producing tables which represent the result of row matching.
Author:
- Mark Taylor (Starlink)
-
Field Summary
Fields
static ValueInfo GRP_ID_INFO
Defines the characteristics of a table column which represents the ID of a group of matched row objects.
static ValueInfo GRP_SIZE_INFO
Defines the characteristics of a table column which represents the number of matched row objects in a given group (with the same group ID).
-
Constructor Summary
Constructors
MatchStarTables()
-
Method Summary
All Methods (all are static, concrete methods)
static java.util.Map findGroups(LinkSet links)
static StarTable makeInternalMatchTable(int iTable, LinkSet rowLinks, long rowCount)
Analyses a set of RowLinks to mark as linked rows of a given table.
static StarTable makeJoinTable(StarTable[] tables, LinkSet rowLinks, boolean addGroups, JoinFixAction[] fixActs, ValueInfo matchScoreInfo)
Constructs a table made out of a set of constituent tables joined together according to a LinkSet describing row matches.
static StarTable makeJoinTable(StarTable table1, StarTable table2, LinkSet pairs, JoinType joinType, boolean addGroups, JoinFixAction[] fixActs, ValueInfo matchScoreInfo)
static StarTable makeParallelMatchTable(StarTable table, int iTable, LinkSet links, int width, int minSize, int maxSize, JoinFixAction[] fixActs)
Constructs a new wide table from a single given base table and a set of RowLinks.
static StarTable makeSequentialJoinTable(StarTable[] tables, LinkSet rowLinks, JoinFixAction[] fixActs, ValueInfo matchScoreInfo)
Constructs a non-random table made out of a set of possibly non-random constituent tables joined together according to a LinkSet.
-
Field Detail
-
GRP_ID_INFO
public static final ValueInfo GRP_ID_INFO
Defines the characteristics of a table column which represents the ID of a group of matched row objects.
-
GRP_SIZE_INFO
public static final ValueInfo GRP_SIZE_INFO
Defines the characteristics of a table column which represents the number of matched row objects in a given group (with the same group ID).
-
Method Detail
-
makeJoinTable
public static StarTable makeJoinTable(StarTable table1, StarTable table2, LinkSet pairs, JoinType joinType, boolean addGroups, JoinFixAction[] fixActs, ValueInfo matchScoreInfo)
Constructs a table made out of two constituent tables joined together according to a LinkSet describing row matches and a flag determining what conditions on a RowLink give you an output row. The columns of the resulting table are made by appending the columns of the constituent tables side by side.
The tables array determines which tables' columns appear in the output table. It must have (at least) as many elements as the highest table index in the RowLink set. Table data will be picked from the nth table in this array for RowRef elements with a tableIndex of n. If the nth element is null, the corresponding columns will not appear in the output table.
The matchScoreInfo parameter is optional. If it is non-null, then an additional column, described by matchScoreInfo, will be added to the table containing the score values from any RowLink2s in pairs. The content class of matchScoreInfo should be Number or one of its subclasses.
This is a convenience method which calls the other makeJoinTable method.
Parameters:
- table1 - first input table
- table2 - second input table
- pairs - set of links each representing a matched pair of rows between table1 and table2. Contents of this set may be modified by this routine
- joinType - describes how the input list of matched pairs is used to generate an output sequence of rows
- addGroups - flag which indicates whether the output table should, if appropriate, include GRP_ID_INFO and GRP_SIZE_INFO columns
- fixActs - actions to take for deduplicating column names (array with one element per input table)
- matchScoreInfo - may supply information about the meaning of the match scores
Returns:
- table representing the join
-
makeJoinTable
public static StarTable makeJoinTable(StarTable[] tables, LinkSet rowLinks, boolean addGroups, JoinFixAction[] fixActs, ValueInfo matchScoreInfo)
Constructs a table made out of a set of constituent tables joined together according to a LinkSet describing row matches. The columns of the resulting table are made by appending the columns of the constituent tables side by side. Each row in the resulting table corresponds to one RowLink entry in the rowLinks set; if that RowLink contains a row from one of the tables being joined here, the columns corresponding to that table are filled in. If it contains multiple rows from that table, an arbitrary one of them is filled in.
The tables array determines which tables' columns appear in the output table. It must have (at least) as many elements as the highest table index in the RowLink set. Table data will be picked from the nth table in this array for RowRef elements with a tableIndex of n. If the nth element is null, the corresponding columns will not appear in the output table.
The matchScoreInfo parameter is optional. If it is non-null, then an additional column, described by matchScoreInfo, will be added to the table containing the score values from the RowLinks in rowLinks. The content class of matchScoreInfo should be Number or one of its subclasses.
Parameters:
- tables - array of constituent tables
- rowLinks - set of RowLink objects which define which rows in one table are associated with which rows in the others
- addGroups - flag which indicates whether the output table should, if appropriate, include GRP_ID_INFO and GRP_SIZE_INFO columns
- fixActs - actions to take for deduplicating column names (array of the same length as tables)
- matchScoreInfo - may supply information about the meaning of the link scores
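The side-by-side layout described above can be illustrated with a minimal, self-contained sketch. It does not use the STIL classes: tables are plain String arrays, each link is an array of {tableIndex, rowIndex} pairs, and the names JoinSketch and joinRows are hypothetical. It assumes every non-null table is non-empty and rectangular, and it resolves "an arbitrary one" by taking the first reference seen for each table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative stand-in only; not the MatchStarTables implementation. */
class JoinSketch {

    /**
     * tables[t][row][col] holds the cell data; tables[t] may be null.
     * Each link is an array of {tableIndex, rowIndex} pairs.
     * Returns one output row per link, with each table's columns side by side.
     */
    static List<String[]> joinRows(String[][][] tables, List<int[][]> links) {
        // Work out where each table's columns start in an output row.
        int[] offset = new int[tables.length];
        int total = 0;
        for (int t = 0; t < tables.length; t++) {
            offset[t] = total;
            if (tables[t] != null) {
                total += tables[t][0].length;  // null tables contribute no columns
            }
        }
        List<String[]> out = new ArrayList<>();
        for (int[][] link : links) {
            String[] row = new String[total];  // cells for absent tables stay null
            boolean[] filled = new boolean[tables.length];
            for (int[] ref : link) {
                int t = ref[0];
                if (tables[t] == null || filled[t]) {
                    continue;  // skipped table, or this table already has its row
                }
                String[] src = tables[t][ref[1]];
                System.arraycopy(src, 0, row, offset[t], src.length);
                filled[t] = true;
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        String[][][] tables = {
            {{"a", "1"}, {"b", "2"}},   // table 0: two columns
            {{"x"}, {"y"}, {"z"}},      // table 1: one column
        };
        List<int[][]> links = Arrays.asList(
            new int[][] {{0, 0}, {1, 2}},   // row 0 of table 0 with row 2 of table 1
            new int[][] {{1, 0}});          // row 0 of table 1 only
        for (String[] row : joinRows(tables, links)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In this sketch a link with no reference to a table leaves that table's cells null, and a null entry in the tables array removes its columns from the output altogether.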
-
makeSequentialJoinTable
public static StarTable makeSequentialJoinTable(StarTable[] tables, LinkSet rowLinks, JoinFixAction[] fixActs, ValueInfo matchScoreInfo)
Constructs a non-random table made out of a set of possibly non-random constituent tables joined together according to a LinkSet. Any input tables which do not have random access must have row ordering consistent with (that is, monotonically increasing for) the ordering of the links in the LinkSet. In practice, this is only likely to be the case if all the input tables are random access except for (at most) one, and the links are ordered with reference to that one. If this requirement is not met, sequential access to the resulting table is likely to fail at some point.
Parameters:
- tables - array of constituent tables
- rowLinks - link set defining the match
- fixActs - actions to take for deduplicating column names (array of the same size as tables)
- matchScoreInfo - may supply information about the meaning of the match scores, if present
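The ordering requirement can be checked ahead of time. The following is a minimal sketch, not part of the library: links are modelled as arrays of {tableIndex, rowIndex} pairs in iteration order, the names SequentialOrderSketch and isConsistent are hypothetical, and "monotonically increasing" is taken to mean strictly increasing, which is an assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative stand-in only; not the MatchStarTables implementation. */
class SequentialOrderSketch {

    /**
     * Returns true if the row indices referring to table iTable only ever
     * increase as the links are visited in order.
     */
    static boolean isConsistent(int iTable, List<int[][]> links) {
        long last = -1;  // no row has been read yet
        for (int[][] link : links) {
            for (int[] ref : link) {
                if (ref[0] != iTable) {
                    continue;  // reference to some other table
                }
                if (ref[1] <= last) {
                    return false;  // would need to step backwards
                }
                last = ref[1];
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<int[][]> links = Arrays.asList(
            new int[][] {{0, 5}, {1, 0}},
            new int[][] {{0, 2}, {1, 3}});
        System.out.println("table 0 ok: " + isConsistent(0, links));
        System.out.println("table 1 ok: " + isConsistent(1, links));
    }
}
```

Here the links are ordered with reference to table 1, so table 1 could be read sequentially while table 0 would need random access.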
-
makeInternalMatchTable
public static StarTable makeInternalMatchTable(int iTable, LinkSet rowLinks, long rowCount)
Analyses a set of RowLinks to mark as linked rows of a given table. The result of this method is a two-column table whose rows correspond one-to-one with the rows of the table referenced in the link set. The output columns are defined by the constants GRP_ID_INFO and GRP_SIZE_INFO. Rows of the table linked together by rowLinks are assigned the same integer value in the new GRP_ID_INFO column, and the GRP_SIZE_INFO column indicates how many rows are linked together in this way. Each group corresponds to a single RowLink; if a row is part of more than one RowLink then only one of them will be recorded in the new columns. Any rows linked in rowLinks which do not refer to table have null entries in these columns.
Parameters:
- iTable - the index of the table in which internal matches are to be sought
- rowLinks - a collection of RowLink objects linking groups of rows together
- rowCount - number of rows in the returned table (must be large enough to accommodate the indices in rowLinks)
Returns:
- a new two-column table with a one-to-one row correspondence with the table describing internal row matches
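The shape of the two output columns can be sketched as follows. This is a self-contained illustration rather than the library code: links are arrays of {tableIndex, rowIndex} pairs, and the names InternalMatchSketch and groupColumns are hypothetical. Numbering group IDs from 1, counting only references to iTable as the group size, and letting the first link win for a row that appears in several links are all assumptions.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative stand-in only; not the MatchStarTables implementation. */
class InternalMatchSketch {

    /**
     * Returns a rowCount x 2 array: column 0 is the group ID and column 1 the
     * group size. Both are null for rows that appear in no link.
     */
    static Integer[][] groupColumns(int iTable, List<int[][]> links, int rowCount) {
        Integer[][] cols = new Integer[rowCount][2];  // all cells start as null
        int nextId = 1;  // assumption: IDs are simply numbered from 1
        for (int[][] link : links) {
            // Count the references in this link that point at table iTable.
            int size = 0;
            for (int[] ref : link) {
                if (ref[0] == iTable) {
                    size++;
                }
            }
            if (size == 0) {
                continue;  // this link does not touch the table
            }
            int id = nextId++;
            for (int[] ref : link) {
                // Assumption: the first link seen for a row is the one recorded.
                if (ref[0] == iTable && cols[ref[1]][0] == null) {
                    cols[ref[1]][0] = id;
                    cols[ref[1]][1] = size;
                }
            }
        }
        return cols;
    }

    public static void main(String[] args) {
        List<int[][]> links = Arrays.asList(
            new int[][] {{0, 0}, {0, 3}},            // rows 0 and 3 match
            new int[][] {{0, 1}, {0, 2}, {0, 4}});   // rows 1, 2 and 4 match
        for (Integer[] row : groupColumns(0, links, 6)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Row 5 in the usage example belongs to no link, so both of its cells remain null.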
-
makeParallelMatchTable
public static StarTable makeParallelMatchTable(StarTable table, int iTable, LinkSet links, int width, int minSize, int maxSize, JoinFixAction[] fixActs)
Constructs a new wide table from a single given base table and a set of RowLinks. The resulting table consists of a number of sections of the original table placed side by side, so it has width times the number of columns that table does. Each row is constructed from one or more rows of the original table; each output row corresponds to a single RowLink. Only row links which have at least minSize entries and no more than maxSize entries are converted into output rows; if there are more entries than the width of the table the extras are just discarded. Any row references in a RowLink not corresponding to table index iTable are ignored.
Parameters:
- table - input table
- iTable - index corresponding to this table in the links set
- links - collection of RowLink objects describing the matches. This collection is modified on exit
- width - width of the output table as a multiple of the width of the input table
- minSize - minimum number of entries in a RowLink to count as an output row
- maxSize - maximum number of entries in a RowLink to count as an output row; also the width of the output table (as a multiple of the width of the input table)
- fixActs - actions to take for deduplicating column names (width-element array, or null)
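The filtering and truncation rules can be made concrete with a small sketch. It is independent of the STIL classes: the base table is a String array, links are arrays of {tableIndex, rowIndex} pairs, and the names ParallelMatchSketch and wideRows are hypothetical. It assumes a non-empty base table and applies minSize and maxSize to the number of references for iTable only, which is an interpretation of the text above rather than confirmed behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative stand-in only; not the MatchStarTables implementation. */
class ParallelMatchSketch {

    /** Returns one wide row (width * column count cells) per accepted link. */
    static List<String[]> wideRows(String[][] table, int iTable, List<int[][]> links,
                                   int width, int minSize, int maxSize) {
        int ncol = table[0].length;
        List<String[]> out = new ArrayList<>();
        for (int[][] link : links) {
            // References to other tables are ignored.
            List<Integer> rows = new ArrayList<>();
            for (int[] ref : link) {
                if (ref[0] == iTable) {
                    rows.add(ref[1]);
                }
            }
            if (rows.size() < minSize || rows.size() > maxSize) {
                continue;  // link is too small or too large to produce a row
            }
            String[] wide = new String[width * ncol];  // unused sections stay null
            int used = Math.min(width, rows.size());   // extras beyond width are dropped
            for (int k = 0; k < used; k++) {
                System.arraycopy(table[rows.get(k)], 0, wide, k * ncol, ncol);
            }
            out.add(wide);
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] table = {{"a", "1"}, {"b", "2"}, {"c", "3"}, {"d", "4"}};
        List<int[][]> links = Arrays.asList(
            new int[][] {{0, 0}, {0, 1}},           // accepted
            new int[][] {{0, 2}, {1, 7}},           // only one entry for table 0: rejected
            new int[][] {{0, 0}, {0, 2}, {0, 3}});  // accepted, third entry discarded
        for (String[] row : wideRows(table, 0, links, 2, 2, 3)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```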
-
findGroups
public static java.util.Map findGroups(LinkSet links)
Returns a mapping from RowLinks to LinkGroups which describes connected groups of links in the input LinkSet. A related group is one in which the RowRefs of its constituent RowLinks form a connected graph in which RowRefs are the nodes and RowLinks are the edges. A LinkGroup with a link count of more than one therefore represents an ambiguous match, that is one in which one or more of its RowRefs is contained in more than one RowLink in the original LinkSet.
The returned map contains entries only for non-trivial LinkGroups, that is ones which contain more than one link.
Parameters:
- links - link set representing a set of matches
Returns:
- RowLink -> LinkGroup mapping describing connected groups in links
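The connected-group idea can be sketched with a union-find pass over the links. This is an illustration under stated assumptions, not the library's algorithm: each link is a list of row-reference keys such as "0:5" (table 0, row 5), links are identified by their position in the input list, and the names LinkGroupSketch, find and findGroups are hypothetical stand-ins for the real RowLink and LinkGroup types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative stand-in only; does not use the STIL classes. */
class LinkGroupSketch {

    /** Follows parent pointers to the representative of link i (path halving). */
    static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    /**
     * Returns link index -> indices of all links in its connected group,
     * with entries only for groups containing more than one link.
     */
    static Map<Integer, List<Integer>> findGroups(List<List<String>> links) {
        int n = links.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        // Links that share a row reference belong to the same group.
        Map<String, Integer> firstLinkForRef = new HashMap<>();
        for (int i = 0; i < n; i++) {
            for (String ref : links.get(i)) {
                Integer prev = firstLinkForRef.putIfAbsent(ref, i);
                if (prev != null) {
                    parent[find(parent, i)] = find(parent, prev);
                }
            }
        }
        // Collect the member links of each representative.
        Map<Integer, List<Integer>> byRoot = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            byRoot.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
        }
        // Keep only non-trivial groups, keyed by each member link.
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        for (List<Integer> members : byRoot.values()) {
            if (members.size() > 1) {
                for (Integer link : members) {
                    result.put(link, members);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> links = Arrays.asList(
            Arrays.asList("0:1", "1:4"),    // link 0
            Arrays.asList("0:1", "1:7"),    // link 1 shares row 0:1 with link 0
            Arrays.asList("0:2", "1:9"));   // link 2 is unambiguous
        System.out.println(findGroups(links));
    }
}
```

In the usage example, links 0 and 1 both contain row 1 of table 0, so they form one ambiguous group; link 2 stands alone and has no entry in the result.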
-