Class CombinedMatchEngine
- java.lang.Object
  - uk.ac.starlink.table.join.CombinedMatchEngine
- All Implemented Interfaces:
  MatchEngine
public class CombinedMatchEngine extends java.lang.Object implements MatchEngine
A matching engine which provides matching facilities by combining the characteristics of a number of other matching engines. Because of the way it calculates bins (effectively multiplying one bin array by another, so that the number of combined bins is the product of the component engines' bin counts), it is a good idea for efficiency's sake to keep down the number of bins returned by the MatchEngine.getBins(java.lang.Object[]) method of the component match engines.
The match score is formed by taking the scaled match scores of the constituent engines and adding them in quadrature (if no scaling is available, unscaled values are used). Versions of this class before 2017 did not do this; they simply added the unscaled match scores together, which doesn't make much sense.
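As a rough sketch of the quadrature rule, the helper below combines per-engine scores that have already been computed. The class name QuadratureScore, the per-engine fallback to the unscaled score when a scale is NaN or non-positive, and the rule that any negative (non-matching) component makes the whole pair a non-match are assumptions for illustration, not taken from the library source.

```java
/**
 * Hypothetical helper sketching how constituent match scores could be
 * combined in quadrature. Assumptions: a component's score is divided by its
 * scale only when that scale is positive and finite (otherwise the unscaled
 * score is used), and a negative component score means the pair does not match.
 */
final class QuadratureScore {

    static double combine(double[] scores, double[] scales) {
        double sum2 = 0.0;
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] < 0) {
                return -1.0; // one component rejects the pair, so the combination does too
            }
            double scale = scales[i];
            // Fall back to the unscaled score when no usable scale is available
            double s = (Double.isFinite(scale) && scale > 0) ? scores[i] / scale : scores[i];
            sum2 += s * s;
        }
        return Math.sqrt(sum2);
    }

    public static void main(String[] args) {
        // Illustrative numbers: a 3 arcsec separation with a 10 arcsec scale (0.3)
        // and a 0.2 mag difference with a 0.5 mag scale (0.4)
        double combined = combine(new double[] {3.0, 0.2}, new double[] {10.0, 0.5});
        System.out.println(combined); // approximately 0.5
    }
}
```

Scaling first means that engines working in very different units (angles, magnitudes) contribute comparably to the combined score.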
- Author:
  Mark Taylor (Starlink)
Field Summary
Fields inherited from interface uk.ac.starlink.table.join.MatchEngine: NO_BINS
Constructor Summary
- CombinedMatchEngine(MatchEngine[] engines)
  Constructs a new MatchEngine based on a sequence of others.
Method Summary
- boolean canBoundMatch()
  Indicates that the MatchEngine.getMatchBounds(uk.ac.starlink.table.join.NdRange[], int) method can be invoked to provide some sort of useful result.
- java.lang.Object[] getBins(java.lang.Object[] tuple)
  Returns a set of keys for bins into which possible matches for a given tuple might fall.
- NdRange getMatchBounds(NdRange[] inRanges, int index)
  Given a range of tuple values, returns a range outside which no match to anything within that range can result.
- DescribedValue[] getMatchParameters()
  Returns a set of DescribedValue objects whose values can be modified to modify the matching criteria.
- ValueInfo getMatchScoreInfo()
  Returns a description of the value returned by the MatchEngine.matchScore(java.lang.Object[], java.lang.Object[]) method.
- double getScoreScale()
  Returns the square root of the number of constituent matchers if they all have definite score scaling values.
- DescribedValue[] getTuningParameters()
  Returns a set of DescribedValue objects whose values can be modified to tune the performance of the match.
- ValueInfo[] getTupleInfos()
  Returns a set of ValueInfo objects indicating what is required for the elements of each tuple.
- double matchScore(java.lang.Object[] tuple1, java.lang.Object[] tuple2)
  Indicates whether two tuples count as matching each other, and if so how closely.
- void setName(java.lang.String name)
- java.lang.String toString()
Constructor Detail
CombinedMatchEngine
public CombinedMatchEngine(MatchEngine[] engines)
Constructs a new MatchEngine based on a sequence of others. The tuples accepted by this engine are composed of the tuples of its constituent engines (as specified by engines) concatenated in sequence.
- Parameters:
  engines - match engine sequence to be combined
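Because the combined tuple is the constituent tuples laid end to end, each engine's part can be recovered by slicing. This is a minimal sketch under the assumption that each part's length equals the length of that engine's getTupleInfos() array; TupleSplitter and split are hypothetical names, and the real constructor may organise this differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper: splits a concatenated tuple into one sub-tuple per
 * constituent engine. partLengths[i] is assumed to be the tuple length of
 * engine i (for example, the length of its getTupleInfos() array).
 */
final class TupleSplitter {

    static List<Object[]> split(Object[] tuple, int[] partLengths) {
        int total = 0;
        for (int len : partLengths) {
            total += len;
        }
        if (total != tuple.length) {
            throw new IllegalArgumentException(
                "Tuple has " + tuple.length + " elements but engines expect " + total);
        }
        List<Object[]> parts = new ArrayList<>();
        int start = 0;
        for (int len : partLengths) {
            // Copy the slice [start, start + len) belonging to the next engine
            parts.add(Arrays.copyOfRange(tuple, start, start + len));
            start += len;
        }
        return parts;
    }

    public static void main(String[] args) {
        // Illustrative layout: a sky engine taking (ra, dec), then a 1-d engine taking (mag)
        Object[] tuple = {150.1, 2.2, 17.5};
        for (Object[] part : split(tuple, new int[] {2, 1})) {
            System.out.println(Arrays.toString(part));
        }
    }
}
```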
Method Detail
matchScore
public double matchScore(java.lang.Object[] tuple1, java.lang.Object[] tuple2)
Description copied from interface: MatchEngine
Indicates whether two tuples count as matching each other, and if so how closely. If tuple1 and tuple2 are considered a matching pair, a non-negative value should be returned indicating how close the match is: the higher the number, the worse the match, and a return value of zero indicates a 'perfect' match. If the two tuples do not constitute a matching pair, a negative number (conventionally -1.0) should be returned. This return value can be thought of as (and will often correspond physically with) the distance in some real or notional space between the points represented by the two submitted tuples.
If there's no reason to do otherwise, the range 0..1 is recommended for successful matches. However, if the result has some sort of physical meaning (such as a distance in real space), that may be used instead.
- Specified by:
  matchScore in interface MatchEngine
- Parameters:
  tuple1 - one tuple
  tuple2 - the other tuple
- Returns:
  'distance' between tuple1 and tuple2; 0 is a perfect match, larger values indicate worse matches, negative values indicate no match
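The following sketch shows a hypothetical one-dimensional engine's score function that respects this contract. Dividing the separation by a fixed tolerance keeps successful scores within the recommended 0..1 range; the class OneDimScore and its tolerance argument are illustrative assumptions, not part of the MatchEngine API.

```java
/**
 * Hypothetical 1-d score function following the matchScore contract:
 * 0 for a perfect match, larger non-negative values for worse matches,
 * and -1.0 when the values do not match at all.
 */
final class OneDimScore {

    static double matchScore(double x1, double x2, double tolerance) {
        double d = Math.abs(x1 - x2);
        // A NaN separation fails the comparison, so it is treated as no match
        return d <= tolerance ? d / tolerance : -1.0;
    }

    public static void main(String[] args) {
        System.out.println(matchScore(10.0, 10.0, 0.5));  // 0.0, a perfect match
        System.out.println(matchScore(10.0, 10.25, 0.5)); // 0.5
        System.out.println(matchScore(10.0, 11.0, 0.5));  // -1.0, no match
    }
}
```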
getScoreScale
public double getScoreScale()
Returns the square root of the number of constituent matchers if they all have definite score scaling values. Otherwise, returns NaN.
- Specified by:
  getScoreScale in interface MatchEngine
- Returns:
  scale of successful match scores, a positive finite number or NaN
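One way to see why the square root of the number of engines is a natural scale: if each constituent's scaled score lies in 0..1, their quadrature sum is at most sqrt(n). The sketch below is a hypothetical CombinedScale class in which the component scales are passed in as plain doubles rather than obtained from real engines.

```java
/**
 * Hypothetical sketch of the combined score scale: sqrt(n) when every one of
 * the n component scales is positive and finite, NaN otherwise.
 */
final class CombinedScale {

    static double scoreScale(double[] componentScales) {
        for (double s : componentScales) {
            if (!(s > 0 && Double.isFinite(s))) {
                return Double.NaN; // at least one engine has no definite scale
            }
        }
        return Math.sqrt(componentScales.length);
    }

    public static void main(String[] args) {
        System.out.println(scoreScale(new double[] {1.0, 0.5, 2.0, 3.0})); // 2.0
        System.out.println(scoreScale(new double[] {1.0, Double.NaN}));   // NaN
    }
}
```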
getMatchScoreInfo
public ValueInfo getMatchScoreInfo()
Description copied from interface: MatchEngine
Returns a description of the value returned by the MatchEngine.matchScore(java.lang.Object[], java.lang.Object[]) method. The content class should be numeric (though it need not be Double), and the name, description and units should be descriptive of whatever the physical significance of the value is. If the result of matchScore is not interesting (for instance, if it's always either 0 or -1), null may be returned.
- Specified by:
  getMatchScoreInfo in interface MatchEngine
- Returns:
  metadata for the match score results
getBins
public java.lang.Object[] getBins(java.lang.Object[] tuple)
Description copied from interface: MatchEngine
Returns a set of keys for bins into which possible matches for a given tuple might fall. The returned objects can be anything, but should have their equals and hashCode methods implemented properly for comparison.
- Specified by:
  getBins in interface MatchEngine
- Parameters:
  tuple - tuple
- Returns:
  set of bin keys which might be returned by invoking this method on other tuples which count as matches for the submitted tuple
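The bin combination mentioned in the class description can be pictured as a Cartesian product of the component engines' bin keys, so the number of combined keys is the product of the component counts. In this sketch, java.util.List is used as the combined key because its equals and hashCode compare element by element, as this method's contract requires; the BinProduct class and the choice of key type are assumptions, and the library may represent combined bins differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper: forms combined bin keys as the Cartesian product of
 * the bin keys returned by each constituent engine. Each combined key is a
 * java.util.List, whose equals and hashCode compare element by element.
 */
final class BinProduct {

    static List<List<Object>> combine(List<Object[]> binsPerEngine) {
        List<List<Object>> result = new ArrayList<>();
        result.add(new ArrayList<>()); // start from a single empty key
        for (Object[] bins : binsPerEngine) {
            List<List<Object>> next = new ArrayList<>();
            for (List<Object> prefix : result) {
                for (Object bin : bins) {
                    List<Object> key = new ArrayList<>(prefix);
                    key.add(bin);
                    next.add(key);
                }
            }
            // If an engine returns no bins, the product (and so the result) is empty
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object[]> perEngine = Arrays.asList(
            new Object[] {"sky-17", "sky-18", "sky-25"}, // 3 bins from one engine
            new Object[] {4, 5});                        // 2 bins from another
        List<List<Object>> keys = combine(perEngine);
        System.out.println(keys.size()); // 6, i.e. 3 * 2
        System.out.println(keys);
    }
}
```

With several engines each returning a handful of bins, the product grows quickly, which is why keeping component bin counts low matters.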
getTupleInfos
public ValueInfo[] getTupleInfos()
Description copied from interface: MatchEngine
Returns a set of ValueInfo objects indicating what is required for the elements of each tuple. The length of this array is the number of elements in the tuple. Each element should at least have a defined name and content class. The info's nullable attribute has a special meaning: if true, it means that it makes sense for this element of the tuple to be always blank (for instance, assigned to no column).
- Specified by:
  getTupleInfos in interface MatchEngine
- Returns:
  array of objects describing the requirements on each element of the tuples used for matching
getMatchParameters
public DescribedValue[] getMatchParameters()
Description copied from interface: MatchEngine
Returns a set of DescribedValue objects whose values can be modified to modify the matching criteria. Typically at least one of these will be some sort of tolerance separation which determines how close tuples must be to count as a match. This match engine's behaviour can be modified by calling DescribedValue.setValue(java.lang.Object) on the returned objects.
- Specified by:
  getMatchParameters in interface MatchEngine
- Returns:
  array of described values which influence the match
getTuningParameters
public DescribedValue[] getTuningParameters()
Description copied from interface: MatchEngine
Returns a set of DescribedValue objects whose values can be modified to tune the performance of the match. This match engine's performance can be influenced by calling DescribedValue.setValue(java.lang.Object) on the returned objects.
Changing these values will make no difference to the output of MatchEngine.matchScore(java.lang.Object[], java.lang.Object[]), but may change the output of MatchEngine.getBins(java.lang.Object[]). This may change the CPU and memory requirements of the match, but will not change the result. The default value should be something sensible, so that setting the value of these parameters is not in general required.
- Specified by:
  getTuningParameters in interface MatchEngine
- Returns:
  array of described values which may influence match performance
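To illustrate why a tuning parameter can alter getBins output without altering matchScore, consider a hypothetical 1-d engine whose match tolerance is a match parameter and whose bin width is a tuning parameter. Each value is assigned every bin overlapping the interval of half a tolerance on either side of it, so (in exact arithmetic) two values within the tolerance always share a bin whatever the positive width. TunableBins and its methods are illustrative names only, not the library's binning scheme.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical 1-d engine: the tolerance is a match parameter, the bin
 * width is a tuning parameter. The width changes which bin keys are produced,
 * but never the match score.
 */
final class TunableBins {

    /** Score depends only on the tolerance: separation / tolerance, or -1.0 for no match. */
    static double matchScore(double x1, double x2, double tolerance) {
        double d = Math.abs(x1 - x2);
        return d <= tolerance ? d / tolerance : -1.0;
    }

    /** Keys of all width-sized bins overlapping [x - tolerance/2, x + tolerance/2]. */
    static List<Long> getBins(double x, double tolerance, double binWidth) {
        long lo = (long) Math.floor((x - 0.5 * tolerance) / binWidth);
        long hi = (long) Math.floor((x + 0.5 * tolerance) / binWidth);
        List<Long> bins = new ArrayList<>();
        for (long i = lo; i <= hi; i++) {
            bins.add(i);
        }
        return bins;
    }

    public static void main(String[] args) {
        double tolerance = 1.0;
        double a = 10.2;
        double b = 10.9; // within tolerance of a
        for (double width : new double[] {0.25, 1.0, 4.0}) {
            List<Long> shared = new ArrayList<>(getBins(a, tolerance, width));
            shared.retainAll(getBins(b, tolerance, width));
            // Bin keys change with the width, but a and b always share at least one
            System.out.println("width " + width + ": a in " + getBins(a, tolerance, width)
                + ", shared " + shared);
        }
        // The score is the same whatever width was used for binning
        System.out.println("score " + matchScore(a, b, tolerance));
    }
}
```

A narrow width yields many small bins per value (more keys, fewer false candidates per bin); a wide width yields few keys but more candidate pairs to score. Either way the set of matches found is unchanged.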
canBoundMatch
public boolean canBoundMatch()
Description copied from interface: MatchEngine
Indicates that the MatchEngine.getMatchBounds(uk.ac.starlink.table.join.NdRange[], int) method can be invoked to provide some sort of useful result.
- Specified by:
  canBoundMatch in interface MatchEngine
- Returns:
  true iff getMatchBounds may provide useful information
getMatchBounds
public NdRange getMatchBounds(NdRange[] inRanges, int index)
Description copied from interface: MatchEngine
Given a range of tuple values, returns a range outside which no match to anything within that range can result. If the tuples on which this engine works represent some kind of space, the input and output values specify a hyper-rectangular region of this space. In the common case in which the match criteria are based on proximity in this space up to a certain error, this method should return a rectangle which is like the input one but broadened in each direction by an amount corresponding to the error.
Both the input and output rectangles are specified by tuples representing their opposite corners; equivalently, they are the minimum and maximum values of each tuple element. In either the input or output min/max tuples, any element may be null to indicate that no information is available on the bounds of that tuple element (coordinate).
An array of n-dimensional ranges is given, though only one of them (specified by the index value) forms the basis for the output range. The other ranges in the input array may in some cases be needed as context in order to do the calculation. If the match error is fixed, only the single input n-d range is needed to work out the single output range. However, if the errors are obtained by looking at the tuples themselves (match errors are per-row), then in general the broadening has to be done using the maximum error of any of the tables involved in the match, not just the one to be broadened. For a long time I didn't realise this, so versions of this software up to STIL v3.0-14 (Oct 2015) did not broaden these ranges correctly, potentially missing associations near the edges of bounded regions.
This method can be used by match algorithms which know in advance the range of coordinates they will match against and wish to reduce workload by not attempting matches which are bound to fail.
For example, a 2-d Cartesian match engine with an isotropic match error 0.5 would turn an input range with corners ((0,200),(10,210)) into an output range with corners ((-0.5,199.5),(10.5,210.5)).
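The corner arithmetic in this example can be reproduced with a small helper that shifts the minimum corner down and the maximum corner up by the fixed error. Plain Double[] arrays stand in for NdRange here, and a null element (no bound known) is passed through unchanged; FixedErrorBounds and shift are hypothetical names.

```java
import java.util.Arrays;

/**
 * Hypothetical helper for a Cartesian engine with a fixed isotropic error:
 * moves every known bound of a corner tuple by delta. Null elements mean
 * "no bound known" and are left as null.
 */
final class FixedErrorBounds {

    static Double[] shift(Double[] corner, double delta) {
        Double[] out = new Double[corner.length];
        for (int i = 0; i < corner.length; i++) {
            // Null (unknown) bounds are left as null in the output
            if (corner[i] != null) {
                out[i] = corner[i] + delta;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double error = 0.5;
        Double[] mins = {0.0, 200.0};
        Double[] maxs = {10.0, 210.0};
        System.out.println(Arrays.toString(shift(mins, -error))); // [-0.5, 199.5]
        System.out.println(Arrays.toString(shift(maxs, error)));  // [10.5, 210.5]
        // An unknown lower bound on the first coordinate stays unknown
        System.out.println(Arrays.toString(shift(new Double[] {null, 200.0}, -error))); // [null, 199.5]
    }
}
```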
This method will only be called if MatchEngine.canBoundMatch() returns true. Thus engines that cannot provide any useful information along these lines (for instance because none of their tuple elements is Comparable) do not need to implement it in a meaningful way.
- Specified by:
  getMatchBounds in interface MatchEngine
- Parameters:
  inRanges - array of input ranges for the tables on which the match will take place; each element bounds the values for each tuple element in its corresponding table in a possible match (to put it another way, each element gives the coordinates of the opposite corners of a tuple-space rectangle covered by one input table)
  index - the element of the inRanges array for which the broadened output value is required
- Returns:
  output range, effectively inRanges[index] broadened by errors
- See Also:
  MatchEngine.canBoundMatch()
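For the per-row case, the sketch below assumes a hypothetical 1-d engine whose tuples are (x, err) and whose rows match when |x1 - x2| <= err1 + err2. Each input range is reduced to {xMin, xMax, errMax} for one table, a simplification of NdRange. A row of the table at index can then match any x within its own table's largest error plus the largest error of any table, which is why the other ranges are needed as context. PerRowBounds and broaden are illustrative names.

```java
/**
 * Hypothetical 1-d engine with per-row errors: tuples are (x, err) and two
 * rows match if |x1 - x2| <= err1 + err2. Each range is {xMin, xMax, errMax}
 * for one table, a simplification of NdRange.
 */
final class PerRowBounds {

    /** Returns {xMin, xMax} of ranges[index], broadened for any possible match. */
    static double[] broaden(double[][] ranges, int index) {
        // The largest error of any table involved, not only the indexed one
        double maxErrAnyTable = 0.0;
        for (double[] r : ranges) {
            maxErrAnyTable = Math.max(maxErrAnyTable, r[2]);
        }
        double pad = ranges[index][2] + maxErrAnyTable;
        return new double[] {ranges[index][0] - pad, ranges[index][1] + pad};
    }

    public static void main(String[] args) {
        double[][] ranges = {
            {0.0, 10.0, 0.1}, // table 0: small per-row errors
            {5.0, 20.0, 2.0}, // table 1: much larger per-row errors
        };
        double[] b = broaden(ranges, 0);
        System.out.println(b[0] + " .. " + b[1]);
    }
}
```

Padding table 0 by its own errors alone (0.1 + 0.1 = 0.2) would exclude table-1 rows lying between 0.2 and 2.1 beyond table 0's edges, even though such rows can still match under this criterion.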
setName
public void setName(java.lang.String name)
toString
public java.lang.String toString()
- Overrides:
  toString in class java.lang.Object