Package org.htmlparser.nodes
Class TagNode
- java.lang.Object
  - org.htmlparser.nodes.AbstractNode
    - org.htmlparser.nodes.TagNode
- Direct Known Subclasses:
BaseHrefTag, CompositeTag, DoctypeTag, FrameTag, ImageTag, InputTag, JspTag, MetaTag, ProcessingInstructionTag
public class TagNode extends AbstractNode implements Tag
TagNode represents a generic tag. If no scanner is registered for a given tag name, the parser creates a TagNode for it. This is also the base class for all tags created by the parser.
- See Also:
Serialized Form
-
-
Field Summary
Fields
protected static java.util.Hashtable breakTags
Set of tags that break the flow.
protected java.util.Vector mAttributes
The tag attributes.
protected static Scanner mDefaultScanner
The default scanner for non-composite tags.
-
Method Summary
void accept(NodeVisitor visitor)
Default tag visiting code.
boolean breaksFlow()
Determines if the given tag breaks the flow of text.
java.lang.String getAttribute(java.lang.String name)
Returns the value of an attribute.
Attribute getAttributeEx(java.lang.String name)
Returns the attribute with the given name.
java.util.Vector getAttributesEx()
Gets the attributes in the tag.
java.lang.String[] getEnders()
Return the set of tag names that cause this tag to finish.
int getEndingLineNumber()
Get the line number where this tag ends.
Tag getEndTag()
Get the end tag for this (composite) tag.
java.lang.String[] getEndTagEnders()
Return the set of end tag names that cause this tag to finish.
java.lang.String[] getIds()
Return the set of names handled by this tag.
java.lang.String getRawTagName()
Return the name of this tag.
int getStartingLineNumber()
Get the line number where this tag starts.
int getTagBegin()
Gets the nodeBegin.
int getTagEnd()
Gets the nodeEnd.
java.lang.String getTagName()
Return the name of this tag.
java.lang.String getText()
Return the text contained in this tag.
Scanner getThisScanner()
Return the scanner associated with this tag.
boolean isEmptyXmlTag()
Is this an empty xml tag of the form <tag/>.
boolean isEndTag()
Predicate to determine if this tag is an end tag (e.g. </HTML>).
void removeAttribute(java.lang.String key)
Remove the attribute with the given key, if it exists.
void setAttribute(java.lang.String key, java.lang.String value)
Set attribute with given key, value pair.
void setAttribute(java.lang.String key, java.lang.String value, char quote)
Set attribute with given key, value pair where the value is quoted by quote.
void setAttribute(Attribute attribute)
Set an attribute.
void setAttributeEx(Attribute attribute)
Set an attribute.
void setAttributesEx(java.util.Vector attribs)
Sets the attributes.
void setEmptyXmlTag(boolean emptyXmlTag)
Set this tag to be an empty xml node, or not.
void setEndTag(Tag end)
Set the end tag for this (composite) tag.
void setTagBegin(int tagBegin)
Sets the nodeBegin.
void setTagEnd(int tagEnd)
Sets the nodeEnd.
void setTagName(java.lang.String name)
Set the name of this tag.
void setText(java.lang.String text)
Parses the given text to create the tag contents.
void setThisScanner(Scanner scanner)
Set the scanner associated with this tag.
java.lang.String toHtml(boolean verbatim)
Render the tag as HTML.
java.lang.String toPlainTextString()
Get the plain text from this node.
java.lang.String toString()
Print the contents of the tag.
-
Methods inherited from class org.htmlparser.nodes.AbstractNode
clone, collectInto, doSemanticAction, getChildren, getEndPosition, getFirstChild, getLastChild, getNextSibling, getPage, getParent, getPreviousSibling, getStartPosition, setChildren, setEndPosition, setPage, setParent, setStartPosition, toHtml
-
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface org.htmlparser.Node
clone, collectInto, doSemanticAction, getChildren, getEndPosition, getFirstChild, getLastChild, getNextSibling, getPage, getParent, getPreviousSibling, getStartPosition, setChildren, setEndPosition, setPage, setParent, setStartPosition, toHtml
-
-
-
-
Field Detail
-
mDefaultScanner
protected static final Scanner mDefaultScanner
The default scanner for non-composite tags.
-
mAttributes
protected java.util.Vector mAttributes
The tag attributes. Objects of type Attribute. The first element is the tag name; subsequent elements are either whitespace or real attributes.
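The layout described above can be pictured with a small stand-alone model. This is a sketch only: `Entry`, `find` and `sample` are hypothetical names invented for illustration, not part of org.htmlparser, and skipping element zero during lookup is a choice made for this sketch rather than documented behavior of the library.

```java
import java.util.ArrayList;
import java.util.List;

public class AttributeListSketch {
    /** Hypothetical stand-in for an attribute entry; a null name models whitespace. */
    static final class Entry {
        final String name;
        final String value;

        Entry(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Case-insensitive lookup; starts at index 1 to skip the tag name (sketch assumption). */
    static Entry find(List<Entry> attributes, String name) {
        for (int i = 1; i < attributes.size(); i++) {
            Entry e = attributes.get(i);
            if (e.name != null && e.name.equalsIgnoreCase(name)) {
                return e;
            }
        }
        return null; // not found
    }

    /** Builds the list for <a href="x.html">: tag name, whitespace, then a real attribute. */
    static List<Entry> sample() {
        List<Entry> attributes = new ArrayList<>();
        attributes.add(new Entry("a", null));        // element zero: the tag name
        attributes.add(new Entry(null, " "));        // whitespace between entries
        attributes.add(new Entry("href", "x.html")); // a real attribute
        return attributes;
    }

    public static void main(String[] args) {
        List<Entry> attributes = sample();
        System.out.println(find(attributes, "HREF").value);
        System.out.println(find(attributes, "missing"));
    }
}
```

Keeping whitespace as entries in the same list is what would let a tag be rendered back close to its original text.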
-
breakTags
protected static java.util.Hashtable breakTags
Set of tags that break the flow.
-
-
Constructor Detail
-
TagNode
public TagNode()
Create an empty tag.
-
TagNode
public TagNode(Page page, int start, int end, java.util.Vector attributes)
Create a tag with the location and attributes provided.
- Parameters:
page - The page this tag was read from.
start - The starting offset of this node within the page.
end - The ending offset of this node within the page.
attributes - The list of attributes that were parsed in this tag.
- See Also:
Attribute
-
TagNode
public TagNode(TagNode tag, TagScanner scanner)
Create a tag like the one provided.
- Parameters:
tag - The tag to emulate.
scanner - The scanner for this tag.
-
-
Method Detail
-
getAttribute
public java.lang.String getAttribute(java.lang.String name)
Returns the value of an attribute.
- Specified by:
getAttribute in interface Tag
- Parameters:
name - Name of attribute, case insensitive.
- Returns:
The value associated with the attribute or null if it does not exist, or is a stand-alone or
- See Also:
Tag.setAttribute(java.lang.String, java.lang.String)
-
setAttribute
public void setAttribute(java.lang.String key, java.lang.String value)
Set attribute with given key, value pair. Figures out a quote character to use if necessary.
- Specified by:
setAttribute in interface Tag
- Parameters:
key - The name of the attribute.
value - The value of the attribute.
- See Also:
Tag.getAttribute(java.lang.String), Tag.setAttribute(String,String,char)
-
removeAttribute
public void removeAttribute(java.lang.String key)
Remove the attribute with the given key, if it exists.
- Specified by:
removeAttribute in interface Tag
- Parameters:
key - The name of the attribute.
-
setAttribute
public void setAttribute(java.lang.String key, java.lang.String value, char quote)
Set attribute with given key, value pair where the value is quoted by quote.
- Specified by:
setAttribute in interface Tag
- Parameters:
key - The name of the attribute.
value - The value of the attribute.
quote - The quote character to be used around value. If zero, it is an unquoted value.
- See Also:
Tag.getAttribute(java.lang.String)
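The role of the quote parameter, including the zero value that means "unquoted", can be illustrated with a stand-alone sketch. `render` and `chooseQuote` are hypothetical helpers, not library methods, and the rule in `chooseQuote` is only an assumed heuristic for how a quote character might be "figured out"; the library's actual rule is not stated here.

```java
public class AttributeQuoteSketch {
    /** Renders key=value; a quote of zero means the value is written unquoted. */
    static String render(String key, String value, char quote) {
        if (quote == 0) {
            return key + "=" + value;
        }
        return key + "=" + quote + value + quote;
    }

    /**
     * Assumed heuristic (not taken from the library): leave simple values
     * unquoted, prefer double quotes, and fall back to single quotes when the
     * value itself contains a double quote. A value containing both quote
     * characters would need escaping, which this sketch does not attempt.
     */
    static char chooseQuote(String value) {
        boolean needsQuote = value.isEmpty();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (Character.isWhitespace(c) || c == '"' || c == '\'' || c == '>' || c == '=') {
                needsQuote = true;
            }
        }
        if (!needsQuote) {
            return (char) 0; // zero signals an unquoted value
        }
        return value.indexOf('"') < 0 ? '"' : '\'';
    }

    public static void main(String[] args) {
        System.out.println(render("width", "100", chooseQuote("100")));
        System.out.println(render("alt", "a b", chooseQuote("a b")));
    }
}
```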
-
getAttributeEx
public Attribute getAttributeEx(java.lang.String name)
Returns the attribute with the given name.
- Specified by:
getAttributeEx in interface Tag
- Parameters:
name - Name of attribute, case insensitive.
- Returns:
The attribute or null if it does not exist.
- See Also:
Tag.setAttributeEx(org.htmlparser.Attribute)
-
setAttributeEx
public void setAttributeEx(Attribute attribute)
Set an attribute.
- Specified by:
setAttributeEx in interface Tag
- Parameters:
attribute - The attribute to set.
- See Also:
setAttribute(Attribute)
-
setAttribute
public void setAttribute(Attribute attribute)
Set an attribute. This replaces an attribute of the same name. To set the zeroth attribute (the tag name), use setTagName().
- Parameters:
attribute - The attribute to set.
-
getAttributesEx
public java.util.Vector getAttributesEx()
Gets the attributes in the tag.
- Specified by:
getAttributesEx in interface Tag
- Returns:
The list of Attributes in the tag. The first element is the tag name; subsequent elements are either whitespace or real attributes.
- See Also:
Tag.setAttributesEx(java.util.Vector)
-
getTagName
public java.lang.String getTagName()
Return the name of this tag.
Note: This value is converted to uppercase. It does not begin with "/" if it is an end tag, nor does it end with a slash in the case of an XML type tag. To get at the original text of the tag name use getRawTagName(). The conversion to uppercase is performed with an ENGLISH locale.
- Specified by:
getTagName in interface Tag
- Returns:
The tag name.
- See Also:
Tag.setTagName(java.lang.String)
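The normalization described in the note can be sketched without the library. `normalize` below is a hypothetical helper that mirrors the documented rules (strip a leading "/", strip a trailing slash, uppercase with an English locale); it is an illustration under those stated rules, not the library's code. The second part of `main` shows why a fixed locale matters: under a Turkish locale, a lowercase "i" uppercases to a dotted capital I.

```java
import java.util.Locale;

public class TagNameSketch {
    /** Hypothetical helper applying the documented normalization rules to a raw name. */
    static String normalize(String raw) {
        String name = raw;
        if (name.startsWith("/")) {
            name = name.substring(1); // end tag: drop the leading slash
        }
        if (name.endsWith("/")) {
            name = name.substring(0, name.length() - 1); // XML-style tag: drop the trailing slash
        }
        return name.toUpperCase(Locale.ENGLISH); // locale-independent result
    }

    public static void main(String[] args) {
        System.out.println(normalize("/div"));
        System.out.println(normalize("br/"));

        // With a Turkish locale, "title" does not uppercase to "TITLE".
        String turkish = "title".toUpperCase(Locale.forLanguageTag("tr"));
        System.out.println(turkish.equals("TITLE"));
    }
}
```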
-
getRawTagName
public java.lang.String getRawTagName()
Return the name of this tag.
- Specified by:
getRawTagName in interface Tag
- Returns:
The tag name or null if this tag contains nothing or only whitespace.
-
setTagName
public void setTagName(java.lang.String name)
Set the name of this tag. This creates or replaces the first attribute of the tag (the zeroth element of the attribute vector).
- Specified by:
setTagName in interface Tag
- Parameters:
name - The tag name.
- See Also:
Tag.getTagName()
-
getText
public java.lang.String getText()
Return the text contained in this tag.
- Specified by:
getText in interface Node
- Overrides:
getText in class AbstractNode
- Returns:
The complete contents of the tag (within the angle brackets).
- See Also:
Node.setText(java.lang.String)
-
setAttributesEx
public void setAttributesEx(java.util.Vector attribs)
Sets the attributes. NOTE: Values of the extended hashtable are two-element arrays of String, with the first element being the original name (not uppercased), and the second element being the value.
- Specified by:
setAttributesEx in interface Tag
- Parameters:
attribs - The attribute collection to set.
- See Also:
Tag.getAttributesEx()
-
setTagBegin
public void setTagBegin(int tagBegin)
Sets the nodeBegin.
- Parameters:
tagBegin - The nodeBegin to set.
-
getTagBegin
public int getTagBegin()
Gets the nodeBegin.
- Returns:
The nodeBegin value.
-
setTagEnd
public void setTagEnd(int tagEnd)
Sets the nodeEnd.
- Parameters:
tagEnd - The nodeEnd to set.
-
getTagEnd
public int getTagEnd()
Gets the nodeEnd.
- Returns:
The nodeEnd value.
-
setText
public void setText(java.lang.String text)
Parses the given text to create the tag contents.
- Specified by:
setText in interface Node
- Overrides:
setText in class AbstractNode
- Parameters:
text - A string of the form <TAGNAME xx="yy">.
- See Also:
Node.getText()
-
toPlainTextString
public java.lang.String toPlainTextString()
Get the plain text from this node.
- Specified by:
toPlainTextString in interface Node
- Specified by:
toPlainTextString in class AbstractNode
- Returns:
An empty string (tag contents do not display in a browser). If you want this tag's HTML equivalent, use toHtml().
-
toHtml
public java.lang.String toHtml(boolean verbatim)
Render the tag as HTML. A call to a tag's toHtml() method will render it in HTML.
- Specified by:
toHtml in interface Node
- Specified by:
toHtml in class AbstractNode
- Parameters:
verbatim - If true, return as close to the original page text as possible.
- Returns:
The tag as an HTML fragment.
- See Also:
Node.toHtml()
-
toString
public java.lang.String toString()
Print the contents of the tag.
- Specified by:
toString in interface Node
- Specified by:
toString in class AbstractNode
- Returns:
A string describing the tag. For text that looks like HTML use toHtml().
-
breaksFlow
public boolean breaksFlow()
Determines if the given tag breaks the flow of text.
- Specified by:
breaksFlow in interface Tag
- Returns:
true if following text would start on a new line, false otherwise.
-
accept
public void accept(NodeVisitor visitor)
Default tag visiting code. Based on isEndTag(), calls either visitTag() or visitEndTag().
- Specified by:
accept in interface Node
- Specified by:
accept in class AbstractNode
- Parameters:
visitor - The visitor that is visiting this node.
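The dispatch described for accept can be sketched with minimal stand-in types. `Visitor` and `SimpleTag` below are hypothetical and only mimic the shape of the real NodeVisitor and TagNode (the real callbacks are assumed to receive the tag itself; here they receive its name to keep the sketch small).

```java
public class VisitorDispatchSketch {
    /** Hypothetical stand-in for a visitor with separate start-tag and end-tag callbacks. */
    interface Visitor {
        void visitTag(String name);
        void visitEndTag(String name);
    }

    /** Hypothetical stand-in for a tag that knows whether it is an end tag. */
    static final class SimpleTag {
        final String name;
        final boolean endTag;

        SimpleTag(String name, boolean endTag) {
            this.name = name;
            this.endTag = endTag;
        }

        boolean isEndTag() {
            return endTag;
        }

        /** Picks the callback based on isEndTag(), as the description above states. */
        void accept(Visitor visitor) {
            if (isEndTag()) {
                visitor.visitEndTag(name);
            } else {
                visitor.visitTag(name);
            }
        }
    }

    /** Visits the given tags in order and returns a space-separated trace of the callbacks. */
    static String trace(SimpleTag... tags) {
        final StringBuilder log = new StringBuilder();
        Visitor visitor = new Visitor() {
            public void visitTag(String name) {
                log.append("start:").append(name).append(' ');
            }

            public void visitEndTag(String name) {
                log.append("end:").append(name).append(' ');
            }
        };
        for (SimpleTag tag : tags) {
            tag.accept(visitor);
        }
        return log.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(trace(new SimpleTag("P", false), new SimpleTag("P", true)));
    }
}
```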
-
isEmptyXmlTag
public boolean isEmptyXmlTag()
Is this an empty xml tag of the form <tag/>.
- Specified by:
isEmptyXmlTag in interface Tag
- Returns:
true if the last character of the last attribute is a '/'.
-
setEmptyXmlTag
public void setEmptyXmlTag(boolean emptyXmlTag)
Set this tag to be an empty xml node, or not. Adds or removes an ending slash on the tag.
- Specified by:
setEmptyXmlTag in interface Tag
- Parameters:
emptyXmlTag - If true, ensures there is an ending slash in the node, i.e. <tag/>, otherwise removes it.
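The add-or-remove behavior of the ending slash can be illustrated on plain strings. This is a simplified sketch: the hypothetical helpers below work on the text within the angle brackets, whereas the real class is described as inspecting the last attribute.

```java
public class EmptyXmlTagSketch {
    /** True if the tag contents (the text within the angle brackets) end with a slash. */
    static boolean isEmptyXml(String contents) {
        return contents.endsWith("/");
    }

    /** Returns the contents with an ending slash ensured (empty == true) or removed. */
    static String setEmptyXml(String contents, boolean empty) {
        boolean hasSlash = isEmptyXml(contents);
        if (empty && !hasSlash) {
            return contents + "/";
        }
        if (!empty && hasSlash) {
            return contents.substring(0, contents.length() - 1);
        }
        return contents; // already in the requested state
    }

    public static void main(String[] args) {
        System.out.println("<" + setEmptyXml("br", true) + ">");
        System.out.println("<" + setEmptyXml("br/", false) + ">");
    }
}
```

Both directions are idempotent in this sketch: asking for a state the contents already have returns them unchanged.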
-
isEndTag
public boolean isEndTag()
Predicate to determine if this tag is an end tag (e.g. </HTML>).
-
getStartingLineNumber
public int getStartingLineNumber()
Get the line number where this tag starts.
- Specified by:
getStartingLineNumber in interface Tag
- Returns:
The (zero based) line number in the page where this tag starts.
-
getEndingLineNumber
public int getEndingLineNumber()
Get the line number where this tag ends.
- Specified by:
getEndingLineNumber in interface Tag
- Returns:
The (zero based) line number in the page where this tag ends.
-
getIds
public java.lang.String[] getIds()
Return the set of names handled by this tag. Since this is a generic tag, it has no ids.
-
getEnders
public java.lang.String[] getEnders()
Return the set of tag names that cause this tag to finish. These are the normal (non-end) tags that, if encountered while scanning a composite tag, will cause the generation of a virtual tag. Since this is a non-composite tag, the default is no enders.
-
getEndTagEnders
public java.lang.String[] getEndTagEnders()
Return the set of end tag names that cause this tag to finish. These are the end tags that, if encountered while scanning a composite tag, will cause the generation of a virtual tag. Since this is a non-composite tag, it has no end tag enders.
- Specified by:
getEndTagEnders in interface Tag
- Returns:
The names of following end tags that stop further scanning.
-
getThisScanner
public Scanner getThisScanner()
Return the scanner associated with this tag.
- Specified by:
getThisScanner in interface Tag
- Returns:
The scanner associated with this tag.
- See Also:
Tag.setThisScanner(org.htmlparser.scanners.Scanner)
-
setThisScanner
public void setThisScanner(Scanner scanner)
Set the scanner associated with this tag.
- Specified by:
setThisScanner in interface Tag
- Parameters:
scanner - The scanner for this tag.
- See Also:
Tag.getThisScanner()
-
getEndTag
public Tag getEndTag()
Get the end tag for this (composite) tag. For a non-composite tag this always returns null.
- Specified by:
getEndTag in interface Tag
- Returns:
The tag that terminates this composite tag, e.g. </HTML>.
- See Also:
Tag.setEndTag(org.htmlparser.Tag)
-
setEndTag
public void setEndTag(Tag end)
Set the end tag for this (composite) tag. For a non-composite tag this is a no-op.
- Specified by:
setEndTag in interface Tag
- Parameters:
end - The tag that terminates this composite tag, e.g. </HTML>.
- See Also:
Tag.getEndTag()
-
-