Class web.WebPage
java.lang.Object
|
+----web.WebPage
- public class WebPage
- extends Object
A class containing a web page.
Methods for convenient word-by-word access to its contents
are supplied (HTML tags are skipped).
Only text/plain and text/html data can be processed.
Other data results in IOExceptions.
- See Also:
- SimpleUserAgent, PageItem
-
advance()
- Advance the internal pointer to the next item.
-
atEnd()
- Test whether all items have been enumerated.
-
getCurrent()
- Return the page item the internal pointer currently points to.
-
getHeader()
- Get HTTP header of the server's reply.
-
getPageContent()
- Return the whole page as one long string.
-
getTitle()
- Return the title of the page.
-
reset()
- Reset internal pointer to the beginning of the page.
-
toString()
- Return the header + body as one large string.
reset
public void reset()
- Reset internal pointer to the beginning of the page.
- See Also:
- getCurrent, atEnd, advance
atEnd
public boolean atEnd()
- Test whether all items have been enumerated.
- Returns:
- true if and only if all items have been enumerated.
- See Also:
- reset, getCurrent, advance
getCurrent
public PageItem getCurrent()
- Return the page item the internal pointer currently points to.
Return null if there is no item left.
- Returns:
- page item.
- See Also:
- reset, atEnd, advance
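The methods reset, atEnd, getCurrent and advance are meant to be used together as a cursor over the page. The loop below is a minimal sketch of that pattern. It assumes a hypothetical stand-in class, ItemCursor, that holds plain strings instead of PageItem objects; only the four method names and their documented behavior are taken from this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in that mimics the documented cursor behavior of WebPage.
// It is NOT the real class: items are plain strings, not PageItem objects.
class ItemCursor {
    private final List<String> items;
    private int position = 0;

    ItemCursor(List<String> items) {
        this.items = new ArrayList<>(items);
    }

    // Move the pointer back to the first item.
    void reset() { position = 0; }

    // True if and only if all items have been enumerated.
    boolean atEnd() { return position >= items.size(); }

    // Current item, or null if there is no item left.
    String getCurrent() { return atEnd() ? null : items.get(position); }

    // Step to the next item; does nothing once the end is reached.
    void advance() { if (!atEnd()) position++; }

    // The typical traversal: reset, then read and advance until atEnd().
    static List<String> collectAll(ItemCursor cursor) {
        List<String> result = new ArrayList<>();
        cursor.reset();
        while (!cursor.atEnd()) {
            result.add(cursor.getCurrent());
            cursor.advance();
        }
        return result;
    }

    public static void main(String[] args) {
        ItemCursor cursor = new ItemCursor(Arrays.asList("Hello", "world"));
        System.out.println(collectAll(cursor));
    }
}
```

With a real WebPage the same loop shape should apply, with getCurrent returning a PageItem instead of a string.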
advance
public void advance()
- Advance the internal pointer to the next item.
If there is no item left, nothing happens.
Only ordinary words and the link arguments of href count as 'items' in this sense.
All special characters like
\r\n\t \"=.,:;~`'!@#$%^*()[]{}|\\/^<>
delimit words and are therefore never contained in the PageItem returned by getCurrent.
An exception is ';' if it marks the end of an encoded character like &uuml; (ü) and the page is written in HTML.
- See Also:
- reset, atEnd, getCurrent
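To illustrate how the listed special characters delimit words, here is a minimal sketch that uses java.util.StringTokenizer with the delimiter set quoted above. The class name WordSplitSketch and the method splitWords are hypothetical, and this is not how WebPage is known to be implemented: the sketch does not skip HTML tags, extract href arguments, or handle the ';' exception for encoded characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical illustration of delimiter-based word splitting.
class WordSplitSketch {
    // Delimiter characters as listed in the documentation of advance().
    static final String DELIMITERS = "\r\n\t \"=.,:;~`'!@#$%^*()[]{}|\\/^<>";

    // Split plain text into words; delimiters never appear in the result.
    static List<String> splitWords(String text) {
        List<String> words = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, DELIMITERS);
        while (tokenizer.hasMoreTokens()) {
            words.add(tokenizer.nextToken());
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints [Hello, world, test]
        System.out.println(splitWords("Hello, world! (test)"));
    }
}
```

Characters that are not in the delimiter set, such as '-' or '&', stay inside a word in this sketch.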
getTitle
public String getTitle()
- Return the title of the page.
The title is taken from the <title> tag.
Return null if this page does not have a title.
- Returns:
- title.
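The lookup of the <title> tag, including the null result for a page without a title, can be sketched with a regular expression. The class TitleSketch and the method extractTitle are hypothetical names; the actual parsing done by WebPage is not documented here and may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of reading a page title from HTML text.
class TitleSketch {
    // Case-insensitive match; DOTALL lets the title span several lines.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed title text, or null if there is no <title> tag.
    static String extractTitle(String html) {
        Matcher matcher = TITLE.matcher(html);
        return matcher.find() ? matcher.group(1).trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(extractTitle("<html><head><title>My Page</title></head></html>"));
        System.out.println(extractTitle("<html><body>No title here</body></html>"));
    }
}
```

Callers of getTitle should check for null in the same way before using the result.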
getPageContent
public String getPageContent()
- Return the whole page as one long string.
This is meant for debugging purposes only, or for getting robots.txt in order to parse it.
- Returns:
- page as multiline string.
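Since getPageContent is suggested for reading robots.txt, the following minimal sketch shows one way to parse such a string. It collects the Disallow paths that apply to all robots ("User-agent: *"). The class RobotsSketch and the method disallowedForAll are hypothetical, and the sketch covers only the basic User-agent and Disallow lines, not every extension found in real files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of parsing robots.txt content held in a string.
class RobotsSketch {
    // Returns the Disallow paths of every group that names "User-agent: *".
    static List<String> disallowedForAll(String content) {
        List<String> paths = new ArrayList<>();
        boolean inWildcardGroup = false;
        boolean previousWasAgent = false;
        for (String rawLine : content.split("\r?\n")) {
            // Drop comments, which start with '#'.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (key.equals("user-agent")) {
                boolean wildcard = value.equals("*");
                // Consecutive User-agent lines belong to the same group.
                inWildcardGroup = previousWasAgent ? (inWildcardGroup || wildcard) : wildcard;
                previousWasAgent = true;
            } else {
                previousWasAgent = false;
                // An empty Disallow value forbids nothing, so it is skipped.
                if (key.equals("disallow") && inWildcardGroup && !value.isEmpty()) {
                    paths.add(value);
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /private/\n";
        System.out.println(disallowedForAll(robots));
    }
}
```

In practice the string passed in would be the result of getPageContent() for the robots.txt page.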
getHeader
public String getHeader()
- Get HTTP header of the server's reply. This method is provided
for curious students only. You do not have to use it, nor do you
have to understand its meaning.
See section 14 of RFC 2068
for a description of what can be contained in the header.
- Returns:
- a multiline string containing header keys and values.
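For readers who do want to inspect the header, the sketch below turns a multiline header string into a key/value map. It assumes each header field appears on its own line as "Key: value", and that lines without a colon (such as a status line) can be ignored; the exact layout of the string returned by getHeader is not specified on this page. HeaderSketch and parseHeader are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of parsing an HTTP header string.
class HeaderSketch {
    // Header names are case-insensitive in HTTP, so the map ignores case.
    static Map<String, String> parseHeader(String header) {
        Map<String, String> fields = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : header.split("\r?\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // no key on this line, e.g. "HTTP/1.1 200 OK"
            }
            // Split at the first colon only; values may contain colons.
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            fields.put(key, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String header = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 42\r\n";
        System.out.println(parseHeader(header).get("content-type"));
    }
}
```

A check such as parseHeader(page.getHeader()).get("Content-Type") could then confirm that a page is text/plain or text/html, given the assumed line format.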
toString
public String toString()
- Return the header + body as one large string.
- Returns:
- string.
- Overrides:
- toString in class Object