
Class web.WebPage

java.lang.Object
   |
   +----web.WebPage

public class WebPage
extends Object
A class containing a web page. It supplies methods for convenient word-by-word access to the page contents (HTML tags are skipped). Only text/plain and text/html data can be processed; any other content type results in an IOException.

See Also:
SimpleUserAgent, PageItem

Method Index

 o advance()
Advance the internal pointer to the next item.
 o atEnd()
Have all items been enumerated?
 o getCurrent()
Return the page item the internal pointer currently points to.
 o getHeader()
Get HTTP header of the server's reply.
 o getPageContent()
Return the whole page as one long string.
 o getTitle()
Return the title of the page.
 o reset()
Reset internal pointer to the beginning of the page.
 o toString()
Return the header + body as one large string.

Methods

 o reset
 public void reset()
Reset internal pointer to the beginning of the page.

See Also:
getCurrent, atEnd, advance
 o atEnd
 public boolean atEnd()
Have all items been enumerated?

Returns:
true if and only if all items have been enumerated.
See Also:
reset, getCurrent, advance
 o getCurrent
 public PageItem getCurrent()
Return the page item the internal pointer currently points to, or null if there is no item left.

Returns:
page item.
See Also:
reset, atEnd, advance
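
The four traversal methods (reset, atEnd, getCurrent, advance) are meant to be used together in a loop. The following is a minimal sketch of that loop, not code taken from this package: ItemCursor and ListCursor are hypothetical stand-ins with the same method names, so that the example runs without fetching a page, and the items are plain strings rather than PageItem objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PageWalkSketch {
    /** Hypothetical stand-in exposing the four traversal methods documented here. */
    interface ItemCursor {
        void reset();
        boolean atEnd();
        Object getCurrent();
        void advance();
    }

    /** Hypothetical in-memory cursor, used only so the sketch runs offline. */
    static final class ListCursor implements ItemCursor {
        private final List<String> items;
        private int pos = 0;

        ListCursor(List<String> items) { this.items = items; }

        public void reset() { pos = 0; }
        public boolean atEnd() { return pos >= items.size(); }
        public Object getCurrent() { return atEnd() ? null : items.get(pos); }
        public void advance() { if (!atEnd()) pos++; }
    }

    /** Visit every item exactly once, from the first to the last. */
    static List<String> collect(ItemCursor page) {
        List<String> seen = new ArrayList<>();
        for (page.reset(); !page.atEnd(); page.advance()) {
            seen.add(String.valueOf(page.getCurrent()));
        }
        return seen;
    }

    public static void main(String[] args) {
        ItemCursor page = new ListCursor(Arrays.asList("hello", "world"));
        System.out.println(collect(page)); // prints [hello, world]
    }
}
```

With a real WebPage the same for loop should apply, with getCurrent() yielding a PageItem on each pass. Calling reset() first makes the loop safe to run more than once on the same page.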
 o advance
 public void advance()
Advance the internal pointer to the next item. If there is no item left, nothing happens. Only ordinary words and the link arguments of href attributes count as 'items' in this sense. All special characters like \r\n\t \"=.,:;~`'!@#$%^*()[]{}|\\/^<> delimit words and are therefore never contained in the PageItem returned by getCurrent. The one exception is ';' when it marks the end of an encoded character like &uuml; and the page is written in HTML.

See Also:
reset, atEnd, getCurrent
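
To illustrate how the delimiter list in the description of advance splits text into words, here is a rough sketch for the text/plain case only. It is an assumption about the behaviour, not the class's actual implementation: WordSplitSketch and words are hypothetical names, the delimiter list is read as a Java string literal, and the sketch does not skip HTML tags, extract href arguments, or apply the ';' exception for encoded characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class WordSplitSketch {
    // The delimiter list from the description above, read as a Java string literal.
    static final String DELIMITERS = "\r\n\t \"=.,:;~`'!@#$%^*()[]{}|\\/^<>";

    /** Split plain text into words; delimiter characters never appear in a word. */
    static List<String> words(String plainText) {
        List<String> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(plainText, DELIMITERS);
        while (tokens.hasMoreTokens()) {
            out.add(tokens.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // '-' is not in the delimiter list, so "plain-text" stays one word.
        System.out.println(words("Hello, world! (plain-text; demo)"));
        // prints [Hello, world, plain-text, demo]
    }
}
```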
 o getTitle
 public String getTitle()
Return the title of the page. The title is taken from the <title> tag. Return null if this page does not have a title.

Returns:
title.
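
As an illustration of the contract above (title text from the <title> tag, null when there is none), the following sketch extracts a title from an HTML string with a regular expression. How WebPage itself finds the title is not documented here; TitleSketch and titleOf are hypothetical names, and trimming surrounding whitespace is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleSketch {
    // Case-insensitive, and DOTALL so a title spanning several lines still matches.
    private static final Pattern TITLE = Pattern.compile(
            "<title\\b[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Return the text between <title> and </title>, or null if no title is present. */
    static String titleOf(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(titleOf("<html><head><TITLE> Home </TITLE></head></html>")); // prints Home
        System.out.println(titleOf("<p>no title here</p>"));                            // prints null
    }
}
```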
 o getPageContent
 public String getPageContent()
Return the whole page as one long string. This is meant only for debugging or for fetching robots.txt in order to parse it.

Returns:
page as multiline string.
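
For the robots.txt use case, the string returned by getPageContent can be scanned line by line. The sketch below is a deliberately simplified reading of the robots.txt format, not a complete parser: RobotsSketch and disallowedForAll are hypothetical names, only records for "User-agent: *" are considered, and each record is assumed to have a single User-agent line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class RobotsSketch {
    /** Collect the Disallow paths that apply to all robots ("User-agent: *"). */
    static List<String> disallowedForAll(String robotsTxt) {
        List<String> paths = new ArrayList<>();
        boolean applies = false;
        for (String raw : robotsTxt.split("\\r?\\n")) {
            // Drop comments, which run from '#' to the end of the line.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank line or something that is not a field
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                applies = value.equals("*");
            } else if (field.equals("disallow") && applies && !value.isEmpty()) {
                paths.add(value); // an empty Disallow value forbids nothing
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        String sample = "User-agent: *\nDisallow: /private/ # keep out\n\n"
                + "User-agent: bot\nDisallow: /tmp/\n";
        System.out.println(disallowedForAll(sample)); // prints [/private/]
    }
}
```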
 o getHeader
 public String getHeader()
Get HTTP header of the server's reply. This method is provided for curious students only. You do not have to use it nor do you have to understand its meaning.

See section 14 of RFC 2068 for a description of what can be contained in the header.

Returns:
a multiline string containing header keys and values.
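
For the curious, the multiline header string can be turned into a key/value map. The sketch below assumes the usual HTTP layout (a status line followed by "Name: value" lines); the exact format returned by getHeader is not specified here. HeaderSketch and parse are hypothetical names. Keys are lower-cased because HTTP header field names are case-insensitive, and folded continuation lines are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class HeaderSketch {
    /** Parse "Name: value" lines into a map, skipping lines without a colon. */
    static Map<String, String> parse(String header) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : header.split("\\r?\\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // e.g. the status line "HTTP/1.0 200 OK"
            }
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            fields.put(key, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = "HTTP/1.0 200 OK\r\n"
                + "Date: Tue, 15 Nov 1994 08:12:31 GMT\r\n"
                + "Content-Type: text/html\r\n";
        Map<String, String> fields = parse(sample);
        System.out.println(fields.get("content-type")); // prints text/html
        System.out.println(fields.get("date"));         // prints Tue, 15 Nov 1994 08:12:31 GMT
    }
}
```

Splitting on the first colon only keeps values such as dates, which contain colons themselves, intact.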
 o toString
 public String toString()
Return the header + body as one large string.

Returns:
string.
Overrides:
toString in class Object
