Current Web Architecture
Thursday, November 27, 2008
This section of the Internet Tool Survey describes the current architecture of the World Wide Web (WWW). The NCSA Glossary is a useful starting point for Web terms. Another is the ILC glossary of Internet Terms.
The following sections describe
- the basic two-tier architecture of the web in which static web pages (documents) are transferred from information servers to browser clients world-wide,
- extensions that permit three-tiered architectures where content pages can be constructed dynamically and where programs as well as data can be transferred,
- other information transfer protocols, and
- related standards.
The basic web architecture is two-tiered and characterized by a web client that displays information content and a web server that transfers information to the client. This architecture depends on three key standards: HTML for encoding document content, URLs for naming remote information objects in a global namespace, and HTTP for staging the transfer.
- HyperText Markup Language (HTML) - the common representation language for hypertext documents on the Web. HTML had a first public release as HTML 0.0 in 1990, was Internet draft HTML 1.0 in 1993, and HTML 2.0 in 1994. The September 22, 1995 draft of the HTML 2.0 specification has been approved as a standard by the IETF Application Area HTML Working Group. HTML 3.0 and Netscape HTML are competing next generations of HTML 2.0. Proposed features in HTML 3.0 include: forms, style sheets, mathematical markup, and text flow around figures. For more detailed information, see the HTML Reference Manual.
HTML is an application of the Standard Generalized Markup Language (SGML ISO-8879), an international standard approved in 1986, which specifies a formal meta-language for defining document markup systems (more here and here). An SGML Document Type Definition (DTD) specifies valid tag names and element attributes. HTML consists of content marked up with hierarchically nested start and end tags; tag names are case-insensitive, and a start tag may carry element attributes. These attributes may be required, optional, or empty. In addition, documents can be linked to other documents or to locations within the same document by establishing source and target anchor points. Many HTML documents are the result of manual authoring or word processing HTML converters, but now several WYSIWYG editors support HTML styles -- see listing at W3C and the Internet Tools Survey section on Authoring HTML.
HTML files are viewed using a WWW client browser (software), the primary user interface to the Web. HTML allows for embedding of images, sounds, video streams, form fields and simple text formatting. References to other objects, called hyperlinks, are embedded using URLs (see below). When an object is selected by a hyperlink, the browser takes an action based on the URL's type, e.g., retrieve a file, connect to another Web site and display an HTML file stored there, or launch an application such as an e-mail or newsgroup reader.
- Universal Resource Identifier (URI) - an IETF addressing scheme for objects in the WWW ("if it's out there, we can point at it"). There are two types of URIs: Uniform Resource Names (URNs) and Uniform Resource Locators (URLs). The current IETF URI spec is here and the URL spec is here.
URLs are location dependent and contain four distinct parts: the protocol type, the machine name, the directory path and the file name. There are several kinds of URLs: file URLs, FTP URLs, Gopher URLs, News URLs, and HTTP URLs. URLs may be relative to a directory or offsets into a document. Arguments to CGI programs (see below) may be embedded in URLs after the ? character.
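As a rough illustration of the four parts described above, here is a minimal Python sketch that splits a URL with the standard library. The host name www.example.org, the script name search.cgi, and the helper name describe_url are made up for this example and do not come from any specification.

```python
from urllib.parse import urlsplit, parse_qs
import posixpath


def describe_url(url):
    """Split a URL into the parts named in the text (hypothetical helper)."""
    parts = urlsplit(url)
    # The path holds both the directory path and the file name.
    directory, filename = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,            # e.g. http, ftp, gopher
        "machine": parts.hostname,           # the machine name
        "directory": directory,              # the directory path
        "file": filename,                    # the file name
        "arguments": parse_qs(parts.query),  # CGI arguments after the ? character
        "offset": parts.fragment,            # offset into the document after #
    }


# Example URL is an assumption for illustration only.
info = describe_url(
    "http://www.example.org/cgi-bin/search.cgi?term=web&max=10#results"
)
for name, value in info.items():
    print(f"{name}: {value}")
```

The query string and the fragment are not among the four basic parts, but they correspond to the CGI arguments and document offsets mentioned in the same paragraph.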
- HyperText Transfer Protocol (HTTP) - an application-level network protocol for the WWW. Tim Berners-Lee, father of the Web, describes it as a "generic stateless object-oriented protocol." Stateless means neither the client nor the server stores information about the state of the other side of an ongoing connection. Statelessness helps scalability but is not necessarily efficient: HTTP/1.0 sets up a new connection for each request, which is undesirable for situations requiring sessions or transactions.
- In HTTP, commands (request methods) can be associated with particular types of network objects (files, documents, network services). An HTTP transaction consists of
- establishing a TCP/IP connection to a WWW server,
- sending a request to the server (containing a method to be applied to a specific network object identified by the object's identifier, and the HTTP protocol version, followed by information encoded in a header style),
- returning a response from the server to the client (consisting of three parts: a status line, a response header, and response data), and
- closing the connection.
- HTTP supports dynamic data representation through client-server negotiation. The requesting client specifies it can accept certain MIME content types (more on this below) and the server responds with one of these. All WWW clients can handle text/plain and text/html.
- HTTP/1.0 Internet Draft 05 (the seventh release of HTTP/1.0) is targeted as an Internet Informational RFC. The next immediate version of HTTP is HTTP/1.1 Internet Draft 01.
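The request and response stages listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it builds the request text and parses a canned response string rather than opening a real TCP/IP connection, and the function names build_request and parse_response are invented for this example.

```python
def build_request(method, path, accept):
    """Compose an HTTP/1.0 request: method, object identifier, version, headers."""
    lines = [
        f"{method} {path} HTTP/1.0",
        "Accept: " + ", ".join(accept),  # MIME types the client can handle
        "",                              # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a response into status line, response headers, and response data."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


request = build_request("GET", "/index.html", ["text/html", "text/plain"])

# A canned server reply, used here instead of a live connection.
canned = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>"
code, reason, headers, body = parse_response(canned)
print(code, reason, headers["content-type"], body)
```

The Accept header is where the content negotiation happens: the client lists the MIME types it can handle, and the Content-Type header of the reply reports which one the server chose.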
Web Architecture Extensibility
- Common Gateway Interface (CGI) - CGI is a standard for interfacing external programs with Web servers (see Figure 1). The server hands client requests encoded in URLs to the appropriate registered CGI program, which executes and returns results encoded as MIME messages back to the server. CGI's openness avoids the need to extend HTTP. The most common CGI applications handle HTML forms.
- CGI programs are executable programs that run on the Web server. They can be written in any scripting language (interpreted) or programming language (must be compiled first) that can be executed on the Web server, including C, C++, Fortran, Perl, Tcl, Unix shells, Visual Basic, AppleScript, and others. Security precautions typically require that CGI programs be run from a specified directory (e.g., /cgi-bin) under control of the webmaster (Web system administrator); that is, they must be registered with the system.
- Arguments to CGI programs are encoded in the URL by the client; the server passes them to the program through environment variables (for example, QUERY_STRING) or, for POST requests, on standard input. The CGI program typically returns HTML pages that it constructs on the fly.
- Some problems with CGI are:
- the CGI interface requires the server to start a separate program for each request, which adds overhead
- the CGI interface does not provide a way to share data and communications resources across requests, so a program that must access an external resource has to open and close that resource each time it runs. It is difficult to construct transactional interactions using CGI.
- The current version is CGI/1.1. W3C and others are experimenting with next generation object-oriented APIs based on OMG IDL; Netscape provides Netscape Server API (NSAPI) and Process Software and Microsoft provide Internet Server API (ISAPI).
- Helpers/Plug-ins - When a client browser retrieves a file, it launches an installed helper application or plug-in to process the file based on the file's MIME-type (see below). For example, it may launch a Postscript or Acrobat reader, or MPEG or QuickTime player. A helper application runs external to the browser while a plug-in runs within the browser. For information on how to create new Netscape Navigator plug-ins, see The Plug-in Developer's Guide.
- Common Client Interface (CCI) - this interface allows a third-party application to remotely control the Web browser client. Netscape Client APIs 2.0 (NCAPIs) depend on platform-specific native methods of interprocess communication (IPC). Netscape plans to support DDE and OLE2 for Windows clients, X properties for UNIX clients, and Apple Events for Macintosh clients.
- Extensions to HTTP. W3C and IETF Application Area HTTP Working Group are working together on current and future versions of HTTP. The HTTP-NG project is assessing two implementation approaches to HTTP "replacements":
- Spero's approach - allows many requests per connection, the requests can be asynchronous and the server can respond in any order, allowing several transfers in parallel. A "session layer" divides the connection into numerous channels. Control messages (GET requests, meta information) are returned in a control channel; each object is returned in its own channel.
- W3C approach - Jim Gettys at W3C is using Xerox ILU (a CORBA variant) to implement an ILU transport similar to Spero's session protocol. The advantages of this approach are openness with respect to pluggable transport protocols, support for multiple language environments, and a step towards viewing the "web of objects." Related to this approach, Netscape recently announced future support for OMG Internet Inter-ORB Protocol (IIOP) standard on both client and server. This will provide a uniform and language neutral object interchange format making it easier to construct distributed object applications.
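To make the helper and plug-in mechanism from the list above more concrete, here is a small Python sketch of how a browser might pick a handler from a file's MIME type. The table is an assumption for illustration: the handler names and the choice of which types go to a helper and which to a plug-in are invented, not taken from any real browser.

```python
# Hypothetical dispatch table: MIME type -> (where it runs, what handles it).
HANDLERS = {
    "text/html": ("browser", "built-in renderer"),
    "text/plain": ("browser", "built-in renderer"),
    "application/pdf": ("plug-in", "Acrobat reader"),
    "application/postscript": ("helper", "PostScript viewer"),
    "video/mpeg": ("helper", "MPEG player"),
    "video/quicktime": ("plug-in", "QuickTime player"),
}


def choose_handler(content_type):
    """Pick a handler from the Content-Type value sent by the server."""
    # Drop parameters such as "; charset=..." and normalize case.
    mime = content_type.split(";")[0].strip().lower()
    # Unknown types fall back to saving the file (an assumed default).
    return HANDLERS.get(mime, ("browser", "save to disk"))


print(choose_handler("application/pdf"))
print(choose_handler("Text/HTML; charset=ISO-8859-1"))
print(choose_handler("application/x-unknown"))
```

A helper would be started as a separate program outside the browser, while a plug-in would be loaded and run inside the browser window.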
Posted by Sumedh at 11:10 PM
Blogs and RSS
Wednesday, November 26, 2008
The Web is a welcoming medium for experimentation and user participation. It is becoming easier to post Web content and share comments with other users. The idea of the Web site is still very much alive, but Web participation is taking new forms and being driven by new technologies that foster social interaction. Here are two of the latest trends.
Blogs: A blog is an easy-to-create Web site, managed by a lightweight content management system, that allows users to share their thoughts with the world. The word "blog" comes from "Weblog" because a blog consists of a signed and dated log of individual postings. The topic of the blog can be anything, from the personal to the professional. A blog is what you make of it.
What is important about blogs is the content management system that manages the content. This system can offer a variety of features that can make the blog a useful tool. Examples include a calendar view of postings, organization of postings into categories, archived postings, options to send e-mail notification of new postings, and so on.
Blogging can be an interactive activity. Readers can add comments to a blogger's postings, others can respond, and a conversation ensues. Lately, bloggers have become well-known commentators on the political scene, but blogging can encompass any topic or no topic at all. If the blogging software allows it, bloggers can use RSS to distribute their postings.
RSS: RSS allows people to place news and other announcement-type items into a simple XML format that can then be delivered to RSS readers and Web pages. The initials RSS can stand for different things, including Rich Site Summary or Really Simple Syndication. Users can subscribe to the RSS newsfeeds of their choice, and then have access to the updated information as it comes in. RSS is used for all kinds of purposes, including the news itself and announcing new content on Web sites.
RSS content may be read by using an RSS reader, or aggregator. This is usually free software that you install on your computer; it posts new items and stores old ones in a graphical interface. An RSS reader is similar to e-mail software in that it displays incoming items and can store content for offline reading. Subscribing to a newsfeed is usually as simple as entering the address of the RSS document.
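For readers curious about what an RSS reader does with a newsfeed, here is a minimal Python sketch. It assumes an RSS 2.0 style document; the feed content, the example.org addresses, and the helper name new_items are invented for illustration. A real reader would download the document from the newsfeed address instead of using a string.

```python
import xml.etree.ElementTree as ET

# A made-up newsfeed; all titles and links are placeholders.
FEED = """<rss version="2.0">
  <channel>
    <title>Example Library News</title>
    <link>http://www.example.org/</link>
    <description>Announcements</description>
    <item>
      <title>New database trial</title>
      <link>http://www.example.org/news/1</link>
    </item>
    <item>
      <title>Holiday hours</title>
      <link>http://www.example.org/news/2</link>
    </item>
  </channel>
</rss>"""


def new_items(xml_text, seen_links):
    """Return (title, link) pairs for items the reader has not stored yet."""
    channel = ET.fromstring(xml_text).find("channel")
    items = []
    for item in channel.findall("item"):
        link = item.findtext("link")
        if link not in seen_links:
            items.append((item.findtext("title"), link))
    return items


# Pretend the first posting was already read on an earlier visit.
already_seen = {"http://www.example.org/news/1"}
print(new_items(FEED, already_seen))
```

Keeping a record of links already seen is one simple way a reader can separate new postings from stored ones; actual readers may use other identifiers.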
A useful list of RSS readers is available on the site of RSS Compendium.
It is also possible to subscribe to and read your own collection of RSS feeds on Web sites devoted to this purpose. Bloglines is one such example. The advantage here is that you can access your RSS feeds from any computer that is connected to the Web.
Posted by Sumedh at 11:08 PM
Web - Programming Languages and Functions
Tuesday, November 25, 2008
The use of existing and new programming languages has extended the capabilities of the Web. What follows is a basic guide to a group of the more common languages and functions in use on the Web today.
CGI, Active Server Pages: CGI (Common Gateway Interface) refers to a specification by which programs can communicate with a Web server. A CGI program, or script, is any program designed to accept and return data that conforms to the CGI specification. The program can be written in any programming language, including C, Perl, and Visual Basic Script. A common use for a CGI script is to process a form on a Web page. For example, you might fill out a form to order a book at Amazon. The script processes your information and sends it to Amazon to process your order.
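Here is a minimal sketch of what such a form-processing script might look like in Python. It assumes the form was submitted with the GET method, so the server places the form fields in the QUERY_STRING environment variable; the field name title and the function name handle_request are made up for this example.

```python
import html
from urllib.parse import parse_qs


def handle_request(environ):
    """Build a CGI-style reply from the environment the Web server provides."""
    # The server passes the part of the URL after the ? in QUERY_STRING.
    fields = parse_qs(environ.get("QUERY_STRING", ""))
    title = fields.get("title", ["(no title given)"])[0]
    # Escape user input before placing it in the generated page.
    page = (
        "<html><body><p>Order received for: "
        + html.escape(title)
        + "</p></body></html>"
    )
    # A CGI reply starts with a MIME header, then a blank line, then the data.
    return "Content-Type: text/html\r\n\r\n" + page


# Simulated environment; a real server would supply os.environ instead.
response = handle_request({"QUERY_STRING": "title=Web+Architecture&qty=1"})
print(response)
```

When run by a real server, the script would write this text to standard output and the server would send it back to the browser.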
Java/Java Applets: Java is an object-oriented programming language similar to C++. Developed by Sun Microsystems, the aim of Java is to create programs that will be platform independent. The Java motto is, "Write once, run anywhere." A perfect Java program should work equally well on a PC, Macintosh, Unix, and so on, without any additional programming. This goal has yet to be realized. Java can be used to write applications for both Web and non-Web use.
Web-based Java applications are usually in the form of Java applets. These are small Java programs called from an HTML page that can be downloaded from a Web server and run on a Java-compatible Web browser. A few examples include live newsfeeds, moving images with sound, calculators, charts and spreadsheets, and interactive visual displays. Java applets tend to load slowly, but programming improvements should lead to a shortened loading time.
XML: XML (eXtensible Markup Language) is a mark-up language that enables Web designers to create their own customized tags to provide functionality not available with HTML alone. XML is a language of data structure and exchange, and allows developers to separate form from content. With XML, the same content can be formatted for multiple applications. In May 1999, the W3C announced that HTML 4.0 had been recast as an XML application called XHTML. This move is slowly having an impact on the future of both XML and HTML.
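The idea of separating form from content can be shown with a short Python sketch. The catalog, book, title and author tags below are customized tags invented for this example, and the two formatting functions are just one possible way to present the same content twice.

```python
import xml.etree.ElementTree as ET
from html import escape

# Content only: made-up tags and placeholder values, no formatting.
CATALOG = """<catalog>
  <book><title>Example Title One</title><author>A. Author</author></book>
  <book><title>Example Title Two</title><author>B. Writer</author></book>
</catalog>"""


def to_text(xml_text):
    """Format the content as plain text lines."""
    root = ET.fromstring(xml_text)
    return [
        f"{book.findtext('title')} by {book.findtext('author')}"
        for book in root.findall("book")
    ]


def to_html(xml_text):
    """Format the same content as an HTML list."""
    root = ET.fromstring(xml_text)
    rows = [
        f"<li><em>{escape(book.findtext('title'))}</em>, "
        f"{escape(book.findtext('author'))}</li>"
        for book in root.findall("book")
    ]
    return "<ul>" + "".join(rows) + "</ul>"


print(to_text(CATALOG))
print(to_html(CATALOG))
```

The XML document itself never changes; only the function that presents it does.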
Text, audio and video communication can occur in real time on the Web. This capability allows people to conference and collaborate in real time. In general, the faster the Internet connection, the more successful the experience.
At their simplest, chat programs allow multiple users to type to each other in real time. Internet Relay Chat and America Online's Instant Messenger are prime examples of this type of program. Development of a messaging protocol is underway. Such a protocol would allow for the expansion of this capability throughout the Internet.
More enhanced real-time communication offers an audio and/or video component. CU-SeeMe is a software program of this type. Even more elaborate are programs that allow for true real-time collaboration. Microsoft's NetMeeting and Netscape's Conference (available with Communicator) are good examples of this.
Featured collaboration tools include:
- audio: conduct a telephone conversation on the Web
- video: view your audience
- file transfer: send files back and forth among participants
- chat: type in real time
- whiteboard: draw, mark up, and save images on a shared window or board
- document/application sharing: view and use a program on another's desktop machine
- collaborative Web browsing: visit Web pages together
Currently no standard exists that will work among all conferencing programs.
Posted by Sumedh at 11:07 PM
Monday, November 24, 2008
Today's World Wide Web presents an ever-diversified experience of multimedia, programming languages, and real-time communication. There is no question that it is a challenge to keep up with the rapid pace of developments. The following presents a brief description of some of the more important trends to watch.
The Web has become a broadcast medium. It is possible to listen to audio and watch video over the Web, both pre-recorded and live. For example, you can visit the sites of news organizations and view the same videos shown on the nightly news. Several plug-ins are available for viewing these videos.
At one time, the entire multimedia file had to be downloaded before viewing. Since these types of files tend to be quite large, download times can be lengthy. This problem has been answered by a revolutionary development in multimedia capability: streaming media. In this case, audio or video files are played as they are downloading, or streaming, into your computer. Only a small wait, called buffering, is necessary before the file begins to play.
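The difference between downloading first and streaming can be sketched with a toy Python simulation. This is not how any particular player works; the chunk names, the buffer size of two chunks, and the function name stream_playback are assumptions chosen to keep the example small.

```python
from collections import deque


def stream_playback(chunks, buffer_size=2):
    """Play chunks as they arrive, after a short initial buffering wait."""
    buffer = deque()
    events = []
    started = False
    for chunk in chunks:  # each chunk "arrives" from the network in turn
        buffer.append(chunk)
        if not started:
            if len(buffer) < buffer_size:
                events.append(f"buffering {chunk}")
                continue
            started = True  # enough data is buffered; playback can begin
        events.append(f"play {buffer.popleft()}")
    # The download has finished; play whatever is still in the buffer.
    while buffer:
        events.append(f"play {buffer.popleft()}")
    return events


print(stream_playback(["c1", "c2", "c3", "c4"]))
```

In this simulation playback begins once two chunks have arrived, while the rest of the file is still downloading.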
The Windows Media Player, RealPlayer and QuickTime plug-ins play streaming audio and video files. Extensive files such as interviews, speeches, hearings, televised video clips and music work very well with these players. They are also ideal for the broadcast of real-time events. These may include live radio and television broadcasts, concerts, Web-only broadcasts, and so on.
Shockwave and Flash are plug-ins that provide another multimedia experience. They offer the creation and implementation of an entire multimedia display combining graphics, animation and sound.
Sound files, including music, are also a part of the Web experience. Sound files may be incorporated into Web sites, and are also available for downloading independent of Web site visits. For example, try the search engine FindSounds.com. Sound files of many types are supported by the Web with the appropriate plug-ins. The MP3 file format, and the choice of supporting plug-ins, is one of the most popular music trends to sweep the Web.
MP3 files are also the source of podcasts. These are audio files distributed through RSS feeds. A good example of library podcasting can be found on the site of Dowling College, which distributes podcasts of interest to its user community. A variation on the podcast is the vodcast, which is a video file distributed through RSS. This type of multimedia broadcasting is up-and-coming on the Web. (More on RSS below.)
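In an RSS 2.0 feed, the audio file for a podcast episode is attached to an item with an enclosure element. The Python sketch below pulls those attachments out of a feed; the feed content, the example.org addresses, and the function name list_enclosures are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up podcast feed; one item carries an MP3, the other is text only.
PODCAST_FEED = """<rss version="2.0">
  <channel>
    <title>Example Campus Podcast</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://www.example.org/audio/ep1.mp3"
                 length="1048576" type="audio/mpeg"/>
    </item>
    <item>
      <title>Text-only announcement</title>
    </item>
  </channel>
</rss>"""


def list_enclosures(xml_text):
    """Return (title, file address, MIME type) for items with an attached file."""
    results = []
    for item in ET.fromstring(xml_text).iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None:
            results.append(
                (item.findtext("title"), enclosure.get("url"), enclosure.get("type"))
            )
    return results


print(list_enclosures(PODCAST_FEED))
```

A vodcast feed would look the same except that the enclosure would point at a video file with a video MIME type.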
Live cams are another aspect of the multimedia experience available on the Web. Live cams are video cameras that send their data in real time to a Web server. These cams may appear in all kinds of locations, both serious and whimsical: an office, on top of a building, a scenic locale, a special event, and so on.
Posted by Sumedh at 11:06 PM