| Theseus |
| Introduction |
|
Theseus is a Java framework that aims to ease development of web applications using the Java Servlet API and the popular MVC (Model/View/Controller) architecture. It is especially suited to providing a more manageable division of the web application presentation tier. Theseus will allow your team to divide up the common tasks of business logic, presentation logic, and the actual layout of the presentation itself. |
Theseus offers these features: |
What's with the name, anyway? |
|
The name 'Theseus' does have a story behind it. In Greek mythology, Theseus had many different adventures, but his brush with the Minotaur is pretty classic, if you'll pardon the pun. Theseus was the prince of Athens, and Athens at the time was paying a tax to Minos of 7 maids and 7 youths each year. These were fed to the Minotaur, which resided in an inescapable maze (labyrinth) built by Daedalus (the guy whose son tried to fly to the sun). Minos' daughter fell in love with Theseus when he went as one of the 7 youths (intending to kill the Minotaur or Minos). She gave him a sword and a ball of string. He found and killed the Minotaur, then retraced the path he'd laid with the string and thus escaped. It could be said to be a story of how a simple tool was used to vanquish a great beast. Considering that web application development is usually a maze of various APIs and solutions, and contains a monster to defeat (maintenance, load-balancing, fail-over and scalability, to name a few), the name Theseus fits the task that this framework accomplishes well. |
MVC |
|
MVC stands for Model/View/Controller. In short, it is one of many design patterns that can be used to create applications. The MVC pattern is widely used in Java Swing components, and has been very popular in web application development as well. The MODEL represents an entity, meaning the actual "data" that would be stored in a database, for example, or placed in XML. The VIEW is the actual display the client sees in their web browser, but it is also the server-side representation of how that display is generated. The CONTROLLER handles all incoming requests and does the job of figuring out what business logic to perform and what VIEW to render. |
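The division of labor described above can be sketched in a few lines of plain Java. This is only an illustration of the pattern, not the Theseus API: the `Customer`, `View`, `Controller` and `MvcSketch` names are hypothetical, and the "database lookup" is a stand-in lambda.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// MODEL: plain data, for example a row loaded from a database.
final class Customer {
    final String name;
    final int openOrders;

    Customer(String name, int openOrders) {
        this.name = name;
        this.openOrders = openOrders;
    }
}

// VIEW: turns a model into the markup sent to the browser.
interface View {
    String render(Customer model);
}

// CONTROLLER: maps an incoming request path to business logic and a view.
final class Controller {
    private final Map<String, Function<String, Customer>> actions = new HashMap<>();
    private final Map<String, View> views = new HashMap<>();

    void register(String path, Function<String, Customer> action, View view) {
        actions.put(path, action);
        views.put(path, view);
    }

    String handle(String path, String parameter) {
        Function<String, Customer> action = actions.get(path);
        if (action == null) {
            return "404 Not Found";
        }
        Customer model = action.apply(parameter); // business logic
        return views.get(path).render(model);     // presentation
    }
}

final class MvcSketch {
    // Wires one hypothetical route; the lambda stands in for a database lookup.
    static Controller build() {
        Controller controller = new Controller();
        controller.register("/customer",
                name -> new Customer(name, 3),
                m -> "<h1>" + m.name + "</h1><p>Open orders: " + m.openOrders + "</p>");
        return controller;
    }

    public static void main(String[] args) {
        System.out.println(build().handle("/customer", "Ada"));
    }
}
```

Because the view only sees the model and the controller only picks which action and view to use, the layout can change without touching the business logic, which is the division of work the pattern is meant to allow.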
Clustering, Fail-over and Scalability |
|
Before we get into the framework and how to use it, we'll discuss some of the technology being used today and how this framework helps address the issues involved (hence part of the meaning of the name). When developing a web application, several questions usually come up that require solid answers. Some of the more common ones are how to obtain 24/7/365 up-time (that means your site is up ALL the time, year round, without interruption); how to provide a flexible hardware/software solution that allows growth (also known as scalability); how to make sure that if your site goes down the clients don't lose all their work (also known as fault-tolerance and fail-over); and how to make sure you have enough power to handle loads during peak periods (also known as load balancing). We will address these issues in this section and explain how they are achieved and how Theseus will help you work towards all of these goals. Let's touch on each area in depth.

Uptime all the time

Unless you are creating a site that does not need to be up all the time, you will probably want to figure out how to make sure your site keeps running no matter what happens. Some common pitfalls include power failures, server shut-downs and hardware failures. There is never a 100% guarantee that you can prevent any or all of these from happening, but at least you can be prepared if any should occur. While there are many options for keeping your site running all the time, probably the best solution is to make sure your site (and thus all your servers, databases, switches, routers, etc.) is located at a co-location facility. These places usually offer high-bandwidth access to the internet and cages to store your entire farm of equipment in, and most will offer some sort of uninterruptible power option, such as diesel or gas powered generators that kick on as soon as power goes out. I will forewarn you, however: they don't come cheap.
I cannot give accurate estimates, but the cost is probably in the range of several hundred to many tens of thousands per month, depending on the number of cages used; some places charge for bandwidth use as well. However, they do offer the best solution for making sure your site is up all the time. Most places offer some sort of hardware monitoring as well.

One step better is to co-lo your site in two or more locations. First, in case one place becomes completely incapacitated, you still have your site located in another. Second, it becomes possible to allow regional access to each facility. This is especially important if you have a high-traffic site with client access from all over the world. More will be said on load balancing later, but this is another option for distributing the load of your site.

An alternative is to host everything at your own facility. While this can be cheaper than a co-location facility, you will still need to make sure you have everything on battery back-up devices, possibly with generators that can kick in should power go out and remain off for too long for the backup systems to keep the site running. You will also need to wire in special lines such as T1, T3 or more, depending on the bandwidth requirements of your site. It is recommended that a separate T1 (or faster) line be used for employees so as not to take bandwidth away from your site. These days it's quite common for employees to be downloading audio and/or video streams or other large files that can eat up bandwidth quickly. It may be necessary to monitor and/or restrict this sort of bandwidth use as well.

Scalability, Load Balancing and Fault Tolerance

If all is going well, you will most likely need to handle more clients than originally estimated or planned for. It may be difficult to gauge when your site needs more power to handle peak loads, but it helps to prepare beforehand.
Many tools exist on the market today to help load test your site with any number of virtual users. Even so, you may get unexpected rapid growth for a short period of time, and you won't want your site to crumble at such times. Being prepared means having a plan of attack to quickly add more capacity to your site. This isn't just a hardware solution either. Your entire application needs to be written to handle this rapid growth, and if it's not planned for early on, you may well see your site crumble under load.

Scalability is the ability to add servers, or more CPUs and memory, to handle more clients at the same time. It ties in closely with load balancing, which is why both are discussed in this section. So how do you allow your site to be scalable? As stated above, you need to develop your application from the beginning, or retrofit your existing application, to support multiple threads and to work across multiple servers. Scalability requires both hardware and software to get the job done.

First, the hardware. Part of scalability is being able to add another server (or more than one) to a farm of servers in order to handle more clients. For example, if you start your web site out with a single server that can handle one thousand clients at the same time, but during peak operation you may reach up to two thousand clients, it seems easy enough to add a second server to handle the other one thousand clients. But what is going to send requests to one server and not the other? More so, what happens when a client establishes a session on one server (which is used to keep track of information as they navigate the site, such as a shopping cart), and their next request goes to the second server instead of the first? The hardware that fixes this problem is called a Load Balancer. There are many flavors, as cheap as a few hundred dollars and as expensive as tens of thousands of dollars. Their speed, capability and robustness determine the price.
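To make the two routing questions above concrete, here is a minimal software model of what such a device decides for each request. It is a sketch under stated assumptions, not how any particular product works: the `StickyBalancer` class and its methods are hypothetical, servers are identified by index, and a session id string stands in for the cookie a real balancer would inspect.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a session-aware load balancer.
final class StickyBalancer {
    private final int[] activeRequests;   // in-flight requests per server
    private final Map<String, Integer> sessionToServer = new HashMap<>();

    StickyBalancer(int serverCount) {
        if (serverCount < 1) {
            throw new IllegalArgumentException("need at least one server");
        }
        activeRequests = new int[serverCount];
    }

    // Returns the index of the server that should handle this request.
    // sessionId may be null for a client that has no session yet.
    synchronized int route(String sessionId) {
        Integer pinned = sessionId == null ? null : sessionToServer.get(sessionId);
        // A known session stays on its server; otherwise pick the least busy one.
        int target = pinned != null ? pinned : leastBusy();
        if (sessionId != null) {
            sessionToServer.put(sessionId, target);
        }
        activeRequests[target]++;
        return target;
    }

    // Called when a server has finished handling a request.
    synchronized void finished(int server) {
        if (activeRequests[server] > 0) {
            activeRequests[server]--;
        }
    }

    // Lowest index wins when several servers are equally busy.
    private int leastBusy() {
        int best = 0;
        for (int i = 1; i < activeRequests.length; i++) {
            if (activeRequests[i] < activeRequests[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

With two servers, a first request for session "alice" lands on server 0 and a first request for "bob" on server 1; every later "alice" request returns to server 0 even when it is the busier of the two, which is the behavior a session needs.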
A Load Balancer handles the task of sending requests to one or more servers, but more so it handles the "load" by keeping track of the number of requests currently at each server and doing its best to distribute requests evenly across the servers in the farm. What makes some load balancers more valuable than others is the ability to handle sessions (usually using cookie tracking). In the example above, if you submit a request and it goes to one server out of two, which establishes a session for you on that server, all of your subsequent requests need to go to that same server for you to keep an ongoing session. With this type of process, each server can run independently, without needing to worry about what the other server is doing.

However, there is one problem with this solution. While you are now able to handle "theoretically" twice as many clients, if one of the two servers happens to fail (dies, craps out, you name it), ALL of the data in the memory of that server is lost! Any client with existing session information is now routed to the other server, where a session does not exist for them, so they have to begin all over. If your site is not written to handle this, the client is often returned to the main outside page (if your site requires a login to do specific transactions, for example), or to some other place.

What we want is what is called Session fail-over. If a session is established on one server and that server fails, we don't want the client to lose all their information. We need a way to copy the session data of each client on that server to another server. It just so happens that the J2EE specification includes such a capability (although J2EE vendors are not yet required to support it). The trick is to write your application in such a way that all objects stored in the HttpSession implement the java.io.Serializable interface.
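As a minimal sketch of what that requirement means in code, the classes below are session-style objects whose fields are all serializable, plus a helper that writes an object to bytes and reads it back. The `ShoppingCart`, `CartItem` and `SessionCopy` names are hypothetical, and the round trip only approximates what an application server does when it ships session state to a peer; the real mechanism is vendor specific.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Every object reachable from the session attribute must be Serializable too.
final class CartItem implements Serializable {
    private static final long serialVersionUID = 1L;
    final String sku;
    final int quantity;

    CartItem(String sku, int quantity) {
        this.sku = sku;
        this.quantity = quantity;
    }
}

final class ShoppingCart implements Serializable {
    private static final long serialVersionUID = 1L;
    private final List<CartItem> items = new ArrayList<>();

    void add(String sku, int quantity) {
        items.add(new CartItem(sku, quantity));
    }

    int totalQuantity() {
        int total = 0;
        for (CartItem item : items) {
            total += item.quantity;
        }
        return total;
    }
}

final class SessionCopy {
    // Serializes the value and deserializes it again, yielding an independent copy.
    // Fails with IllegalStateException if any reachable object is not Serializable.
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("object graph could not be copied", e);
        }
    }
}
```

Running every class you intend to place in the session through a helper like `roundTrip` in a unit test is one cheap way to catch a non-serializable field before it surfaces in a cluster.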
When two servers are used and ALL objects in the HttpSession implement Serializable (including any object instance fields of those objects), an application server can replicate the HttpSession data to the other server by serializing the whole HttpSession. Now let me point out that this has nothing to do with hardware at this point. You generally need identical application servers on the two machines, and each needs to be aware of the other so that they can communicate with one another. How they are set up to communicate is always vendor specific. Rest assured that when this process is in use, AND you have a load balancer, you have achieved the goals of load balancing and fault tolerance (at the front-end application server level).

For a complete solution, you will still need to have two of everything in terms of hardware. For example, you would need two load balancers, just in case one should stop working. You would need two servers, two database servers and so on. How these are set up is beyond the scope of this article, but fault-tolerance goes beyond the application server setup and HttpSession fail-over setup. By the way, the term often used for this type of setup is "clustering".

There is one drawback to Session fail-over: the memory requirements. Most application servers will use in-memory session fail-over for performance reasons. This means that if you have two servers in a farm, BOTH servers need double the memory in order to replicate each other's sessions. Server-A replicates to Server-B, and vice versa. Some application servers may replicate to a local database, persisting the HttpSession, which would reduce the memory footprint but decrease the performance of the site (and thus the number of clients it can handle). If a farm has 10 servers, it's possible each server would require 10 times the amount of memory for session replication. Also keep in mind the network traffic.
If each server has 4GB of memory, session replication could conceivably have to move that much data between the servers. Also keep in mind that each server replicates its sessions to all the other servers in the farm.

So how do application servers handle this? Well, each is different. WebLogic uses a "buddy" system in which a master server replicates to only one other application server, no matter how many are in the farm. Should a master go down, the buddy looks for another master, just as a master looks for another buddy if the buddy it has goes down. The Orion application server replicates to all servers in what it calls an "island". However, it has a very nice capability of using multiple islands, so that you can keep two servers per island across multiple islands. Ideally, using two servers per island reduces the amount of memory each server needs as well as the network traffic between them, but it does allow for x number of islands each with x number of servers. Therefore, you can still scale very well by adding another island to the overall application cluster. |
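The memory trade-off between these replication styles can be put into rough numbers. The sketch below is a back-of-the-envelope estimate, not a description of how WebLogic or Orion account for memory: the `ReplicationEstimate` class is hypothetical, and it assumes every server carries the same amount of its own session data and, in the buddy case, holds a copy for exactly one other server.

```java
final class ReplicationEstimate {
    // Memory a server needs for its own sessions plus one full copy for each
    // other server whose sessions it holds.
    static long memoryPerServerMb(long ownSessionsMb, int replicasHeld) {
        return ownSessionsMb * (1 + replicasHeld);
    }

    // Every server replicates to every other server in the farm.
    static long allToAll(long ownSessionsMb, int servers) {
        return memoryPerServerMb(ownSessionsMb, servers - 1);
    }

    // Each server holds a copy for one other server, whatever the farm size.
    static long buddy(long ownSessionsMb) {
        return memoryPerServerMb(ownSessionsMb, 1);
    }

    // Servers replicate only to the other members of their own island.
    static long island(long ownSessionsMb, int serversPerIsland) {
        return memoryPerServerMb(ownSessionsMb, serversPerIsland - 1);
    }

    public static void main(String[] args) {
        long own = 512; // assumed MB of session data per server
        System.out.println("all-to-all, 10 servers: " + allToAll(own, 10) + " MB");
        System.out.println("buddy:                  " + buddy(own) + " MB");
        System.out.println("islands of 2:           " + island(own, 2) + " MB");
    }
}
```

Under these assumptions, a 10-server farm with 512 MB of session data per server needs 5120 MB per server for all-to-all replication, but only 1024 MB per server with a buddy system or with islands of two, which is why adding islands scales better than enlarging a single one.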