Friday, October 29, 2010

HTTP 1.1 Status Code Definitions

This is a quick reference to the HTTP/1.1 status code categories, excerpted from http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html, part of RFC 2616.

Informational 1xx (100-101)

This class of status code indicates a provisional response, consisting only of the Status-Line and optional headers, and is terminated by an empty line. There are no required headers for this class of status code. Since HTTP/1.0 did not define any 1xx status codes, servers MUST NOT send a 1xx response to an HTTP/1.0 client except under experimental conditions.

Successful 2xx (200-206)

This class of status code indicates that the client's request was successfully received, understood, and accepted.

Redirection 3xx (300-307)


This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD. A client SHOULD detect infinite redirection loops, since such loops generate network traffic for each redirection.
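
The loop detection mentioned above can be sketched roughly as follows. This is only an illustration, not something from the RFC: the redirect table is simulated with a map (URL to Location target), and the class name RedirectFollower and the MAX_HOPS cap of 10 are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class RedirectFollower {
    static final int MAX_HOPS = 10; // assumed cap, not a value from the RFC

    /** Follows simulated redirects (URL -> Location target) and returns the final URL. */
    static String resolve(Map<String, String> redirects, String start) {
        Set<String> visited = new HashSet<>();
        String current = start;
        while (redirects.containsKey(current)) {
            // Seeing the same URL twice means the chain can never terminate.
            if (!visited.add(current)) {
                throw new IllegalStateException("Redirect loop detected at " + current);
            }
            if (visited.size() > MAX_HOPS) {
                throw new IllegalStateException("Too many redirects");
            }
            current = redirects.get(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, String> redirects = new HashMap<>();
        redirects.put("/old", "/new");
        redirects.put("/new", "/new/");
        System.out.println(resolve(redirects, "/old"));
    }
}
```

A real user agent would apply the same two checks (visited set and hop cap) while reading the Location header of each 3xx response.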

Client Error 4xx (400-417)

The 4xx class of status code is intended for cases in which the client seems to have erred. Except when responding to a HEAD request, the server SHOULD include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition. These status codes are applicable to any request method. User agents SHOULD display any included entity to the user.

Server Error 5xx (500-505)

Response status codes beginning with the digit "5" indicate cases in which the server is aware that it has erred or is incapable of performing the request. Except when responding to a HEAD request, the server SHOULD include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition. User agents SHOULD display any included entity to the user. These response codes are applicable to any request method.
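
As a memo, the five classes above can be derived from the first digit of the code. The helper below is a minimal sketch; the class name StatusClass and the method classify are made up for this example.

```java
class StatusClass {
    /** Maps a status code to its RFC 2616 class name, based on the first digit. */
    static String classify(int code) {
        switch (code / 100) {
            case 1: return "Informational";
            case 2: return "Successful";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default:
                throw new IllegalArgumentException("Not an HTTP status code: " + code);
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(404));
    }
}
```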

Health insurance

These medical plans are hard to understand without reading their statements carefully. This year the company is changing the plan offerings again to save cost, so next year's per-paycheck contribution will increase by about 40%. Given that, I had to read the flyers carefully and attend one session to understand them better before making elections within the enrollment period.

As I said, the many acronyms make this confusing for engineers, but a 10-minute explanation works well. Here I write down my understanding as a memo for future reference. For a medical insurance plan, we need to understand 3 categories: health plan, health account, and health service provider.

Health plan:
  1. PPO – Preferred provider organization.  A health plan that uses network and out-of-network providers.  Examples are Choice Plus (UHC) and Open Access Plus (CIGNA).
  2. EPO – Exclusive provider organization.  A health plan that uses in-network providers only.  Examples are Choice (UHC) and Open Access (CIGNA).
  3. OOA – Out of area plan.  Medical coverage for employees outside major network and metropolitan areas.
  4. HPSP – Health Plus Savings Plan.  Usually a tax-qualified, high-deductible health plan.
  5. HMO – Health maintenance organization.  A legally qualified health care organization that provides medical services in a geographic area.  Examples are Kaiser and Harvard Pilgrim Healthcare.
Health account:
  1. FSA - Flexible Spending Account (employee owns, use it or lose it)
  2. LPFSA - Limited Purpose Flexible Spending Account (employee owns, limited to vision and dental)
  3. HSA - Health Savings Account (employee owns, balance carries over and may earn interest)
  4. HIA - Health Incentive Account (company owns, a.k.a. Health Reimbursement Account)
Health service provider:
  1. CIGNA
  2. UHC (UnitedHealthcare)
  3. Kaiser
  4. Harvard Pilgrim
Notes:
  1. Tax laws prohibit the rollover of HIA funds into an HSA.
  2. The new medical reforms require that dependents up to 26 years old be covered in the medical plan, regardless of employment/marital status.

High performance web site - reading notes (4)

11. Avoid Redirects
Redirects hurt performance
Response status code is 3xx for redirects (300-307; 304 is for conditional GET and is not really a redirect)
Redirects delay the HTML document; redirected CSS impacts rendering; redirected JS impacts rendering and parallel downloads
Missing Trailing Slash
    Apache alias/mod_rewrite/DirectorySlash
    autoindexing
Connecting Web Sites
Tracking internal traffic - referer logging
Tracking outbound traffic - beacon (http request contains tracking info in the URL)
Prettier URLs - avoid the redirect by using Alias, mod_rewrite, or DirectorySlash, or by linking directly

12. Remove duplicate scripts
Unnecessary HTTP requests
Wasted JS execution
Implement a script management module in templating system
Script has a getVersion() function
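
A script management module along these lines could look like the sketch below. It is not the book's code: the class name ScriptManager, the fixed version string, and the "name_version.ext" naming scheme are assumptions for illustration, and getVersion() here is a hypothetical stand-in that a real templating system would back with build metadata.

```java
import java.util.HashSet;
import java.util.Set;

class ScriptManager {
    private final Set<String> inserted = new HashSet<>();

    /** Hypothetical version lookup; a real one might read a build manifest. */
    static String getVersion(String path) {
        return "1.0.17";
    }

    /** Returns a script tag the first time a path is requested, and "" for duplicates. */
    String insertScript(String path) {
        if (!inserted.add(path)) {
            return ""; // already on the page: no extra HTTP request, no repeated execution
        }
        int dot = path.lastIndexOf('.');
        String version = getVersion(path);
        String revved = dot < 0
                ? path + "_" + version
                : path.substring(0, dot) + "_" + version + path.substring(dot);
        return "<script type=\"text/javascript\" src=\"" + revved + "\"></script>";
    }

    public static void main(String[] args) {
        ScriptManager page = new ScriptManager();
        System.out.println(page.insertScript("menu.js"));
        System.out.println(page.insertScript("menu.js").isEmpty());
    }
}
```

One manager instance per rendered page is assumed, so that the duplicate check is scoped to a single response.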



13. Configure (Avoid) ETags
Entity tags - a mechanism that web servers and browsers use to validate cached components
ETag is a string, must be quoted, introduced in HTTP/1.1
If-None-Match takes precedence over If-Modified-Since
ETag is typically constructed using attributes that make it unique to a specific server hosting a Web site
Apache ETag uses inode-size-timestamp, and the FileETag directive can remove the inode
IIS ETag uses Filetimestamp:ChangeNumber (# of configuration changes to IIS)
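
To make the validation step concrete, here is a rough sketch of a server-independent ETag (size and timestamp only, no inode) and the If-None-Match comparison. The class ETagSketch and its methods are invented for this note, the hex "size-mtime" layout only imitates the Apache style, and weak validators (W/ prefix) are deliberately ignored.

```java
class ETagSketch {
    /** Builds a quoted ETag from attributes that are identical on every server in a cluster. */
    static String etag(long sizeBytes, long lastModifiedMillis) {
        return "\"" + Long.toHexString(sizeBytes) + "-" + Long.toHexString(lastModifiedMillis) + "\"";
    }

    /** Returns true when the If-None-Match header matches, i.e. a 304 could be sent. */
    static boolean notModified(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null) {
            return false;
        }
        for (String candidate : ifNoneMatch.split(",")) {
            String tag = candidate.trim();
            if (tag.equals("*") || tag.equals(currentEtag)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String current = etag(1024, 1000);
        System.out.println(current);
        System.out.println(notModified("\"abc\", " + current, current));
    }
}
```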

14. Make ajax cacheable
Web2.0, DHTML, Ajax
Yahoo Mail caches ajax result
Use packet sniffer to monitor active/passive Ajax requests

Tuesday, October 26, 2010

High performance web site - reading notes (3)

6. Put (java)scripts at the bottom
Parallel downloads
Limiting parallel downloads to two per hostname is only a guideline; newer browsers allow 4 or more for HTTP/1.1
Scripts block download
Use deferred scripts (DEFER attribute indicates the script does not contain document.write)

7. Avoid CSS Expressions
CSS expressions are a powerful and dangerous way to set CSS properties dynamically
CSS expressions are evaluated more frequently than most people expect.
One-Time Expressions
Event handlers

8. Make javascript and CSS external
In raw terms, inline is faster, but we need to consider three metrics (page views, empty cache vs. primed cache, and component reuse).
post-onload download (document onload event; Firebug highlights the DOMContentLoaded and load events)

9. Reduce DNS lookup
Reducing the number of unique hostnames reduces the number of DNS lookups
Reducing the number of unique hostnames also reduces the amount of parallel downloading
Use keep-alive to reuse an existing connection, avoiding TCP/IP overhead

10. Minify javascript
Use minification instead of obfuscation (due to concerns about bugs, maintenance, debugging, etc.)
Minify JavaScript using JSMin or the Dojo compressor (ShrinkSafe)

High performance web site - reading notes (2)

1. Make Fewer HTTP Requests
Image maps
CSS Sprites
Inline images (data: URL scheme)
    e.g. <img alt="red star" src="data:image/gif;base64,THE-BASE64-DATA-OF-IMAGE">
Combined javascripts and stylesheets
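
For the inline image item, the data: URL can be generated from the image bytes. The sketch below uses the standard java.util.Base64 encoder; the class DataUrl and its method names are made up for the example, and the six bytes in main are just a GIF header standing in for a real image file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class DataUrl {
    /** Builds a data: URL for the given MIME type and raw content. */
    static String of(String mimeType, byte[] content) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(content);
    }

    /** Wraps the data: URL in an img tag like the one in the notes above. */
    static String imgTag(String alt, String mimeType, byte[] content) {
        return "<img alt=\"" + alt + "\" src=\"" + of(mimeType, content) + "\">";
    }

    public static void main(String[] args) {
        // Placeholder bytes (only a GIF signature); a real page would read the image file.
        byte[] placeholder = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        System.out.println(imgTag("red star", "image/gif", placeholder));
    }
}
```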

2. Use a Content Delivery Network
Akamai
Mirror Image
Limelight
SAVVIS (specialized in video content delivery)
Use keynote.com or gomez.com to test geographic locations

3. Add an Expires header
Expires
Cache-Control (max-age), which takes precedence over Expires
Apache mod_expires
Empty cache vs. primed cache
Last-Modified
revving filenames (add build version number), don't use query string
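
A rough sketch of how the two headers relate: both are computed from the same lifetime, Expires as an absolute HTTP date and Cache-Control as a number of seconds. The class FarFutureHeaders and its methods are assumptions for this note; in practice Apache mod_expires would emit these headers from configuration.

```java
import java.time.Duration;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class FarFutureHeaders {
    // HTTP date layout with a two-digit day, always expressed in GMT.
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);

    /** Value for the Expires header: an absolute point in time. */
    static String expires(ZonedDateTime now, Duration maxAge) {
        return HTTP_DATE.format(now.withZoneSameInstant(ZoneOffset.UTC).plus(maxAge));
    }

    /** Value for the Cache-Control header: a relative lifetime in seconds. */
    static String cacheControl(Duration maxAge) {
        return "max-age=" + maxAge.getSeconds();
    }

    public static void main(String[] args) {
        Duration oneYear = Duration.ofDays(365);
        System.out.println("Expires: " + expires(ZonedDateTime.now(), oneYear));
        System.out.println("Cache-Control: " + cacheControl(oneYear));
    }
}
```

With a far-future lifetime like this, a changed component only reaches users if its filename changes, which is why the notes pair it with revving filenames.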

4. Gzip components
Accept-Encoding (Content-Encoding in response)
Image/PDF should not be gzipped (Gzip your scripts and stylesheets)
Gzip reduces response size by about 70%
Apache: mod_gzip (1.3) or mod_deflate (2.x)
Proxy caching uses Vary header (e.g. Vary: Accept-Encoding,User-Agent)
Update (5/17/2012)
Compress the Embedded OpenType (EOT) font files used by Internet Explorer. EOT is a binary format, but it is not natively compressed
Compress the favicon: although it is an image file, it is not natively compressed
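
The sketch below shows the compression step itself with java.util.zip.GZIPOutputStream, plus a simple content-type check reflecting the notes above (text, scripts, EOT fonts and favicons yes; other images and PDF no). The class GzipSketch and the exact MIME type list are assumptions for illustration; a real server would do this via mod_deflate configuration instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class GzipSketch {
    /** Compresses the input with gzip, as a server would before sending Content-Encoding: gzip. */
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Assumed list of types worth compressing; already-compressed formats are left out. */
    static boolean shouldGzip(String contentType) {
        return contentType.startsWith("text/")
                || contentType.equals("application/javascript")
                || contentType.equals("application/x-javascript")
                || contentType.equals("application/vnd.ms-fontobject") // EOT fonts
                || contentType.equals("image/x-icon");                 // favicon
    }

    public static void main(String[] args) {
        StringBuilder css = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            css.append(".item").append(i).append(" { margin: 0; padding: 0; }\n");
        }
        byte[] raw = css.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println(raw.length + " bytes -> " + gzip(raw).length + " bytes");
    }
}
```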

5. Put stylesheets at the top
Use Link instead of @import as @import rule causes unexpected ordering in how the components are downloaded
FOUC = Flash of unstyled content
Put stylesheets in the document HEAD using the LINK tag

High performance web site - reading notes (1)

Background:
Somehow I was assigned a new task to investigate front-end performance for an important project, mainly web site performance. This is a hot topic in the Web 2.0 era, and I attended the Velocity 2010 conference this June in Santa Clara. Most sessions were about web site performance and, behind the scenes, how to make web pages faster while dealing with old friends such as HTML/JS/CSS/images/Flash. However, I have not worked on this layer for years and have almost forgotten how to write CSS/JS efficiently, so I need to pick it up quickly by reading.

Why High performance web site?
Two main reasons. First, Steve Souders is the author of YSlow and was once Chief Performance Yahoo!, leading a team focused on Yahoo performance (Yahoo also published the best practices); he is now at Google working on performance. Second, the book is well organized and easy to read and understand. I am preparing to read his second book, "Even Faster Web Sites", now.

Top 14 rules:
There are many rules (best practices) for Web site performance from Yahoo, Google, and other companies. In this book, Steve lists the top 14 rules and explains the ins and outs of each with examples and case studies. I will not repeat his points word for word here; as reading notes, I will write down the key takeaways from each rule. Therefore the notes might not be complete sentences, might lack context, or might be hard to fully understand. If you are interested, get a copy and read Steve's original words.
  1. Make fewer HTTP requests
  2. Use a CDN
  3. Add an Expires header
  4. Gzip components
  5. Put stylesheets at the top
  6. Put (java)scripts at the bottom
  7. Avoid CSS Expressions
  8. Make javascript and CSS external
  9. Reduce DNS lookup
  10. Minify javascript
  11. Avoid Redirects
  12. Remove duplicate scripts
  13. Configure (avoid) ETags
  14. Make ajax cacheable

Wednesday, October 20, 2010

NoClassDefFoundError

Introduction:
This is an error we sometimes encounter in the test environment. According to the JDK description (since JDK 1.0), the error is thrown if the Java Virtual Machine or a ClassLoader instance tries to load in the definition of a class (as part of a normal method call or as part of creating a new instance using the new expression) and no definition of the class could be found. The searched-for class definition existed when the currently executing class was compiled, but the definition can no longer be found (at runtime).

Here are some cases of this error:
Case 1:
java.lang.NoClassDefFoundError: org/apache/log4j/Category) (Caused by org.apache.commons.logging.LogConfigurationException: No suitable Log constructor [Ljava.lang.Class;@2c773f1 for org.apache.commons.logging.impl.Log4JLogger (Caused by java.lang.NoClassDefFoundError: org/apache/log4j/Category))

Possible root cause: the WebLogic cache somehow missed the class definition and did not try to load it again.

The fix: refresh to force the classes to reload.
1.       Delete every directory named .wlnotdelete under beahome/user_projects (or its subdirectories)
2.       Delete all the files in the upload directory under beahome/user_projects (or its subdirectories)
3.       Restart WLS and redeploy the package

Case 2:
java.lang.NoClassDefFoundError: Could not initialize class com.company.webapp.module.search.Proxy
com.company.webapp.module.search.Searcher.FTSearch(Searcher.java:141)
com.company.webapp.module.search.FullTextSearch.execute(FullTextSearch.java:104)
com.company.webapp.module.Exec.execute_local(Exec.java:876)
Root cause: creating a new instance depends on the service manager for the search engine endpoint look-up. When the service manager is down, the constructor throws an exception.
The fix is to provide a default value, or not to throw an exception from the constructor.
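
The "Could not initialize class" message can be reproduced with a small sketch. It assumes the failing call runs during static initialization of the class (for example through a static field), which is what this message normally indicates. All names here (InitFailureDemo, Proxy, SafeProxy, lookupEndpoint, the default URL) are hypothetical stand-ins, not the real application classes.

```java
class InitFailureDemo {
    /** Stand-in for the service manager look-up; here it always fails. */
    static String lookupEndpoint() {
        throw new IllegalStateException("service manager is down");
    }

    /** Static initialization depends on the look-up, so the class can never be initialized. */
    static class Proxy {
        static final String ENDPOINT = lookupEndpoint();
        static void search() { }
    }

    /** The fix from the notes: fall back to a default instead of letting the exception escape. */
    static class SafeProxy {
        static final String ENDPOINT = lookupOrDefault();
        static String lookupOrDefault() {
            try {
                return lookupEndpoint();
            } catch (RuntimeException e) {
                return "http://localhost/default"; // hypothetical default endpoint
            }
        }
    }

    /** Tries to use Proxy and reports which error type the JVM raised. */
    static String attempt() {
        try {
            Proxy.search();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(attempt()); // first use: the initializer itself fails
        System.out.println(attempt()); // later uses: the class is marked unusable
        System.out.println(SafeProxy.ENDPOINT);
    }
}
```

The first use fails with ExceptionInInitializerError carrying the real cause; every later use fails with NoClassDefFoundError, so the earliest error in the log is usually the one worth reading.
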
Case 3:
java.lang.NoClassDefFoundError: Could not initialize class com.company.webapp.module.search.PathCache$PathCacheHolder com.company.webapp.module.search.PathCache.getInstance(PathCache.java:139)

Root cause: in the preload servlet, the init() method needs to call the service manager (another application in the same JVM) for the component endpoint look-up. This delays Tomcat from opening its ports, because Tomcat opens ports only after all servlets/listeners of the web apps are initialized, and in this case the preload servlet is still initializing while it waits for the service manager.
The fix is to call the service manager from another thread in the init() method, with a 30-second wait time, to speed up Tomcat startup. With that, the preload servlet initializes very quickly, and Tomcat opens its ports once the application (with the preload servlet) and the service manager (with its other lightweight servlets) are ready in the same Tomcat.
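
One way to read this fix is sketched below: the look-up starts on a background thread so that init() returns immediately, and callers wait for the result with a timeout. This is an interpretation, not the original code. The class AsyncEndpointLookup, the fallback value, and the URLs are hypothetical, and the servlet plumbing is left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class AsyncEndpointLookup {
    private final CompletableFuture<String> endpoint;
    private final String fallback;

    /** Starts the look-up on a background thread; the constructor returns right away. */
    AsyncEndpointLookup(Supplier<String> lookup, String fallback) {
        this.endpoint = CompletableFuture.supplyAsync(lookup);
        this.fallback = fallback;
    }

    /** Waits up to the given time for the result, and returns the fallback otherwise. */
    String get(long timeout, TimeUnit unit) {
        try {
            return endpoint.get(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException | TimeoutException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // In a servlet, the constructor call would sit in init() and get() in the request path.
        AsyncEndpointLookup lookup = new AsyncEndpointLookup(
                () -> "http://localhost:8080/search", "http://localhost/default");
        System.out.println(lookup.get(30, TimeUnit.SECONDS));
    }
}
```
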
Case 4:
javax.servlet.ServletException: Servlet execution threw an exception
com.company.MyFilter.doFilter(MyFilter.java:30)
com.company.UtilFilter.doFilter(UtilFilter.java:64)
java.lang.NoClassDefFoundError: Could not initialize class

Possible root cause: the Emma (code coverage) build somehow causes a runtime issue even though compilation succeeded. We once hit a similar issue (an Emma build caused a SecurityException). Our reasoning: there was no change to the application filters between the 2 daily builds, so we were confident the error was caused by the daily build, not by a code change.
The fix was to use the previous build or a non-Emma build.
To sum up:
NoClassDefFoundError is an Error and is usually caused by coding practice. However, sometimes an environment issue (such as a build or application server bug) can also cause it. The way to fix it is to analyze the error log for the root cause and then work out a solution. One guideline is to ensure the jar or class is on the classpath, and that the constructor (class), init (servlet), or getInstance (singleton class) doesn't depend on other resources.





Tuesday, October 5, 2010

Groupon and the business model

Groupon (www.groupon.com) enables people to purchase a deal online as a group, with coupon sharing and purchasing features, categorized by city to gather buyers more easily. Many similar promotion sites share the same business model. Merchants pay service fees (or a commission) to Groupon; Groupon publishes the daily deal and sells the coupons; buyers redeem the coupons (printed or on mobile) at the merchant's store. This is a win-win strategy for both seller and buyer, and Groupon profits from the commission (or something like that). It is a very good business model, much like a real-world marketplace that provides the platform/location for sellers and buyers and collects the rent itself. More and more features can be added to the Groupon site, but the essential value is the win-win situation: sellers can sell more and advertise themselves through promotions, and buyers can get great deals in local stores.

Similar Sites:
http://tippr.com/
http://livingsocial.com

Traditional Coupon Site:
http://dealsea.com/  - guess it profits from link, similar to Google Adsense
http://www.8coupons.com
http://www.couponsherpa.com/ - this site is well-organized, and easy to find coupons for big merchandisers

Notes from Java security training

Threat Modeling:
Use the Microsoft SDL Threat Modeling Tool to understand the system's potential threats. There are usually 4 steps: (1) Draw Diagrams (2) Analyze Model (3) Describe Environment (4) Generate Reports. We can focus on outward-facing interfaces first and then on features, and define processes, data stores, external interactors, and data flows as well as (trust) boundaries.

Here is one *.tms file snapshot:



Secure Development Life-cycle:
An organization or project group should define a secure development process to build security into every phase of software development: requirement analysis, design, development, deployment, and so on.

Use JTest to fix insecure code:
Eclipse with JTest plugin provides better experience to help write secure code.





OWASP WebGoat Example:
http://localhost:6080/WebGoat-5.1/attack



Notes:
  1. SDL Threat Modeling Tool needs Visio
  2. JTest is from Parasoft, which provides a bunch of features including security scan