Wednesday, December 7, 2011

CSS performance

Recently we have been looking at JavaScript performance, and there are many good resources for it (such as the High Performance JavaScript book). When I looked at CSS, my first thought was all about selectors. However, after reading the http://www.slideshare.net/booshtukka/high-performance-css slides, I realized that apart from selector/rendering performance, size and number of requests are still the WPO fundamentals from a CSS performance standpoint. Here are my reading notes on the "High Performance CSS" deck.

Size
names - use abbreviations
shortcuts (zero needs no unit, a decimal beginning with zero needs no leading zero, shorthand properties)
no final semicolon
minify (use a build script to run YUI compressor)
run ImageOptim against images
Gzip CSS
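
A few of the size shortcuts above can be illustrated with a naive Python sketch. The function name shrink_css and the sample rule are made up for illustration. It only handles simple cases and is not a substitute for a real minifier such as YUI Compressor.

```python
import re

def shrink_css(css: str) -> str:
    """Apply a few size shortcuts to a CSS string (naive, illustration only)."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)          # drop comments
    css = re.sub(r"\s+", " ", css)                           # collapse whitespace
    css = re.sub(r"\s*([{};,])\s*", r"\1", css)              # no spaces around { } ; ,
    css = re.sub(r":\s+", ":", css)                          # "margin: 0" -> "margin:0"
    css = re.sub(r"(?<![\w.#-])0(?:px|em)\b", "0", css)      # zero needs no unit
    css = re.sub(r"(?<![\w.#-])0(\.\d+)", r"\1", css)        # 0.5 -> .5
    css = css.replace(";}", "}")                             # no final semicolon
    return css.strip()

print(shrink_css("a { margin: 0px; opacity: 0.5; }  /* note */"))
```

Values such as 10px are left alone because the zero is part of a larger number.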

Number of requests
Concatenate your files
Use CSS sprites for buttons/hover status

Rendering/selector efficiency
selectors are matched from right to left (make the key selector efficient)
avoid being too specific (avoid overqualified selectors)
the universal selector selects every element and breaks inheritance
hardware acceleration - not suggested

Other tips
avoid CSS files larger than 20KB
avoid CSS files holding more than 4096 rules
border-radius, box-shadow and RGBA are all slow

Friday, December 2, 2011

Comments in different places

XML
<!-- your comment -->
No -- within comments, so no nested comments (for backward compatibility with SGML)
No ending in ---> (three dashes are invalid)

JSON
Comments are not allowed in JSON. If you really need one, add a property such as "_comment" to the JSON object. Because of that, XML is better for configuration, while JSON is better for data exchange in Ajax applications.
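
As a small sketch of the "_comment" workaround, the Python below uses the standard json module. The config keys and the helper name parses are made up for illustration.

```python
import json

# A "_comment" property is ordinary data, so any JSON parser accepts it.
config_text = '{"_comment": "timeout is in seconds", "timeout": 30}'
config = json.loads(config_text)
config.pop("_comment", None)  # drop the pseudo-comment before using the config

def parses(text: str) -> bool:
    """Return True if text is valid JSON according to the standard parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(config)
print(parses('{"timeout": 30} // seconds'))
```

The second call shows that a JavaScript-style comment makes the document invalid for a strict parser.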

Protobuf
To add comments to your .proto files, use C/C++-style comments
// your comment

Thrift
.thrift files support the standard shell comment (#), as well as C/C++ style and Javadoc style comments
# your comment
/** 
your documentation (single or multiple lines)
*/
// your comment

Properties
.properties files can use the number sign (#) or the exclamation mark (!) as the first non-blank character in a line to denote that all text following it is a comment.
# your comment
! your comment

Javascript
// your single line comment
/*
your multi-line comments 
*/

HTML

<!-- your comment -->

CSS
/* your comment*/

Java
// your comment
/* your comment */
/** your documentation */

C++
/* your comment */
// your comment

ASP.NET
<%-- your comment --%>

VB.NET
' (apostrophe) your comment

C#
// your comment

Wednesday, November 30, 2011

Understanding about performance optimization

I have been working on a performance optimization task for a web application for a few months. Here I am trying to summarize my understanding of performance optimization.
  1. Performance optimization needs culture
    • Coding Horror says: Performance is a feature
    • Fred Wilson's 10 Golden Principles say: speed is more than a feature
    • Design with performance in mind
  2. Performance optimization needs tools and guidelines
    • High Performance Web Sites (Steve Souders)
    • Yahoo Best Practices for Speeding Up Your Web Site
    • Lots of tools from WPO companies and open source communities
  3. Performance optimization needs 80/20 rule
    • Find top issues and fix them
    • Put small things on hold
  4. Performance optimization needs data
    • Measurement (problem) -> Analysis (root cause) -> Optimization (solution) -> Measurement (new problem)
    • Google says: Every millisecond counts
  5. Performance optimization needs passion
    • Do extensive research
    • Repeat the optimization loop (it is a heuristic loop, not an endless loop)
    • Evangelize performance optimization
    • Make a faster web and a better life

Tuesday, November 15, 2011

IE8 CSS file 4096 rule limit

Recently I asked front-end developers to merge small CSS files into one, and then use a CSS compressor or optimizer to minify the result. There are many tools to improve CSS quality (csslint, validators) or to minify CSS (YUI compressor, cssmin, and many online tools). In my mind, we should not have more than 3 CSS files per module (one common, one project global, one module specific). This rule not only benefits web performance, but also keeps the CSS cascade modular.

However, one web developer rejected this request, saying that IE cannot take a CSS file with 5000 rules. I did not know this, so I tested it using IE8, FF8 and Chrome15. The result reveals that Internet Explorer 8 does not apply rules beyond the 4096 rule limit in a single CSS file.

This blog post (CSS size limit in Internet Explorer 8) also explains a similar problem and lists more IE-related CSS limits. Please note, however, that 4096 is not a file size but a number of rules. About file size, the blog also mentions:
IE6/IE7: Limit for CSS file size around 285KB (depends on specific stylesheet)
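
To check a merged stylesheet against the figure quoted above, a rough Python sketch could look like the following. It assumes that every innermost { ... } block counts as one rule, which is only an approximation of how a browser counts. The names count_rules and exceeds_limit are made up for illustration.

```python
import re

IE8_RULE_LIMIT = 4096  # the per-file figure quoted in this post

def count_rules(css: str) -> int:
    """Roughly count style rules by counting innermost { ... } blocks."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # ignore comments
    return len(re.findall(r"[^{}]+\{[^{}]*\}", css))

def exceeds_limit(css: str, limit: int = IE8_RULE_LIMIT) -> bool:
    """True when the stylesheet holds more rules than the given limit."""
    return count_rules(css) > limit

sample = "a{color:red}b{color:blue}@media print{c{color:black}}"
print(count_rules(sample))
```

Running this over the concatenated file in a build script is one way to find out whether the merge needs to be split before IE8 silently drops rules.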


CSS best practices
  1. Write succinct CSS - don't forget CSS stands for Cascading Style Sheets, so avoid redundancy in your CSS. Use shorthand properties and let the cascade do the work
  2. Write correct CSS - don't forget to validate your CSS using validators or csslint
  3. CSS compressor/optimizer - use tools for performance
    • CSS Drive
    • CSS Compressor
    • CSS Optimizer
    • Clean CSS
    • Pingates
    • PHP Insider
    • SevenForty
    • Arantius
    • Lottery Post
    • Page Column
  4. Gzip CSS - save more than 70% size
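
The gzip saving is easy to measure for your own stylesheet. The Python sketch below uses the standard gzip module. The sample CSS is made up and highly repetitive, so its ratio is not representative of a real file.

```python
import gzip

def gzip_saving(css: str) -> float:
    """Return the fraction of bytes saved by gzip (0.7 means 70% smaller)."""
    raw = css.encode("utf-8")
    packed = gzip.compress(raw)
    return 1 - len(packed) / len(raw)

# Hypothetical sample: the same rule repeated many times compresses very well.
sample = ".btn{margin:0;padding:4px 8px;color:#333}\n" * 200
saving = gzip_saving(sample)
print(f"saved {saving:.0%}")
```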

strangeloop cyber Monday tips

Recently I got a series of emails with tips from strangeloop about web site optimization before Cyber Monday. Each tip covers a core basic of WPO and also introduces their Site Optimizer. Here I want to summarize the email series and their optimizer.

In my mind, WPO always starts with measurement (testing) to understand the performance problem, then finding tools and solutions to solve the problem, then measuring again. This is an iterative process, and focusing on the top 3 issues in each iteration will make your WPO more efficient. I blogged my understanding of the core practices, or Fundamentals of WPO, 2 months ago. From the strangeloop tips, another keyword I find very suitable is "simplify". What we are trying to achieve is to simplify the web site: fewer HTTP requests and less data.

Let's look at the tips from strangeloop.
http://www.strangeloopnetworks.com/cyber-monday-2-weeks/
  1. Test your site, using tools (webpagetest.org)
  2. Make sure your site follows core best practices
    • Text compression: The easiest way to reduce your page sizes 
    • Keep-alives: Control your TCP connections
  3. Identify and fix your sluggish third-party content
    • Audit your 3rd-party content
    • Use the latest version of your widgets
    • load asynchronously wherever possible
Then let's look at how Site Optimizer works
http://www.strangeloopnetworks.com/products/overview/how-it-works/
  1. Simplify the page
  2. Recognize different browsers
  3. Preload relevant page elements
  4. Optimize for repeat visits and flows
  5. Start loading faster
  6. Optimize third-party content

Monday, October 31, 2011

SSL certificate in Java Keystore

Untrusted Certificate?
If you get the SSL certificate from a trusted public CA such as Verisign, Thawte, DigiCert or GeoTrust, the JRE and browsers will recognize it. However, for a less popular CA or a home-issued certificate (for in-house testing), the JRE will not trust it. For instance, DST Root CA X3 isn't trusted by the Java/Android platform even though most browsers trust it.

How do I fix this? 
Import the certificate into the Java keystore.
First, save the certificate (*.cer).
Second, use keytool to import the root certificate into your cacerts keystore.

Import certificate
The cacerts file is located in your JRE install directory under "<JRE_HOME>/lib/security/cacerts". The command to import will be similar to:
$ keytool -keystore /opt/jre/lib/security/cacerts -storepass changeit -import -trustcacerts -v -alias DSTRootCAX3 -file dstRootCAX3.cer

Trust this certificate? [no]:  yes
Certificate was added to keystore
[Storing /usr/java/jre/lib/security/cacerts]

After the above step is done, restart the services (Java processes).

Verify imported certificate in keystore
C:\Program Files\Java\jdk1.7.0_01\jre>bin\keytool -list -keystore .\lib\security\cacerts -storepass changeit -v > newstore.out
C:\Program Files\Java\jdk1.7.0_01\jre>notepad newstore.out

Alias name: verisignclass1g2ca
Creation date: Mar 25, 2004
Entry type: trustedCertEntry

Owner: OU=VeriSign Trust Network, OU="(c) 1998 VeriSign, Inc. - For authorized use only", OU=Class 1 Public Primary Certification Authority - G2, O="VeriSign, Inc.", C=US
Issuer: OU=VeriSign Trust Network, OU="(c) 1998 VeriSign, Inc. - For authorized use only", OU=Class 1 Public Primary Certification Authority - G2, O="VeriSign, Inc.", C=US
Serial number: 4cc7eaaa983e71d39310f83d3a899192
Valid from: Sun May 17 17:00:00 PDT 1998 until: Tue Aug 01 16:59:59 PDT 2028
Certificate fingerprints:
     MD5:  DB:23:3D:F9:69:FA:4B:B9:95:80:44:73:5E:7D:41:83
     SHA1: 27:3E:E1:24:57:FD:C4:F9:0C:55:E8:2B:56:16:7F:62:F5:32:E5:47
     SHA256: 34:1D:E9:8B:13:92:AB:F7:F4:AB:90:A9:60:CF:25:D4:BD:6E:C6:5B:9A:51:CE:6E:D0:67:D0:0E:C7:CE:9B:7F
     Signature algorithm name: SHA1withRSA
     Version: 1


*******************************************
*******************************************

Use -rfc to get certificate
C:\Program Files\Java\jdk1.7.0_01\jre>bin\keytool -list -keystore .\lib\security\cacerts -storepass changeit -rfc

Alias name: verisignclass1g2ca
Creation date: Mar 25, 2004
Entry type: trustedCertEntry

-----BEGIN CERTIFICATE-----
MIIDAjCCAmsCEEzH6qqYPnHTkxD4PTqJkZIwDQYJKoZIhvcNAQEFBQAwgcExCzAJBgNVBAYTAlVT
MRcwFQYDVQQKEw5WZXJpU2lnbiwgSW5jLjE8MDoGA1UECxMzQ2xhc3MgMSBQdWJsaWMgUHJpbWFy
eSBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eSAtIEcyMTowOAYDVQQLEzEoYykgMTk5OCBWZXJpU2ln
biwgSW5jLiAtIEZvciBhdXRob3JpemVkIHVzZSBvbmx5MR8wHQYDVQQLExZWZXJpU2lnbiBUcnVz
dCBOZXR3b3JrMB4XDTk4MDUxODAwMDAwMFoXDTI4MDgwMTIzNTk1OVowgcExCzAJBgNVBAYTAlVT
MRcwFQYDVQQKEw5WZXJpU2lnbiwgSW5jLjE8MDoGA1UECxMzQ2xhc3MgMSBQdWJsaWMgUHJpbWFy
eSBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eSAtIEcyMTowOAYDVQQLEzEoYykgMTk5OCBWZXJpU2ln
biwgSW5jLiAtIEZvciBhdXRob3JpemVkIHVzZSBvbmx5MR8wHQYDVQQLExZWZXJpU2lnbiBUcnVz
dCBOZXR3b3JrMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCq0Lq+Fi24g9TK0g+8djHKlNgd
k4xWArzZbxpvUjZudVYKVdPfQ4chEWWKfo+9Id5rMj8bhDSVBZ1BNeuS65bdqlk/AVNtmU/t5eIq
WpDBucSmFc/IReumXY6cPvBkJHalzasab7bYe1FhbqZ/h8jit+U03EGI6glAvnOSPWvndQIDAQAB
MA0GCSqGSIb3DQEBBQUAA4GBAKlPww3HZ74sy9mozS11534Vnjty637rXC0Jh9ZrbWB85a7FkCMM
XErQr7Fd88e2CtvgFZMN3QO8x3aKtd1Pw5sTdbgBwObJW2uluIncrKTdcu1OofdPvAbT6shkdHvC
lUGcZXNY8ZCaPGqxmMnEh7zPRW1F4m4iP/68DzFc6PLZ
-----END CERTIFICATE-----


*******************************************
*******************************************
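
The fingerprints printed by keytool -list -v are hashes of the DER-encoded certificate, so they can be recomputed from the -rfc (PEM) output to double-check an import. Below is a minimal Python sketch using only the standard library. The function name pem_fingerprint is made up, and the sample payload is placeholder bytes rather than a real certificate.

```python
import base64
import hashlib

def pem_fingerprint(pem: str, algorithm: str = "sha1") -> str:
    """Hash the DER bytes inside a PEM block and format them keytool-style."""
    lines = (line.strip() for line in pem.splitlines())
    body = "".join(l for l in lines if l and not l.startswith("-----"))
    der = base64.b64decode(body)
    digest = hashlib.new(algorithm, der).digest()
    return ":".join(f"{byte:02X}" for byte in digest)

# Placeholder payload for illustration only, not a real certificate.
fake_pem = (
    "-----BEGIN CERTIFICATE-----\n"
    + base64.b64encode(b"not a real certificate").decode("ascii")
    + "\n-----END CERTIFICATE-----\n"
)
print(pem_fingerprint(fake_pem))
```

Feeding it the real PEM block from the -rfc listing and comparing against the SHA1 line of the -v listing is a quick sanity check that both show the same entry.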

Thursday, October 13, 2011

Wildcard SSL certificate

What is wildcard SSL certificate?
SSL certificates containing the wildcard character "*" in the CN of a server are called wildcard certificates. A "*" wildcard character MAY be used as the left-most name component in the certificate. For example, *.example.com would match a.example.com, foo.example.com, etc. but would not match example.com.
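
The matching rule above can be sketched in Python as follows. The function name wildcard_matches is made up, and the sketch assumes the wildcard covers exactly one left-most label, so a.b.example.com is treated as not matching *.example.com.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Match a hostname against a certificate name with an optional left-most '*'."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False  # the wildcard stands for exactly one label (assumption)
    if p_labels[0] == "*":
        return bool(h_labels[0]) and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels

for host in ("a.example.com", "foo.example.com", "example.com"):
    print(host, wildcard_matches("*.example.com", host))
```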

When to use wildcard SSL certificate?

1. A wildcard SSL certificate is good when one top domain needs multiple subdomains, something like
a.example.com
b.example.com
www.example.com
foo.example.com
Instead of purchasing 4 SSL certificates, you can purchase one *.example.com wildcard SSL certificate.

2. Wildcard is good for many servers using different sub domains.

3. Wildcard certificates don't support EV (Extended Validation), therefore if you need EV, you have to use a regular certificate

What is the price?

Wildcard providers have 2 charge models: one is per server, the other is unlimited servers. (See below for pricing and providers as of Oct 1, 2011. The list is subject to change without notice, so always check the providers' official websites or sales reps for the latest quote and product information.)

Digicert.com $475 per year (3 years term, unlimited server)
http://www.digicert.com/ssl-certificate-comparison.htm

Thawte
the Wildcard certificate is $639 and every additional server you need it on would be $447. (3 years term has 15% discount)
[This info was from sales rep when I contacted them]
http://www.thawte.com/ssl/volume-discount-ssl-certificates/index.html

VeriSign - unknown (It is expensive, might be around $800)
http://www.verisign.com/ssl/buy-ssl-certificates/index.html?tid=a_box

GeoTrust Wildcard $446.00
http://www.geocerts.com/ssl/wildcard
http://www.geotrust.com/ssl/wildcard-ssl-certificates/

Godaddy is the cheapest $179.99
http://www.godaddy.com/ssl/ssl-certificates.aspx


One VIP multiple cert?
There seems to be no good answer to this question. Different load balancers might behave differently, but F5 seems to support this according to the article below
http://devcentral.f5.com/Tutorials/TechTips/tabid/63/articleType/ArticleView/articleId/1086451/Multiple-Certs-One-VIP-TLS-Server-Name-Indication-via-iRules.aspx
And DigiCert seems to support multiple domain names in one wildcard certificate via SubjectAltName
http://www.digicert.com/ssl-support/wildcard-san-names.htm

Wednesday, October 12, 2011

iCalendar 101

This week we found an issue when sending iCalendar invites via a Microsoft Exchange server: recipients could not receive the meeting invite. The root cause was that we didn't set the MAILTO value and the CN parameter on the ATTENDEE property in the VEVENT component of the core iCalendar object. It may be an Exchange-specific requirement, just as Outlook requires the UID and DTSTAMP properties in the *.ics file, because we didn't catch this defect when using a simple POP3 email server.

I took this chance to get a general idea of iCalendar from google.com (see a bunch of reference resources at the bottom). iCalendar is a standard (RFC 5545) for calendar data exchange. It is a file format which allows Internet users to send meeting requests and tasks to other Internet users. Two popular file extensions are *.ics (calendaring and scheduling information) and *.ifb (free or busy time information). iCalendar is designed to be independent of the transport protocol.

An iCalendar object always begins with BEGIN:VCALENDAR and ends with END:VCALENDAR, which together delimit the core object. Within the iCalendar object, we can define calendar properties and calendar components (VEVENT, VTODO, VJOURNAL, VFREEBUSY, VTIMEZONE, VALARM). A property can have multiple parameters, and a component can have multiple properties or sub-components. This forms a kind of tree structure describing the Internet Calendaring and Scheduling Core Object (see RFC5545 for details).

Example to explain ics object model:

BEGIN:VCALENDAR  -- starts iCalendar object
VERSION:2.0 -- calendar property
METHOD:PUBLISH
BEGIN:VTIMEZONE  -- starts VTIMEZONE component
TZID:India Standard Time -- component property
BEGIN:STANDARD -- starts sub component
DTSTART:16010101T000000
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
END:STANDARD -- ends sub component
END:VTIMEZONE -- ends VTIMEZONE component
BEGIN:VEVENT -- starts another component (event)
DTSTART;TZID="India Standard Time":20111019T110000 -- component property, event start time
DTEND;TZID="India Standard Time":20111019T120000 -- component property, event end time
LOCATION;ENCODING=QUOTED-PRINTABLE:Webinar - See conference call information below -- component property, with parameter ENCODING
UID:100000000040827055 -- component property, required by Outlook, unique ID for current event
DTSTAMP:20111012T172729Z -- component property, required
DESCRIPTION: Click this link to join the Webinar
SUMMARY;ENCODING=QUOTED-PRINTABLE:Moving your data to the Cloud - Part 1
BEGIN:VALARM -- starts alarm component within VEVENT
TRIGGER:-PT15M -- component property, alarm trigger time
ACTION:DISPLAY
DESCRIPTION:Reminder
END:VALARM -- ends alarm sub component
END:VEVENT -- ends VEvent component
END:VCALENDAR -- ends iCalendar object

If we format above ics file with indent, it looks like

BEGIN:VCALENDAR  -- starts iCalendar object
    VERSION:2.0 -- calendar property
    METHOD:PUBLISH
    BEGIN:VTIMEZONE  -- starts VTIMEZONE component
        TZID:India Standard Time -- component property
        BEGIN:STANDARD -- starts sub component
            DTSTART:16010101T000000
            TZOFFSETFROM:+0530
            TZOFFSETTO:+0530
        END:STANDARD -- ends sub component
    END:VTIMEZONE -- ends VTIMEZONE component
    BEGIN:VEVENT -- starts another component (event)
        DTSTART;TZID="India Standard Time":20111019T110000 -- component property, event start time
        DTEND;TZID="India Standard Time":20111019T120000 -- component property, event end time
        LOCATION;ENCODING=QUOTED-PRINTABLE:Webinar - See conference call information below -- component property, with parameter ENCODING; note that the property name is followed by a semicolon instead of a colon in this case
        UID:100000000040827055 -- component property, required by Outlook, unique ID for current event
        DTSTAMP:20111012T172729Z -- component property, required
        DESCRIPTION: Click this link to join the Webinar
        SUMMARY;ENCODING=QUOTED-PRINTABLE:Moving your data to the Cloud
        BEGIN:VALARM -- starts alarm component within VEVENT
            TRIGGER:-PT15M -- component property, alarm trigger time
            ACTION:DISPLAY
            DESCRIPTION:Reminder
        END:VALARM -- ends alarm sub component
    END:VEVENT -- ends VEvent component
END:VCALENDAR -- ends iCalendar object
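
The indented view above can be produced mechanically by tracking BEGIN/END nesting. Here is a small Python sketch; the function name indent_ics is made up. The output is for reading only: a real .ics file must stay unindented, because leading whitespace marks a folded continuation line.

```python
def indent_ics(text: str, step: int = 4) -> str:
    """Indent iCalendar content lines by BEGIN/END nesting depth (display only)."""
    depth = 0
    out = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.upper().startswith("END:"):
            depth = max(depth - 1, 0)      # END closes the current component
        out.append(" " * (depth * step) + line)
        if line.upper().startswith("BEGIN:"):
            depth += 1                     # BEGIN opens a nested component
    return "\n".join(out)

sample = "BEGIN:VCALENDAR\nVERSION:2.0\nBEGIN:VEVENT\nUID:1\nEND:VEVENT\nEND:VCALENDAR"
print(indent_ics(sample))
```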

About accept/tentative/decline an invite
In iCalendar, you need to set the RSVP parameter on the ATTENDEE property; otherwise the user will only see "Save and Close". The ORGANIZER property defines the organizer of the calendar component. When an attendee accepts, declines or tentatively accepts, the organizer gets the response.

http://tools.ietf.org/html/rfc5545#section-3.2.17
   Parameter Name:  RSVP
   Purpose:  To specify whether there is an expectation of a favor of a
      reply from the calendar user specified by the property value.

http://tools.ietf.org/html/rfc5545#section-3.8.4.3
   Property Name:  ORGANIZER
   Purpose:  This property defines the organizer for a calendar
      component.
   Value Type:  CAL-ADDRESS
   Property Parameters:  IANA, non-standard, language, common name,
      directory entry reference, and sent-by property parameters can be
      specified on this property.
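
Putting the pieces together, a meeting request with ORGANIZER and an ATTENDEE carrying CN, RSVP and a mailto value could be generated as in the Python sketch below. The function name build_request, the PRODID string and all sample values are made up for illustration, and the sketch skips text escaping and line folding.

```python
def build_request(uid, dtstamp, dtstart, dtend, summary,
                  organizer_email, attendee_name, attendee_email):
    """Build a minimal METHOD:REQUEST iCalendar object as a CRLF-delimited string."""
    lines = [
        "BEGIN:VCALENDAR",
        "PRODID:-//Example Corp//Invite Sketch//EN",  # hypothetical product id
        "VERSION:2.0",
        "METHOD:REQUEST",
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{dtstamp}",
        f"DTSTART:{dtstart}",
        f"DTEND:{dtend}",
        f"SUMMARY:{summary}",
        f"ORGANIZER:mailto:{organizer_email}",
        # CN and RSVP are parameters; the mailto address is the property value
        f'ATTENDEE;CN="{attendee_name}";RSVP=TRUE:mailto:{attendee_email}',
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"

ics = build_request("100000000040827055", "20111012T172729Z",
                    "20111019T053000Z", "20111019T063000Z",
                    "Moving your data to the Cloud",
                    "host@example.com", "Jane Doe", "jane@example.com")
print(ics)
```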

Programming iCalendar

Prepare a multipart/alternative mail:
Part 1: text/html - this is displayed by ordinary mail readers (which do not support iCalendar) or as a fall-back, and contains a summary of the event in human readable form

Part 2: text/calendar; method=xxx, holds the contents of the ics file (the header method parameter must match the method in the ics). Default encoding is UTF-8 in iCalendar
Part 3: Optional, attach the .ics file itself, so ordinary mail readers can offer the user something to click on. Outlook does not really require the attachment because it just reads the text/calendar part.
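
For comparison with the JavaMail snippet in this post, the same multipart/alternative layout can be sketched with Python's standard email package. The function name build_invite_mail and the sample bodies are made up, and the optional .ics attachment (part 3) is left out.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_invite_mail(html_summary: str, ics_text: str, method: str = "REQUEST"):
    """Build a multipart/alternative message with an HTML part and a calendar part."""
    msg = MIMEMultipart("alternative")
    # Part 1: human readable fall-back for readers without iCalendar support
    msg.attach(MIMEText(html_summary, "html", "utf-8"))
    # Part 2: the ics content; the method parameter must match METHOD in the ics
    calendar_part = MIMEText(ics_text, "calendar", "utf-8")
    calendar_part.set_param("method", method)
    msg.attach(calendar_part)
    return msg

mail = build_invite_mail("<p>Webinar on Oct 19</p>",
                         "BEGIN:VCALENDAR\r\nMETHOD:REQUEST\r\nEND:VCALENDAR\r\n")
print(mail.get_content_type())
```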

Code snippet using JavaMail
message.addHeaderLine("method=REQUEST");
message.addHeaderLine("charset=UTF-8");
message.addHeaderLine("component=VEVENT");

messageBodyPart.setHeader("Content-Class", "urn:content-classes:calendarmessage");
messageBodyPart.setHeader("Content-ID", "calendar_message");
// Very important: buffer holds the ics file data, sent as the text/calendar part
messageBodyPart.setDataHandler(new DataHandler(
        new ByteArrayDataSource(buffer.toString(), "text/calendar")));

Use iCal4j
This open source project provides APIs for reading and writing ics files.


The following applications (calendar or email reader) already support iCalendar
  • Google Calendar
  • Apple iCal
  • Lotus Notes
  • Outlook 2000/2007/2010
  • Windows Live Calendar
  • Yahoo Calendar
  • Mozilla Thunderbird
  • SeaMonkey

References:
http://en.wikipedia.org/wiki/ICalendar
http://tools.ietf.org/html/rfc5545 (rfc2445 was obsoleted by rfc5545 in 2009)
http://www.kanzaki.com/docs/ical/
http://build.mnode.org/projects/ical4j/project-info.html
http://stackoverflow.com/questions/461889/sending-outlook-meeting-requests-without-outlook

Friday, October 7, 2011

Open sources for big data analytics

Today I attended a webinar called "Big Data Technologies for Social Media Analytics" from Impetus Technologies. They introduced their iLaDaP platform, built on top of a bunch of open source libraries. There were some case studies for financial/online retailer data analytics, but they were not very detailed. My takeaway from this webinar is that there are many open source projects surrounding Hadoop for big data analysis. Rather than simply adding them to your project, you need to understand their pros and cons.

Hadoop
http://hadoop.apache.org/
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

Hadoop MapReduce
http://hadoop.apache.org/mapreduce/
Hadoop MapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes.
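
To give a feel for the programming model, here is a single-process word count in Python that mimics the map, shuffle and reduce phases. It is only a sketch of the idea and does not use Hadoop APIs. All function names are made up for illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["big data", "Big clusters"]))
```

In a real cluster the map and reduce calls run in parallel on different nodes, and the framework handles the shuffle between them.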

Hadoop HDFS
http://hadoop.apache.org/hdfs/
Hadoop Distributed File System (HDFS™) is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid computations.

Hive
http://hive.apache.org/
Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems.

Apache Pig
http://pig.apache.org/
Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.

Oozie
https://github.com/yahoo/oozie
Oozie - workflow engine for Hadoop

Sqoop
https://github.com/cloudera/sqoop/wiki
Sqoop is a tool designed to import data from relational databases into Hadoop.

Mahout
http://mahout.apache.org/
Scalable machine learning libraries. Mahout has implementations of a wide range of machine learning and data mining algorithms: clustering, classification, collaborative filtering and frequent pattern mining

Hbase
http://hbase.apache.org/
HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data.

Flume
https://github.com/cloudera/flume
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.

Apache Camel
http://camel.apache.org/
Apache Camel is a powerful open source integration framework based on known Enterprise Integration Patterns with powerful Bean Integration.

NLTK: Natural Language Toolkit
http://www.nltk.org/
Open source Python modules, linguistic data and documentation for research and development in natural language processing and text analytics, with distributions for Windows, Mac OSX and Linux.

The Impetus webinar presenter also mentioned two companies in this area.
Intellicus
http://www.intellicus.com/
Intellicus is one of the leading providers of next generation web-based Business Intelligence and Reporting solutions.

Greenplum
http://www.greenplum.com/
Greenplum is the pioneer of Enterprise Data Cloud solutions for large-scale data warehousing and analytics.

Monday, October 3, 2011

ABR Streaming

Adaptive is a new keyword in WPO (Web Performance Optimization) blogs and articles: adaptive images, adaptive video, adaptive streaming and more. The idea is to know the differences between clients (browser, device, media player) in CPU, bandwidth, screen size, resolution, RTT etc., and to serve different content adaptively over HTTP. Compared with RTP (Real-time Transport Protocol), HTTP is a CDN-friendly solution because far more HTTP servers are in operation at the edge.

ABR (Adaptive Bit Rate) video streaming detects the user's bandwidth and CPU in real time and adjusts the quality of the video stream accordingly. In 2006 Move Networks created this idea. They built a product which trans-rated videos into multiple versions of the same asset, encoded at different bit rates. Further, their product divided each video into many small chunks or “streamlets”, each a few seconds long. They built a player which downloaded a video as a series of HTTP GET requests for sequential streamlets. The player continuously measured the available bandwidth so that the next GET request issued would be for the version of the streamlet best matched to the measured available bit rate.

The chunked concept was very successful, though Move Networks' business was not. Apple, Microsoft and Adobe all implemented ABR. Netflix (video streaming) uses ABR too.

Apple HLS (HTTP Live Streaming) works by breaking the whole stream into a sequence of small HTTP-based file downloads. As the stream is played, the client selects from a number of different bit-rate streams based on client CPU and bandwidth. The M3U8 playlist is the first request, and it contains the metadata for the various sub-streams.

Microsoft HSS (HTTP Smooth Streaming) is an IIS Media Services extension that enables adaptive streaming of media to Silverlight and other clients over HTTP. HSS uses the simple concept of delivering small content fragments (typically 2 seconds of video) and verifying that each has arrived within the appropriate time and played back at the expected quality level. Based on the result, it adapts the delivery of the next fragment. The manifest file is the first request, and it describes the fragment metadata to the client.

From the above 2 implementations, we can see that an ABR solution needs a client (player), an HTTP streaming server, a transcoder (to break the whole content into small chunks at different bit rates) and a manifest file for the ABR metadata. Serving or fetching content adaptively in real time provides good performance and user experience. We should be able to use a similar idea in other WPO initiatives.
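
The player-side decision can be sketched in a few lines of Python: before each chunk request, pick the highest encoded bit rate that fits the measured bandwidth. The function names, the bit-rate ladder and the 0.8 safety factor are assumptions for illustration and are not taken from any of the products above.

```python
def pick_bitrate(measured_kbps, ladder_kbps, safety=0.8):
    """Pick the highest rendition that fits within a fraction of measured bandwidth."""
    budget = measured_kbps * safety
    affordable = [rate for rate in sorted(ladder_kbps) if rate <= budget]
    # Fall back to the lowest rendition when even that exceeds the budget
    return affordable[-1] if affordable else min(ladder_kbps)

def plan_session(bandwidth_samples, ladder_kbps):
    """Choose a bit rate for each chunk from successive bandwidth measurements."""
    return [pick_bitrate(sample, ladder_kbps) for sample in bandwidth_samples]

ladder = [400, 800, 1500, 3000]   # hypothetical renditions listed in a manifest
print(plan_session([5000, 1200, 300], ladder))
```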

References:
http://en.wikipedia.org/wiki/Adaptive_bitrate_streaming
http://www.contentdeliverynews.com/?page_id=93

Friday, September 30, 2011

Send email in linux

In the new walk-through environment, there are some configuration issues that require looking at application logs to see what is going on. SSHing to the application server (Tomcat) and opening the application log in vi is the common way. The ops team can also mail the log file to other developers to analyze, which needs a mail tool. Luckily there is "sendmail" in Linux.

Command Line:

  1. Check if sendmail is running => ps ax | grep sendmail
  2. Check if sendmail is running => /etc/init.d/sendmail status
  3. Check sendmail listening info (IP and port) => netstat -tnlp | grep sendmail
  4. Send email (please use man mail for user guide)
    1. mail -s "Email subject" example@example.com < /tmp/test.txt
    2. cat /tmp/test.txt | mail -s "Email subject" example@example.com
    3. echo "The message in email body" | mail -s "Email subject" example@example.com
    4. (df -h;free -m) | mail -s "Disk and Memory Info" example@example.com
    5. less /proc/cpuinfo | mail -s "CPU Info" example@example.com -c ccto@example.com
Shell script:
#!/bin/bash
# One-liner alternative: df -h | mail -s "disk space" example@example.com
df -h > /tmp/mail_report.log
free -m >> /tmp/mail_report.log
mail -s "disk and RAM report" example@example.com < /tmp/mail_report.log
rm -f /tmp/mail_report.log

Friday, September 23, 2011

jQuery 10 performance tips

This is a summary of 10 jQuery performance tips from http://addyosmani.com/jqprovenperformance/
  1. Use the latest jquery release
  2. Know your selectors (ID, element, class, pseudo & attribute)
  3. Use .find() -> $parent.find('.child').show() is the fastest of the scoped selector forms
  4. Don't use jQuery unless it's absolutely necessary -> this.id is faster than $(this).attr('id')
  5. Caching -> storing the result of a selection for later re-use, no repeat selection
  6. Chaining
  7. Event delegation -> if possible, use delegate instead of bind and live
  8. Each DOM insertion is costly -> keep the use of .append(), .insertBefore() and .insertAfter() to a minimum; .data() is better than .text() or .html(); $.data('#elem', key, value) is faster than $('#elem').data(key, value)
  9. Avoid loops -> Javascript for or while loop is faster than jQuery .each()
  10. Avoid constructing new jQuery object unless necessary -> use $.method() rather than $.fn.method(); $.text($text) is faster than $text.text(), $.data() is faster than $().data
Update (11/30/2011):
In recent jQuery performance discussion, I summarized the following points based on different messages.
  1. Scope jquery selectors (use find)
  2. Know selectors performance => ID > element (tag) > class > pseudo & attribute
  3. Avoid loops =>  if loop is inevitable, consider for/while > $.each()  if possible
  4. Cache the result of a selection for later use (using local variables if possible)
  5. DOM operation is expensive (esp. in a loop)
  6. Don't use $ unless it's necessary => this.name > $(this).attr('name')

    Thursday, September 22, 2011

    Fundamentals of WPO

    There are two fundamental factors (action items) for Web Performance Optimization (WPO): number of requests and data size. There are top rules and plenty of tools to assist with these two optimizations. Their importance is common sense and very straightforward to understand. One metaphor is moving house in real life: if you throw away what you don't really need and keep your belongings to the minimum that, say, 1 U-Haul truck can hold, then you can make the move fast because you only need 1 round trip with minimal stuff.


    Reduce HTTP requests - Send them as infrequently as possible
    An HTTP round trip is expensive, especially with a long RTT or for a new HTTP connection. Reducing HTTP requests can be achieved using the following:
    • Merge static resources (CSS, JS, Image)
    • Add Cache control (including ajax?)
    • Combine dynamic requests
    • Build Single page application (using Ajax)
    • Avoid redirects
    • Fewer DNS lookups
    Reduce download size - Send as little data as possible
    Small data size saves bandwidth (fewer TCP packets). It can be achieved in the following ways:
    • Gzip resources
    • Minify JS and CSS (Code optimization, obfuscation, duplication removal)
    • Crush images
    • Add Cache control (caching in browser)
    • HTML5 local storage, local cache
    For more info, refer to 14 Rules for Faster-Loading Web Sites

    Friday, September 9, 2011

    DNS Override

    A DNS server keeps the mapping between website domain names and their IP addresses. Public DNS servers are supposed to stay in sync (eventual consistency), so every browser (user agent) can do a DNS lookup using different DNS servers and get the same IP address for the same domain name.
    Domain sharding is a technique to optimize web performance through parallel downloading, but it brings more DNS lookup effort on the browser side. The lookup sequence is described below; the browser stops as soon as it finds the corresponding IP address, then loads the website.
    1. Browser cache
    2. Computer hosts file for a DNS entry (OS dns cache?)
    3. Default DNS server (this is usually ISP's or your employer's DNS server)
    4. Other DNS servers
    5. If the domain name cannot be resolved, the browser will display a "server not found" error page
    DNS override changes the domain name to IP address mapping in step 2 or step 3, so that the local computer points to another IP address for the domain, while public users still connect to the existing IP address for the same domain name.

    Override DNS entry using local DNS server
    In step 3, the override is usually done on the internal or local DNS server used inside your company, and it impacts all employees. Changing the ISP's DNS server would effectively change all public DNS servers, so an override should not happen on that DNS server.

    Override DNS entry using hosts file
    The hosts file is a kind of key-value mapping between IP addresses and domain names. Here are the file paths on different operating systems.
    1. Windows: C:\Windows\System32\drivers\etc\hosts
    2. Linux:  /etc/hosts
    3. Mac: /etc/hosts 
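To make the override idea concrete, here is a minimal sketch of how a hosts-file entry can shadow a public DNS answer. The IP addresses, the public_dns dictionary and both function names are hypothetical; a real resolver also consults browser and OS caches.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first match (an assumption about resolver behavior).
            mapping.setdefault(name.lower(), ip)
    return mapping

def resolve(name, hosts, public_dns):
    """Check the hosts mapping first, then fall back to the DNS answer."""
    key = name.lower()
    return hosts.get(key) or public_dns.get(key)

hosts_text = """
127.0.0.1   localhost
10.0.0.5    www.example.com   # local override for testing
"""
# Stand-in for what a public DNS server would answer (made-up address).
public_dns = {"www.example.com": "203.0.113.10", "blog.example.com": "203.0.113.11"}

hosts = parse_hosts(hosts_text)
print(resolve("www.example.com", hosts, public_dns))   # overridden locally
print(resolve("blog.example.com", hosts, public_dns))  # falls through to DNS
```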
    DNS Suffix
    When you run ipconfig /all, it prints details about your network connections. Two of the DNS-related entries are Primary Dns Suffix (usually the domain your computer is registered in) and DNS Suffix Search List (the list of domain suffixes tried when you connect using only part of an FQDN).

    For example, if the ipconfig /all output looks like below:

    C:\Documents and Settings\mypc>ipconfig /all

    Windows IP Configuration

            Host Name . . . . . . . . . . . . : mypc23434
            Primary Dns Suffix  . . . . . . . : example.local
            Node Type . . . . . . . . . . . . : Hybrid
            IP Routing Enabled. . . . . . . . : No
            WINS Proxy Enabled. . . . . . . . : No
            DNS Suffix Search List. . . . . . : example.local
                                                corp.example.com
                                                new-example.com

    If you have FQDNs such as test.example.local, exam.corp.example.com and motor.new-example.com, then pinging the short name test, exam or motor works, because the DNS suffixes are appended in turn until the FQDN is found.
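The suffix search can be sketched as below. This is a simplified model that handles only single-label short names; the function names and the known_hosts set are assumptions for illustration, and the real Windows resolver has more rules.

```python
def candidate_fqdns(short_name, search_list):
    """Build the FQDN candidates tried for a short name, in search-list order."""
    return [f"{short_name}.{suffix}" for suffix in search_list]

def resolve_short_name(short_name, search_list, known_hosts):
    """Return the first candidate that exists, or None if none resolves."""
    for fqdn in candidate_fqdns(short_name, search_list):
        if fqdn in known_hosts:
            return fqdn
    return None

search_list = ["example.local", "corp.example.com", "new-example.com"]
# Stand-in for names the DNS server can actually resolve.
known_hosts = {"test.example.local", "exam.corp.example.com", "motor.new-example.com"}

print(resolve_short_name("exam", search_list, known_hosts))
```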

    Friday, September 2, 2011

    Javascript and Flash communication

    In the previous post, "Two errors when javascript calls flash methods", I listed 2 major errors that occur when Javascript calls Flash methods. That was one-way communication, but Flash can also call Javascript functions. Here is a quick summary of the typical two-way communication between these 2 web technologies.

    Flash (ActionScript):
    ExternalInterface.call("JS_FunctionName", [args]); // calls a JavaScript function from inside your Flash or ActionScript file, passing arguments as additional parameters after the string name of the JavaScript function.

    ExternalInterface.addCallback("ExposedName4JS", AsMethod); // exposes the ActionScript method AsMethod to JavaScript under the function name ExposedName4JS, which is passed as a string.


    Javascript:
    1. Get the Flash object
    function getFlashObject(name)
    {
      // Some browsers expose the named movie directly on document
      if (window.document[name])
      {
          return window.document[name];
      }
      // Non-IE browsers (e.g. FF) use the nested <embed> tag
      if (navigator.appName.indexOf("Microsoft Internet") == -1)
      {
        if (document.embeds && document.embeds[name])
        {
          return document.embeds[name];
        }
      }
      // IE uses <object>; getElementById only works for the <object> tag.
      // Also used as the last resort when no <embed> was found.
      return document.getElementById(name);
    }

    2. Call flash method
    getFlashObject("name").ExposedName4JS();

    HTML:
    <object id="name" data="mymovie.swf" width="640" height="360" type="application/x-shockwave-flash" >
        <param name="allowfullscreen" value="true">
        <param name="allowscriptaccess" value="always">
        <param name="wmode" value="opaque">
        <param name="flashvars" value="backgroundColor=#ffffff">
        <embed name="name" src="mymovie.swf" swliveconnect="true" quality="high" bgcolor="#FFFFFF" width="640" height="360" type="application/x-shockwave-flash">
        </embed>
    </object>

    Random notes:
    1. SWFObject is an easy-to-use and standards-friendly method to embed Flash content, which utilizes one small JavaScript file http://code.google.com/p/swfobject/
    2. Flash is not necessarily fully loaded when the body's onload event or the DOMContentLoaded event fires. At least this is true in Chrome.
    3. IE doesn't destroy Flash instances correctly when a page is reloaded; it is suggested to reset the instance like below
    if (navigator.appName.indexOf('Microsoft Internet') != -1)
        window[this._id] = new Object();

    Thursday, September 1, 2011

    Summary about High Performance Mobile Meetup

    This Tuesday (Aug 30, 2011) I attended the SF web performance meetup at LinkedIn (Mountain View). Steve Souders presented a great talk, "High Performance Mobile". Apart from the HttpArchive lightning demo at this year's Velocity conference, this was the second time I joined one of his sessions. I read the slides of his previous talk "High Performance HTML5" at the SF performance meetup twice, but could not attend in person. Of course, I also borrowed and read his two famous books.

    The event started at 6:30pm and welcomed attendees with portable plastic water bottles and Mexican food. Steve began his talk around 7pm after introductions from the meetup organizer (Aaron Kulick) and the LinkedIn performance lead (they are hiring performance engineers). The talk consisted of 4 parts; you can find details at http://www.slideshare.net/souders/high-performance-mobile-sfsv-web-perf

    Part 1: WPO
    Nothing new, but he reiterated the importance and role of WPO. The Web Is Dead (http://www.wired.com/magazine/2010/08/ff_webrip/all/1), will this be true and what is the fate of WPO? It might be too early to conclude "The web is dead". One thing is true, the web is evolving with HTML5.

    Benefits of WPO
    1. drives traffic
    2. improves UX
    3. increases revenue
    4. reduces costs
    Part 2: Why Mobile
    Data and analysis show that Mobile is very important, but slow. Also the "Road is not clear" for performance optimization

    Part 3: Mobile Best practices
    Most desktop Web performance rules still apply. Steve mainly shared 5 items that he thinks Web performance engineers need to pay more attention to.
    1. Reduce Http requests (sprite/dataURI/CSS3/Canvas) - this should be the golden rule
    2. Responsive images (sencha.io src/DeviceAtlas/adaptive-images.com) - I previously read an article tweeted by Stoyan, so I have a general idea about this
    3. script async & defer (async executes when available, defer executes when parsing has finished. He also mentioned his ControlJS) - I have heard this many times. Do all new browsers support them? Should we include javascript using async & defer?
    4. Appcache (5M+ limit) - from HTML5
    5. Local storage (window.localStorage) - from HTML5
    Part 4: Mobile tools
    The most impressive piece of this part was his demo after he briefly introduced the following 4 tools. Steve also mentioned his bookmarklet for mobile performance, but it was not in his deck.
    1. pcapperf
    2. jdrop
    3. blaze.io
    4. weinre (WEb INspector REmote)
    After that, there was a book giveaway and an e-book lottery. Somehow I didn't have a number, so I left before the venue cleared out. (I hope I can get a signed copy of his book)

    Key takeaways
    1. Mobile is important but very slow
    2. There are challenges to make mobile fast
    3. There are tools to assist mobile performance
    4. Mobile winners will be fast
    Thanks to Steve, Aaron and the sponsors for making this meetup so successful. Looking forward to the next meetup in the South Bay area.

    Different Testings

    Testing is more an art than a science. There are different kinds of testing for different purposes to ensure software quality. The goal is the same, but the process, strategy and methodology differ from one kind of testing to another.

    Correctness testing
    Correctness is the minimum requirement of software and the essential purpose of testing. It targets software quality and is also called functional testing.

    Black-box testing
    test data are derived from the specified functional requirements without regard to the final program structure. It is also termed data-driven, input/output driven, or requirements-based testing.

    White-box testing
    the structure and flow of the software under test are visible to the tester.

    Performance testing
    a process that focuses on testing individual components of the web app, such as databases, algorithms, network infrastructure, and cache layers under certain load

    Load testing
    a process of determining how an application behaves under specific volumes of load, usually a range between the upper and lower limits expected by the business. Endurance testing is also part of this testing type.

    Stress testing
    a process of identifying when and how systems fail (and recover) under extreme levels of load. Also known as negative testing or destructive testing.

    Reliability testing

    a process of finding the probability of failure-free operation of a system.

    Security testing
    identifying and removing software flaws that may potentially lead to security violations, and validating the effectiveness of security measures. Simulated security attacks can be performed to find vulnerabilities.

    I summarized the testing types above based on the excellent blogs and paper below. Understanding these terms better is helpful when working with the QA (quality assurance) team in daily work.
    http://agiletesting.blogspot.com/2005/02/performance-vs-load-vs-stress-testing.html
    http://agiletesting.blogspot.com/2005/04/more-on-performance-vs-load-testing.html
    http://blog.browsermob.com/2008/12/performance-vs-load-vs-stress-testing/
    http://www.ece.cmu.edu/~koopman/des_s99/sw_testing/

    Tuesday, August 30, 2011

    Website is not an island

    Today I joined a webinar from Compuware about how to mitigate the performance risk of 3rd party Web components. Apart from their impressive application delivery chain explanation of why Web performance must be tested and monitored from the end user's perspective, "No website is an island" was also an interesting topic regarding the importance of looking at third party web component performance.

    There are quite a few talks about Ad and Google Analytics performance. Current Web sites are getting richer (average page size keeps increasing). Most content is not served from a single host (domain), but from different outside sources, like Advertisement, Analytics, NewsFeed, Blog, Social network, APIs, Video, Shopping cart, Search engine, Rating & Review and Cloud objects. The benefits of using 3rd party components outweigh the risks, but the main risk remains: their performance is not under your control.

    The best approach is to mitigate the risk using the guidelines below:
    • Choose 3rd party component with high SLA
    • Decide on a mitigation strategy (fault-tolerance, fallback, design for failure, alternative)
    • Test it under all conditions
    • Figure out a way to monitor it
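One possible shape for the "fallback / design for failure" guideline is sketched below: call the third-party component with a time budget and return a placeholder when it fails or is too slow. The function name load_with_fallback, the 0.5 second budget and the sample callables are all assumptions made up for this illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_with_fallback(fetch, fallback, timeout_s=0.5):
    """Return fetch() if it succeeds within timeout_s, else the fallback value."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Covers both a timeout and an error raised by the third party.
        return fallback
    finally:
        pool.shutdown(wait=False)  # do not block the page on a slow component

def slow_widget():
    time.sleep(0.3)  # simulates a third party that answers too late
    return "<div>live widget</div>"

print(load_with_fallback(lambda: "<div>live widget</div>", "<div>placeholder</div>"))
print(load_with_fallback(slow_widget, "<div>placeholder</div>", timeout_s=0.05))
```

The same wrapper is a natural place to record timings, which helps with the monitoring guideline.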
    During the webinar, the Compuware CTO also talked about 5 best practices. I totally agree with the tips for mobile Web sites, because they cover the 3 major factors (# of requests, download size and network latency) of mobile Web performance.
    • Minimize third-party content
    • Limit # of hosts, connections and requests
    • Keep size small and use CDN

    Monday, August 29, 2011

    Web browsers and engines

    When programming for the Web, we need to consider cross-browser support to make sure the HTML/CSS/Javascript code works well in different Web browsers. As of today, the most popular web browsers are Internet Explorer, Firefox, Google Chrome, Safari, and Opera.
    http://en.wikipedia.org/wiki/Web_browser

    A Web browser has an engine that takes marked up content (such as HTML, XML, image files, etc.) and formatting information (such as CSS, XSL, etc.) and displays the formatted content on the screen (for data and view). This engine is sometimes called a layout engine or rendering engine.
    http://en.wikipedia.org/wiki/Web_browser_engine

    1. Internet Explorer 9 uses Trident
    2. Firefox uses Gecko
    3. Chrome uses WebKit
    4. Safari uses WebKit
    5. Opera uses Presto
    http://en.wikipedia.org/wiki/Comparison_of_layout_engines_(HTML)

    Apart from layout engine, Web browser also needs a javascript engine to execute javascript (for actions).  Most new javascript engines have JIT compiler for better performance (See SunSpider for benchmark data)

    1. Chakra: A new IE JScript engine used in Internet Explorer 9. It was first previewed at MIX 10 as part of the Internet Explorer Platform Preview.
    2. SpiderMonkey: A JavaScript engine in Mozilla Gecko applications, including Firefox. The engine has two types of JIT compilers, that are sometimes referred to as JägerMonkey or TraceMonkey.
    3. V8: A JavaScript engine used in Google Chrome.
    4. SquirrelFish: The JavaScript engine for Apple Inc.'s WebKit. Also known as Nitro (JavaScriptCore).
    5. Carakan: A JavaScript engine developed by Opera Software ASA, included in the 10.50 release of the Opera web browser.
    http://en.wikipedia.org/wiki/List_of_ECMAScript_engines

    In my mind, new Web browsers are getting faster, and Web development is focusing more on frameworks, patterns and productivity. We are moving to HTML5, Javascript (latest version 1.8.5) and CSS3, and expect to see fewer differences across these browsers. A new Web era is coming.

    Sunday, August 28, 2011

    Two errors when javascript calls flash methods

    There are two frequent errors I ran into while writing javascript and flash communication code.

    1. Error calling method on NPObject!
    This error usually means an exception on the Flash side. It might be caused by Flash's own error handling logic, or by wrong parameters passed to Flash from Javascript.

    2. Uncaught TypeError: Object #<HTMLObjectElement> has no method 'xxx'
    This error usually means the flash object is not available or has no such method. It might be caused by the flash object not being fully loaded, by not finding the correct object, or by calling the wrong flash method (one not exposed through Flash ExternalInterface).

    Friday, August 26, 2011

    Open source 101

    Open source software is software provided on terms allowing users to use, modify, and distribute the source code. It is different from freeware or shareware. In the open source world, "free" usually means "freedom to modify and redistribute source code", rights that do not necessarily come with freeware or shareware.

    An open-source license is a copyright license for computer software that makes the source code available for everyone to use. Copyleft licenses require that you share any modifications that you make to the original code. Usually, these licenses also require that you share these modifications under the exact same open source software license as the source code. Different open source licenses have different levels of copyleft, namely No Copyleft, Weak Copyleft and Strong Copyleft.

    Here is a list of common licenses:

    • GPL v.3 - GNU General Public License, version 3   
    • LGPL v.3 - GNU Lesser General Public License, version 3 (sometimes referred to as the "Library" General Public License)   
    • GPL v.2 - GNU General Public License, version 2   
    • LGPL v.2.1 - GNU Lesser General Public License, version 2.1 (sometimes referred to as the "Library" General Public License)   
    • MPL - Mozilla Public License, version 1.1   
    • Eclipse - Eclipse Public License, version 1.0   
    • CPL - Common Public License, version 1.0   
    • CDDL - Common Development and Distribution License, version 1.0   
    • MIT - MIT License   
    • BSD - Berkeley Software Distribution License or "New BSD"
    Takeaway:
    Feel free to use the MIT/BSD licenses, which have no copyleft; try to avoid GPL, which has strong copyleft. MPL, Eclipse, CPL and CDDL fall into the weak copyleft category.

    Friday, August 5, 2011

    MySQL/InnoDB Replication Options

    MySQL Asynchronous Replication
    Master doesn't wait for slave
    Slave determines how many log events to pull from Master (Bin log) to Slave (Relay log)
    Read stale data on slave

    No flow control
    Corruption

    Master failure
    Data loss

    MySQL Semi-Sync Replication
    Master waits for an ACK from slave
    Slave logs the transaction event in relay log and ACKs
    Read stale data on slave
    No flow control
    Corruption

    Master failure
    Data loss


    Schooner MySQL Sync Replication

    After commit, all Slaves guaranteed to receive and commit the change
    Slave in lock-step with Master
    Read stale data on slave
    No flow control
    Corruption

    Master failure
    Data loss

    DRBD (Distributed Replicated Block Device)
    Block level replication available for Linux
    Suitable for a Master with Direct Attached Storage (DAS)

    SAN (Storage Area Network)
    Shared disk with similar pros and cons as DRBD
    Available for multiple OS
    Snapshot backups are a plus


    Reference
    http://www.schoonerinfotech.com/blog/

    Thursday, August 4, 2011

    Cache related Http Headers

    Http headers instruct the browser and proxies along the request/response chain which cache mechanism to follow. For static resources (Javascript, CSS, images, flash etc), it is suggested to cache on the browser or proxy side to reduce the # of Http requests.

    Response Headers:
    Cache-Control     Tells all caching mechanisms from server to client whether they may cache this object (public/private to control if browser or proxy cache, no-store to control if save to disk, max-age is to control how long to cache)
    Expires     Gives the date/time after which the response is considered stale (suggested 1 year from now, for aggressive static resource cache)
    Date     The date and time that the message was sent (It is useful for Expires by date time)
    ETag     An identifier for a specific version of a resource, often a message digest (Suggest to disable it for performance, or re-configure ETag to remove server specific info. See If-None-Match. ETag takes precedence over Last-Modified if both exist)
    Last-Modified     The last modified date for the requested object, in HTTP-date format, e.g. Tue, 15 Nov 1994 12:45:26 GMT (for conditional get, 304 Not Modified, see If-Modified-Since)
    Pragma     Implementation-specific headers that may have various effects anywhere along the request-response chain. (Http1.0, example is Pragma: no-cache)
    Vary     Tells downstream proxies how to match future request headers to decide whether the cached response can be used rather than requesting a fresh one from the origin server. (The most common case is to set Vary: Accept-Encoding, so that proxy knows if return cached compressed data to browser)

    Request Headers:
    Cache-Control     Used to specify directives that MUST be obeyed by all caching mechanisms along the request/response chain
    If-Modified-Since     Allows a 304 Not Modified to be returned if content is unchanged
    If-None-Match     Allows a 304 Not Modified to be returned if content is unchanged
    Pragma     Implementation-specific headers that may have various effects anywhere along the request-response chain
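The conditional GET flow behind If-None-Match and 304 Not Modified can be sketched as follows. This is a simplified model, not a real server: the function name conditional_response, the plain-dict headers and the sample ETag value are assumptions for illustration, and If-Modified-Since would be handled analogously with a date comparison.

```python
def conditional_response(request_headers, etag, body):
    """Return (status, headers, body) for a resource with the given ETag."""
    if request_headers.get("If-None-Match") == etag:
        # The cached copy is still current: send no body, only the validator.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body

current_etag = '"v42"'  # hypothetical version identifier
script = b"console.log('hello');"

# First visit: no validator yet, so the full body is sent.
print(conditional_response({}, current_etag, script))
# Repeat visit: the browser sends back the ETag it cached.
print(conditional_response({"If-None-Match": '"v42"'}, current_etag, script))
```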

    Recommendations:
    It is important to specify one of Expires or Cache-Control max-age, and one of Last-Modified or ETag, for all cacheable resources. It is redundant to specify both Expires and Cache-Control: max-age, or to specify both Last-Modified and ETag.

    You use the Cache-control: public header to indicate that a resource can be cached by public web proxies in addition to the browser that issued the request.
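Following the recommendation above (one freshness header plus one validator), here is a minimal sketch that builds response headers for a static resource. The helper name static_cache_headers and the one-year default are assumptions taken from the "1 year from now" suggestion; how the headers are attached depends on your server.

```python
from email.utils import formatdate

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60

def static_cache_headers(last_modified_ts, max_age=ONE_YEAR_SECONDS):
    """Build cache headers: Cache-Control for freshness, Last-Modified as validator."""
    return {
        # public lets shared proxies cache it as well as the browser
        "Cache-Control": f"public, max-age={max_age}",
        # formatdate(..., usegmt=True) yields an HTTP-style GMT date string
        "Last-Modified": formatdate(last_modified_ts, usegmt=True),
    }

# Timestamp 0 is used here only so the demo is deterministic.
for name, value in static_cache_headers(0).items():
    print(f"{name}: {value}")
```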

    Avoiding caching
    HTTP version 1.1 -> Cache-Control: no-cache
    HTTP version 1.0 -> Setting the Expires  header field value to a time earlier than the response time

    Reference
    http://tools.ietf.org/html/rfc2616
    http://en.wikipedia.org/wiki/List_of_HTTP_header_fields
    http://code.google.com/speed/page-speed/docs/caching.html#LeverageBrowserCaching
    http://code.google.com/p/doctype/wiki/ArticleHttpCaching

    Tuesday, August 2, 2011

    Mobile Web Optimization Webinar Takeaway

    Last week I attended a webinar from Compuware about mobile web optimization using Page Speed. The talk was very clear and well organized. It first went through the importance of mobile web performance, then discussed the key differences between mobile and desktop, then detailed the Page Speed rules for mobile web.

    Browser is the entry point to mobile web
    The browser is becoming the integration platform
    The browser is becoming more complex
        - # of hosts per user transaction
        - Many RIA frameworks
        - Performance differences across devices

    Free performance tools
    Page Speed
    WebPagetest
    dynaTrace Ajax Edition

    Mobile web page load process
    Mobile channel establishment
    DNS lookup
    TCP connect
    Http request
    Parse & Layout (subrequests)

    Key differences between mobile and desktop
    Networks:
        round-trip time (High channel establishment time, lower RTT)
        bandwidth (3G vs Cable)
    Devices:
        CPU (JS execution times, layout times, 10x JS runtime cost, 1 ms per kb parsing)
        memory (more code/objects - more GC, more DOM, more memory)
    Interaction model
        (touch vs click, mobile click event with 300-500ms delay)

    Page Speed Rules
    Use an application cache (how about localstorage?)
    Defer JS parsing (how about deferring JS download?)
    make landing page redirects cacheable (Cache-control: private, max-age > 0) (How long? 301 and 302 are both cacheable?)
    prefer touch events (why not disable click event on mobile?)

    Reference:
    http://slidesha.re/qgC8n3
    http://www.slideshare.net/Gomez_Inc/optimizing-web-and-mobile-site-performance-using-page-speed

    90% line in JMeter aggregate report

    I used to think the 90% line was the response time that 90% of users would see, but recently realized this was wrong. The JMeter online Help has a detailed explanation of every item in the Aggregate report.

    The 90% line tells you that 90% of the samples fell at or below that number. It is more meaningful than the average in terms of SLA. We expect it to be within 2x of the average time. That is, if the average time is 500ms, we expect the 90% line to be less than 1000ms. Otherwise the system fluctuates a lot.
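The rule of thumb above can be checked with a few lines of Python. This sketch uses the nearest-rank method to pick the 90% line, which is an assumption; JMeter's exact calculation may differ slightly. The function names and sample timings are made up for the example.

```python
import math

def percentile_line(samples_ms, pct=90):
    """Smallest sample such that pct% of all samples are at or below it."""
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # nearest-rank, 1-based
    return ordered[rank - 1]

def within_2x_of_average(samples_ms):
    """True if the 90% line is no more than twice the average."""
    average = sum(samples_ms) / len(samples_ms)
    return percentile_line(samples_ms) <= 2 * average

steady = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
spiky = [100] * 8 + [5000, 5000]

print(percentile_line(steady), within_2x_of_average(steady))
print(percentile_line(spiky), within_2x_of_average(spiky))
```

In the spiky set the average is pulled up by two slow samples, yet the 90% line is still far above twice the average, which is the kind of fluctuation the check is meant to flag.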