Problems Compressing JavaScript on the Fly

Load speed is an important issue when developing a large-scale system. It matters not only from a bandwidth usage perspective: it also affects how quickly pages load for your users and how much load your servers carry. For our latest project, Agent Storm, we developed the system knowing that we were going to combine and minify the CSS and JavaScript files on the fly and cache them to disk for subsequent fetches by users. Everything was working great during testing, but as soon as we switched off debug mode we noticed that certain parts of the site that relied on JavaScript were dead.
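As a rough illustration of the combine, minify and cache-to-disk idea (a minimal sketch, not the Agent Storm code), the bundle can be keyed on the source files' paths, modification times and sizes so that it is only rebuilt when something changes. The names `bundle_key` and `cached_bundle` and the `minify` parameter are hypothetical; `minify` stands in for whichever minifier is actually used.

```python
import hashlib
import os


def bundle_key(source_paths):
    """Build a cache key from each file's path, modification time and size."""
    digest = hashlib.sha1()
    for path in source_paths:
        stat = os.stat(path)
        digest.update(f"{path}:{stat.st_mtime_ns}:{stat.st_size};".encode("utf-8"))
    return digest.hexdigest()


def cached_bundle(source_paths, cache_dir, minify=lambda text: text, suffix=".js"):
    """Return the path of the combined, minified bundle, building it if needed."""
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, bundle_key(source_paths) + suffix)
    if not os.path.exists(target):
        # Combine the files in the order given; order matters for JS and CSS.
        parts = []
        for path in source_paths:
            with open(path, encoding="utf-8") as handle:
                parts.append(handle.read())
        with open(target, "w", encoding="utf-8") as handle:
            handle.write(minify("\n".join(parts)))
    return target
```

In debug mode the `minify` step can simply be left as the identity function, which keeps the served files readable.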

Firing up Firebug and reloading the pages produced a bunch of errors, but the JavaScript was compressed and minified, so it was hard to read and make sense of what the problem was. Switching back to debug mode made the site work flawlessly without throwing any errors. So where do we go from here?

Minifying JavaScript requires post-processing the JavaScript files and making changes to them based on what we know about how JavaScript is written. If there is an error in the source, such as a missing ; or {, the minifier can generate broken JavaScript. Given that we know the JavaScript works in its original form, this is probably what’s happening: the browser forgives a missing ; at the end of a line, but once the line breaks are stripped the statements run together. But without the tedious job of looking through all those lines of code, how can we find these errors?
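To see why a missing ; only bites after minification, here is a deliberately naive sketch in Python. The `naive_minify` function is hypothetical and far cruder than a real minifier; it just strips indentation and line breaks, which is enough to show two statements fusing together.

```python
def naive_minify(source):
    """Strip indentation and line breaks, as a very crude minifier might."""
    return "".join(line.strip() for line in source.splitlines())


with_semicolons = "var a = 1;\nvar b = 2;\n"
without_semicolons = "var a = 1\nvar b = 2\n"

print(naive_minify(with_semicolons))     # var a = 1;var b = 2;
print(naive_minify(without_semicolons))  # var a = 1var b = 2
```

The second result is no longer valid JavaScript, even though the original two-line version runs fine in a browser.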

I remembered an extremely useful site I stumbled upon a while back called JSLint by Douglas Crockford, which is a JavaScript code quality tool. It essentially checks your JavaScript files for errors and reports any potential problems or no-nos. It’s an absolutely fantastic time-saving tool, and in this case it worked!

Pasting in the JavaScript files one by one and fixing the errors reported, we were able to find a few missing ; hidden away in the original code. Once all the reported errors were fixed, we re-enabled minifying and compression on the backend. The entire system now functioned as expected and no errors were reported in Firebug.

Not only is checking the code quality of your JavaScript files a geeky thing to do, it also helps your customers, as your code is more likely to be cross-browser compliant. A little time spent now validating your JavaScript could save a lot of time later on if a customer reports a problem.

Using Google AppEngine Task Queues for Blog Pinger

We don’t just release tools for the sake of it; our tools are built to try out something we are looking into or wanting to test. So, when we were deciding on a use case for AppEngine Task Queues, we wanted something which would not only benefit from using them, but also use them multiple times. We decided on writing Blog Pinger, which more or less satisfied these requirements.

Blog Pinger is a simple use of queues. It receives a URL, either from the web site or via XML-RPC, and adds that URL to a queue which handles grabbing the information required for sending out pings. The URL can point to either an HTML page or an RSS feed, and the supplied content type determines how we handle it. If it’s an HTML page, we look in the page’s meta tags for a link to the RSS feed and resubmit the new URL for crawling. If it’s an RSS feed, we parse it for the blog title, blog URL and RSS feed URL, and then submit those to another queue for sending out the pings.
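The branching described above could be sketched roughly as follows, using only the Python standard library. This is an illustration under assumptions rather than the Blog Pinger source: the function names are hypothetical, the feed is assumed to be RSS 2.0, and the feed link is assumed to be a `<link rel="alternate" type="application/rss+xml">` tag in the page head.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin


class FeedLinkFinder(HTMLParser):
    """Remember the first RSS autodiscovery link found in an HTML page."""

    def __init__(self):
        super().__init__()
        self.feed_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.feed_href is not None:
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        kind = (attributes.get("type") or "").lower()
        if "alternate" in rel and kind == "application/rss+xml":
            self.feed_href = attributes.get("href")


def find_feed_url(page_url, html):
    """Return the absolute feed URL advertised by the page, or None."""
    finder = FeedLinkFinder()
    finder.feed(html)
    if finder.feed_href is None:
        return None
    return urljoin(page_url, finder.feed_href)


def parse_feed(feed_url, xml_text):
    """Pull the blog title, blog URL and feed URL out of an RSS 2.0 feed."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 feed")
    return {
        "title": channel.findtext("title", default=""),
        "blog_url": channel.findtext("link", default=""),
        "feed_url": feed_url,
    }


def next_step(url, content_type, body):
    """Return ("crawl", feed_url), ("ping", details) or ("skip", None)."""
    if "html" in content_type:
        feed_url = find_feed_url(url, body)
        return ("crawl", feed_url) if feed_url else ("skip", None)
    return ("ping", parse_feed(url, body))
```

A "crawl" result would go back onto the first queue with the new URL, and a "ping" result would go onto the second queue that sends the pings.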

AppEngine Task Queues work on the basis that if an error or exception occurs and is unhandled, so that the worker’s response code is not a success code such as ‘200 OK’, the task is put back on the queue to be handled again later. When sending out a large amount of repeated information you have to be very careful that the downstream web site does not think you’re spamming. Badly configured upstream clients and spammers can also constantly hit your site, which may cause an error, and you could accidentally end up sending duplicate pings to the sites you ping over and over again. (We didn’t actually do this, honest Guv!)

So in short, if you do anything where a public request triggers a task queue which in turn causes a hit on another web site, make sure you’re tracking the referring IP and rate limiting requests from that IP. From an AppEngine Task Queue perspective, wrap all your statements in a try/except block, use the logging module to log the exception, and return a status of 200 OK from your workers no matter what.
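Both pieces of advice can be sketched in a framework-neutral way. The names `run_task` and `IpRateLimiter` are hypothetical, and the in-memory counters are an assumption made to keep the example short; on AppEngine the counters would need to live somewhere shared between instances.

```python
import logging
import time
from collections import defaultdict, deque


def run_task(work, payload):
    """Run one queued task and always report success to the queue."""
    try:
        work(payload)
    except Exception:
        # Log the full traceback, then swallow the error so the queue
        # does not resubmit the task and send the same pings again.
        logging.exception("Task failed for payload %r; not retrying", payload)
    return 200


class IpRateLimiter:
    """Allow at most max_requests per IP within a sliding time window."""

    def __init__(self, max_requests, per_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.per_seconds = per_seconds
        self.clock = clock
        self.hits = defaultdict(deque)

    def allow(self, ip):
        now = self.clock()
        recent = self.hits[ip]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.per_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True
```

The public handler would call `allow(ip)` before queuing anything, and each worker would return whatever `run_task` returns as its HTTP status. Always returning 200 means a failed task is never retried, so the log is the only record of the failure.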

What’s new at pyHub this week (22-Feb-10)

Well, it’s been a busy week, working on both client projects and some of our own, but the end of the week is in sight and it’s going to be as busy as the rest of the week. So what’s new at pyHub this week?

Blog Pinger

Blog Pinger simply allows you to easily send out pings to inform various web sites that your web site or blog has been updated. We created this tool because there are so many blog pingers out there which are slow, manual or poorly maintained, and it’s really not rocket science (it took less than 5 hours from idea conception to making it live and is less than 100 lines of actual code).

You can use Blog Pinger directly from the website by entering either your main blog address or the link to the RSS feed. It all happens very quickly, and all the magic happens in the background using Google AppEngine task queues to queue and send the pings out to the individual sites.

As of today it sends out pings to over 30 websites and we will add more as time goes on.

Tweet Keys

We made some changes to Tweet Keys to enable it to process the Twitter sample feed and build keywords based on it. The sheer amount of data it’s receiving means it’s taking a little time for the data to populate and show anything of any relevance. We are confident that this will fix itself: it’s looking at keyword trends over time, so as time goes on it will get more relevant, or at least that’s the theory. As of writing this post, Tweet Keys is tracking around 900,000 keywords.

We are going to migrate Tweet Keys on to a larger machine this weekend, where we will be able to increase the memory and CPU available to MySQL. We think this is why the Python Twitter stream reader is bottling out, as there are more tweet processing threads than available MySQL threads. Either that, or we may separate the MySQL writes into their own self-contained thread with its own queue. A little (live) testing will tell us which is best. Ahhhh, nothing like live testing :)
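The second option, a single writer thread fed by its own queue, might look something like this sketch. It is not the Tweet Keys code: `write_row` is a hypothetical stand-in for the MySQL insert, and splitting tweets on whitespace is only a placeholder for the real keyword extraction.

```python
import queue
import threading

STOP = object()  # sentinel telling the writer thread to finish


def writer_loop(pending, write_row):
    """Drain the queue on one thread so only one database connection is needed."""
    while True:
        row = pending.get()
        if row is STOP:
            break
        write_row(row)


def extract_keywords(tweets, pending):
    """Processing thread: split tweets into keywords and queue them for writing."""
    for text in tweets:
        for word in text.lower().split():
            pending.put(word)


def run_pipeline(batches, write_row):
    """Start one writer and one processing thread per batch, then wait for both."""
    pending = queue.Queue(maxsize=1000)  # bounded, so producers slow down if the writer lags
    writer = threading.Thread(target=writer_loop, args=(pending, write_row))
    writer.start()
    workers = [
        threading.Thread(target=extract_keywords, args=(batch, pending))
        for batch in batches
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    pending.put(STOP)
    writer.join()
```

With this shape the number of tweet processing threads no longer has to match the number of database connections, because only the writer ever touches the database.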

CrawlerSpy

We found a bug in Crawler Spy which was preventing the activation emails from being sent out to people who signed up for free accounts. Whoops! We fixed it and it’s all rolling nicely again. Crawler Spy currently has (at time of writing, and strangely) 3,333 spiders in its database. Look out Fantomas, we are catching up ;) DON’T FORGET: The CrawlerSpy IP Database is completely free while we build our database up!

Check Backlinks

Thanks to a really nice guy called Michael Green, who let us know about a bug in CheckBacklinks.net that meant it occasionally showed a nasty error message. Given that Michael took the time to tell us about it, we took the time to fix it :) We added some stricter error checking and Michael reports it seems to be working again. Thanks Michael!

pyChargify

Sometime at the end of January a user of pyChargify (@mrtron) sent us a pull request on GitHub for some changes they had made to the pyChargify library which fixed a few bugs. This week we finally got around to merging those changes with the HEAD branch. Thanks @mrtron for your help!

//

On that note it’s back to setting up our Apple APNS infrastructure.

Why “Release early, Release often” is sometimes a bad idea

There’s nothing like taking a product, releasing it to the world and trying to build a client base on a product in beta. We recently found out that this is not always a good idea and could actually put your great business idea at risk before it even gets off the ground.

Imagine..

You have had a great idea and nothing like it exists. You do the basic development, put up the web site, find your target market and do the SEO. The problem is you have done it so well that you’re now generating 30 leads a day, and you have developers and clients all telling you they need your service and need it now.

You’re a web 2.0 start-up, your product is still a little wet behind the ears, and all your time is spent dealing with customers instead of making your product bulletproof and free of bugs, and expanding it so that more people will be interested.

You will probably start looking for someone to help you out, but the problem is the business isn’t making enough money. So you either have to find a hands-on partner, whom you will have to bring up to speed on your product and service, or find some investment, either from venture capital or a loan.

In order to find an investor you will need a business plan, and this requires taking time out from everything else to sit down, concentrate on the plan and figure out exactly where you want your business to go. If you choose the path of taking on a partner, you have to take into account the who, the what and the why, and also that your productivity will be cut by half while they get up to speed on everything that’s going on. And on top of that, what happens if the person decides it’s too much stress and pulls out?

So what can you do to try and avoid the above problems?

  1. Get in touch with Venture Capitalists from day 1, give them a brief overview of your business and keep them up to date with milestones and your accomplishments.
  2. Don’t release too early. Before doing a full release make sure you have all the basics covered: your basic service, the billing system etc. If you’re willing to take a little risk by taking on increased support, then consider a private beta. Private betas can work two ways: by creating a buzz about your product/service, and by allowing a controlled number of initial users of your product/service.
  3. Have a business plan, even if it’s only 2-3 pages. Most importantly, have a financial forecast for the next 12 months; this is the most important piece of information any investor will ask for.
© 2010 pyHub Limited. All Rights Reserved. Registered in England and Wales No. 7158104
Registered Office: 788-790 Finchley Road, London. NW11 7TJ.