Thursday, 16 September 2010

WhatBlock 2.0 not working in Internet Explorer?

Can anyone else confirm the following for me?

In Internet Explorer, even with no external internet content filtering in place whatsoever, WhatBlock 2.0 consistently fails to load the favicon.ico files from a handful of websites in-page.

The following favicon.ico files don't load:
  • http://rewiredstate.org/favicon.ico
  • http://icanhascheezburger.com/favicon.ico
  • http://sourceforge.net/favicon.ico
  • http://flickr.com/favicon.ico
  • http://ted.com/favicon.ico
  • http://miniclip.com/favicon.ico
  • http://wordpress.com/favicon.ico
It is ALWAYS these seven favicon.ico files which fail to load in Internet Explorer. However, when I open the favicon.ico files individually in Internet Explorer (by just typing http://.../favicon.ico into the address bar), they load - so it's not as if the files don't exist or IE9 can't cope with the icons themselves. WhatBlock 2.0 works perfectly well in Google Chrome, Safari and Firefox. Can anyone tell me what's going on here?
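In essence, the in-page check boils down to loading each icon with an Image object and seeing which handler fires. Here's a simplified sketch of that technique (not the actual WhatBlock 2.0 source - the reporting callback is purely illustrative):

    // Simplified sketch of the in-page favicon test. Load a favicon.ico
    // via an Image object and report whether onload or onerror fired.
    function testFavicon(url, report) {
        var img = new Image();
        img.onload = function () { report(url, true); };   // icon loaded
        img.onerror = function () { report(url, false); }; // icon failed to load
        img.src = url;
    }

    // One of the seven problem icons - in IE this unexpectedly reports false.
    testFavicon('http://sourceforge.net/favicon.ico', function (url, ok) {
        alert(url + (ok ? ' loaded' : ' failed to load'));
    });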

To test, go to http://dev.dfey.org/whatblock2 - accept the terms, put in an institution name and then click to test your internet content filter. If the same happens to you (or doesn't happen to you) I would really appreciate knowing - don't forget to mention which version of Internet Explorer you're using!

I'm really hoping we can get this sorted as Internet Explorer is the main browser we need WhatBlock 2.0 to be working in!

Thanks

Tuesday, 10 August 2010

WhatBlock 2.0

So here I'm going to set out my proposals for WhatBlock 2.0 - please comment on how feasible/realistic my ideas are!
http://dev.dfey.org/whatblock -- for the current app
http://dev.dfey.org/whatblock/view.php -- for the current data collected by the app

What is WhatBlock?
WhatBLOCK is an HTTP-accessible application designed to test the internal networks of schools and offices across the country to assess the wide-scale use and abuse of internet content filtering.

Many students and office workers across the country are at the mercy of their I.T. departments and content filtering software when it comes to internet access. It can be a difficult and time-consuming process to get legitimate, work-related websites unblocked on an internal network, and WhatBLOCK is an attempt to provide a level of scrutiny for this process.

It is hoped that WhatBLOCK can ultimately make the lists of banned websites accessible and the people who create them accountable. WhatBLOCK therefore needs to find out which websites are blocked on different networks - this is achieved by presenting the user within the local network with a page containing an iframe of a website which could potentially be blocked. The user is then prompted to declare the name of the School/Institution which they are in and whether or not they can see the page within the iframe; this data is then logged.
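Concretely, the current test amounts to something like the following sketch (simplified - the real page's markup, timings and logging endpoint differ, and /log.php is a hypothetical stand-in for the logger):

    // Sketch of the current WhatBlock test flow.
    function runIframeTest(siteUrl, institution) {
        var frame = document.createElement('iframe');
        frame.src = siteUrl;
        frame.width = 600;
        frame.height = 400;
        document.body.appendChild(frame);

        // Give the page time to load, then ask the user - the script itself
        // cannot inspect the frame's contents (see the next section).
        setTimeout(function () {
            var visible = confirm('Can you see ' + siteUrl + ' in the frame above?');

            // Log the answer via an image request; /log.php is hypothetical.
            var beacon = new Image();
            beacon.src = '/log.php?site=' + encodeURIComponent(siteUrl) +
                         '&institution=' + encodeURIComponent(institution) +
                         '&visible=' + visible;
        }, 10000);
    }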

The Problems With This
One of the main problems with WhatBlock is its reliance on user interaction to collect the data and the corresponding impact on reliability. We are forced to effectively crowd-source the data because of the browser's cross-domain restrictions: the same-origin policy, which exists to prevent cross-site scripting attacks (XSS - see http://en.wikipedia.org/wiki/Cross-site_scripting), limits what data we can pick up ourselves.

The Problem with XSS
Strictly it's the same-origin policy rather than XSS itself that stops us: we can't make an XMLHttpRequest to check whether we can reach the external server, and if we use an iframe to load the webpage, we can't read the URL at which the iframe actually finishes loading. As a result, we are forced to present the user with an iframe containing a webpage and then ask them whether or not they can see the site. Don't forget that in a lot of these places people can't run executable binary files, so in most cases the test has to run entirely in the browser.
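Both restrictions are easy to demonstrate (a sketch assuming a 2010-era browser without CORS support; 'testFrame' is an illustrative element id):

    // Attempt 1: a cross-domain XMLHttpRequest. Without the remote server's
    // cooperation the browser refuses or reports nothing useful, so a
    // blocked site and an unblocked one look identical to us.
    var xhr = new XMLHttpRequest();
    try {
        xhr.open('GET', 'http://twitter.com/', true);
        xhr.send(null);
    } catch (e) {
        alert('Cross-domain request refused: ' + e.message);
    }

    // Attempt 2: read where a cross-domain iframe ended up. This throws a
    // security exception, so we can't tell whether the filter redirected
    // the frame to a block page.
    var frame = document.getElementById('testFrame'); // illustrative id
    try {
        alert(frame.contentWindow.location.href);
    } catch (e) {
        alert('Refused by the same-origin policy: ' + e.message);
    }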

The Problem with iframes
This approach means that mistakes are made. Sometimes the webpage may take a while to load, and the user may mistakenly take this as a sign that the webpage cannot load; slow loading also increases the time taken to run each test, so we can run fewer tests and user satisfaction drops (it's their time they're giving up!). The user could also lie, giving false data, or simply make a mistake - both of which add up to less reliable data and a headache when trying to process any of it. A lot of websites such as Twitter, MySpace and almost all webmail services also have JavaScript code to break out of the iframe, meaning that with the current testing methods we can't test these websites - the user's browser just ends up pointing at Twitter or Gmail. There's no quick or reliable fix for this problem, so we currently just ignore these websites when testing - but these are often some of the more interesting websites to check! So what's the solution to all of this?
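For reference, the break-out scripts are usually just a couple of lines of frame-busting code along these lines (a generic example, not any particular site's source):

    // Classic frame-busting: if the page finds itself inside a frame, it
    // replaces the top-level location - which is why the user's browser
    // "just ends up pointing at Twitter".
    if (top !== self) {
        top.location = self.location;
    }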

Possible Solutions
Idea: We could load images from the websites and ask the user if they can see them.
Problem: Many websites use a dedicated image domain; for example, Twitter uses http://*.twimg.com to serve up ALL images for the twitter.com site. A content filter is more likely to block twitter.com than *.twimg.com, so even if twitter.com is blocked we might still be able to fetch images from *.twimg.com, giving us a false positive (the test would report twitter.com as reachable when it isn't). This also doesn't remove the user interaction.
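To make the false positive concrete (the asset URL below is illustrative, not a verified one):

    // This image may load from the dedicated image domain even while the
    // main site is filtered, so concluding "twitter.com is unblocked"
    // from onload firing would be a false positive.
    var img = new Image();
    img.onload = function () { alert('twimg.com reachable - says nothing about twitter.com'); };
    img.onerror = function () { alert('twimg.com is blocked too'); };
    img.src = 'http://a0.twimg.com/favicon.ico'; // illustrative asset URL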

Idea: We could load a JavaScript file from the website and check it for any variables we know to be there.
Problem: This is time consuming for the developer, as it has to be done individually for each website on the list! Some websites may not use JavaScript, may not keep it in accessible .js files, or may serve it from a separate domain, as with the images. BUT this does remove the user interaction issue.
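The mechanics would look something like this sketch - the file path and variable name are placeholders, and finding real ones for every site is exactly the maintenance burden described above:

    // Include a script the site is known to serve, then test for a global
    // it defines. Both the path and KNOWN_VARIABLE are placeholders.
    // (Older IE fires onreadystatechange rather than onload - omitted here.)
    var s = document.createElement('script');
    s.src = 'http://www.example.com/common.js'; // hypothetical per-site file
    s.onload = function () {
        alert(typeof window.KNOWN_VARIABLE !== 'undefined'
              ? 'reachable' : 'script loaded but variable missing');
    };
    document.getElementsByTagName('head')[0].appendChild(s);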

A Solution Which Could Work
Now this is where I think the project suddenly gets a lot less feasible... The only way to decrease load time, remove user interaction and reliably get data on this is to put a small JavaScript file on the remote server.

If the remote server admin were kind enough to create a http://www.remoteserver.com/whatblock.js file and put inside it something to the effect of:

var whatblock = true;

then it would be possible to include the remote JavaScript file in WhatBlock's runtime page and verify whether or not the user can access www.remoteserver.com. As the contents of the file are so small, it should have very little effect on the server's bandwidth usage, and it is easy to set up. The admin could then inform WhatBlock where the file is; WhatBlock adds it to its database and includes it in future content filtering tests.
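A sketch of how the runtime page might do the verification - the whatblock.js name comes from the proposal above, while the timeout length and reporting callback are illustrative:

    // Include http://<domain>/whatblock.js and poll for the flag it sets.
    // If the content filter blocks the domain the script never runs, so we
    // give up after a timeout. (One domain at a time - concurrent tests
    // would clash on the single global flag.)
    function testWhatblockJs(domain, report) {
        window.whatblock = undefined; // reset before each test

        var s = document.createElement('script');
        s.src = 'http://' + domain + '/whatblock.js';
        document.getElementsByTagName('head')[0].appendChild(s);

        var waited = 0;
        var timer = setInterval(function () {
            if (window.whatblock === true) {
                clearInterval(timer);
                report(domain, true);   // the file ran: domain is reachable
            } else if ((waited += 200) >= 10000) {
                clearInterval(timer);
                report(domain, false);  // no flag after 10s: assume blocked
            }
        }, 200);
    }

    // Example usage with the hypothetical domain from above:
    testWhatblockJs('www.remoteserver.com', function (domain, reachable) {
        alert(domain + (reachable ? ' is reachable' : ' appears to be blocked'));
    });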

Why would an admin choose to do this?
An admin could choose to do this primarily for ideological reasons - the admin may support the act of monitoring secretly held lists of blocked websites. Many of us have felt the pain of going to a work-related website only to find it blocked, and wished there was some real, transparent scrutiny of what gets blocked and what doesn't. The admin may also find it beneficial to see which institutions can and cannot access their website - how can a website reach out to its target audience if it is unknowingly and incorrectly being made unavailable to the people it is designed to help? It's perfectly acceptable that the people writing these lists make mistakes about what's on them, but if no one else is allowed to check the list then how is a mistake ever going to be corrected? WhatBlock is planned so that the data collected will be accessible by anyone - not just a select few - to provide a truly open dataset of banned websites: if your website is on WhatBlock, you will be able to see whether it's blocked or not.

What if WhatBlock gets Blocked?
This may well happen, so it would be advantageous to have multiple instances of WhatBlock running on different domains - releasing the project under an Open Source license should provide for this.

Where do we go from here?
Well, first things first, we need to build a fully functioning site! Once that vital step is over, public awareness needs to be raised to both bring users to the site and get website administrators to put our little bit of JavaScript on their servers. Then it's all down to data collection!

Feel free to leave any comments, critiques or suggestions!
Rob

EDIT: A fresh idea was floated by @harryrickards - he suggested that we run a Flash instance which we could use as a proxy between the JavaScript in our runtime page and the remote server. Unfortunately that will not work, as Flash also has cross-domain restrictions: you need to put a crossdomain.xml policy file on the remote server to allow it. (See http://www.adobe.com/devnet/articles/crossdomain_policy_file_spec.html)

The only way I can see this idea working is to get a signed Java applet running on the local user's computer. This is far from ideal, but it is the closest to an executable binary that we can possibly get (see http://weblogs.java.net/blog/2008/05/28/java-doodle-crossdomainxml-support). That would then allow us to do cross-domain data transfers, but it would prompt the user with an ugly confirmation box and leave us unable to check the content filters of computers which don't have Java installed. Does anyone know what proportion of business/school computers DON'T have Java installed on them?

EDIT 2: A WhatBlock 2.0 prototype has been released for testing using the image technique described above, loading the favicon.ico file from the remote server. I'm currently unsure how this will work with real-world content filtering services, so testing is necessary. If you want to test it, see http://dev.dfey.org/whatblock2.

Sunday, 5 July 2009

DFEY @2morro09

I went with the rest of the DFEY clan (about 10 of us in total) down to London for the "2morro festival" for young people aged 16-24. Check their website 2morro.org and have a look at their entirely vague explanation of what the event was supposed to be; next, imagine yourself being at an event which was not only just as vague as the description entails but even more boring than you'd think, and full to the brim with people who seem to be over 20 (and in many cases over 30!) who are involved in what seem to be Government-supported initiatives; next, factor in the overwhelming use of Apple-branded technology (apart from the LTSP cluster running in the Plings room of course!) and the heavy presence of Channel 4 cameras and invasive surveys, and you have the "2morro festival".

The room which was supposed to be showing off cool technologies from Apple, Facebook and the Guardian was in fact an excuse for them to crack out about 10 MacBook Airs and a couple of chunkier Macs and invite people to get creative making apps with nothing more than the default OS X toolset; that's right - Python, Ruby (with none of the bindings you'd want!), HTML and JavaScript! What a fantastic set of tools! Not much got done there, I can tell you that.

I went to what I assumed would be a bit of a debate on education - "YOUR EDUCATION MANIFESTO FOR EDUCATION, Have your say about the education system with ESSA and even get heard by Channel 4". Riiiiight. What actually happened was a bit different: we had a bit of a skit of a scenario in which a student wanted to do something outside the curriculum but the teacher didn't have time; next we talked about the best moments of our education, and then we left, an hour after we had started. Would've been fantastic if I had been looking for a drama workshop, but not what I expected at all. Would've been nice too if I wasn't the only person under 19. After that boring hour-long session, I found stemount and had lunch with him. Broccoli salad? Come on guys, think who you just invited! Then we decided it was too boring to stay there much longer and so sodded off to a park just next to Parliament by the riverside for a couple of hours.

After being joined by tdobson, we returned, and 10 minutes later I caught the train home; I was glad to be gone. I heard some people won iPod nanos for 2 minutes of talking about an idea they had... okay.

Was it worth it? Well, I got free train tickets, so it was nice to have a little gander around central London yesterday; but if you're thinking of paying to go next year, I'd say think twice! Make sure it'll be reasonable next year, guys!

Sunday, 26 April 2009

Okay, so the trains in the UK aren't so crap...

Yesterday I went to a DFEY meeting in Manchester and only two people turned up: myself and the organiser Tim Dobson... what a bust. Following an afternoon spent in the park chatting with Tim and Ian Forrester, I've decided to start my blog up again. I took the train there and back and had rather comfortable journeys; sure, the trains were made in Sweden and the track they ran on had probably been put together by convicts being paid the minimum wage, but it was a good journey and, in my experience, superior to many of the trains in Europe! The trains weren't so stupidly long that I had trouble finding my carriage before the train departed, I didn't have to worry that the train was about to rip apart after hearing the compression and worrying expansion of the carriage's plastic interior whenever it accelerated, and I most certainly did not run out of battery power on my laptop! On the European rail network there seems to be a trend of spending all the money on the busy routes that businessmen travel on and then leaving the rest of the network with 30-year-old rolling stock; sure, it brings about cheaper train fares, but if you're travelling between any two cities in the UK you don't feel like cattle - you feel comfortable, like a passenger. The seats in the Pendolino are surprisingly more comfortable than those of the German ICEs, and although we don't have a restaurant car on many trains, the quality and variety of food on the British rail network still seems to outstrip anything on the ICEs or, God forbid, the Swiss Inter-Regio trains.

My point: be happy with the British railway system; it may not be the fastest or cheapest in the world but it isn't bad!