


I tried searching for "Palestine" using my Google Custom Search Engine and it only brought up a single page. I knew the word existed on other pages and indeed, when I did a normal Google search for it (restricting results to pages from my domain) several more came up. I had the same problem before but I thought I had fixed it by reducing the number of sites to three in my CSE definition.


Nick Saber isn't happy now. On Monday afternoon, Nick came back from lunch to find that he couldn't get into his Gmail account. Further, he couldn't get into anything that Google made (besides search) where his account credentials once worked.


Until a couple of months ago I was not using any sort of calendar application and just kept a list of upcoming events in a text file. I had once tried using iCal on my iBook but it was a bit fiddly, particularly using a trackpad for navigation. Since then I have switched to an iMac and iCal has improved a lot, so at the beginning of May I transferred events from the text file and started using iCal. That was OK, but a few days ago someone mentioned Google Calendar and I decided to investigate.

What I decided to do was keep a public calendar on Google Calendar and just use iCal for personal stuff that I would never want to share. I am aware that you can have private events on Google Calendar, or even whole private calendars, but I still decided to keep my private stuff on my own machine.

So the first step was to create a new calendar in iCal called "public" and migrate all my public entries into it. I then exported that calendar as an .ics file. Next I went into Google Calendar (easy since I already had a Google account) and imported the .ics file, which seemed to work fine. I called the calendar "zenatode diary" and made it public by choosing to share it. I also subscribed to a couple of public calendars (moon phases and UK holidays) and the calendar of the RampART social centre in London. All pretty simple.

Next I went back into iCal, deleted my "public" calendar, and then subscribed to my new "zenatode diary" on Google Calendar. That way I can use iCal to view both my remaining private events and my public ones on Google. I chose to have the subscription auto-refresh once per day. I don't think there is an easy way to maintain the zenatode diary in iCal and have it sync up to Google so I will be maintaining it through the Google interface.

There was one perplexing thing, which is to do with URLs. When you add an event in iCal you can attach a URL, and I assumed that these would transfer over to Google when I imported the .ics file. Looking at the calendar in Google, however, there was no sign of these URLs, so I have started adding them manually to the event descriptions in Google. The strange thing is that when I view the events in iCal now I can still see the URLs, so I guess they must be there in Google, just not accessible through the Google Calendar interface. I tried a web search to see if there was a way to access them but no luck so far. Still, it is early days yet.
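One way to check whether the URLs really are present in a calendar export is to look for the URL property on each event in the .ics text. The following is only a rough sketch in Python, not something I have run against Google's own export: the function names and the sample data are made up for illustration, and the parsing is deliberately simplified (it handles folded lines and property parameters such as ;VALUE=URI, but not quoted parameter values containing colons).

```python
def unfold(text):
    """Join folded iCalendar lines (a continuation starts with a space or tab)."""
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def events_with_urls(ics_text):
    """Return a (summary, url) pair per VEVENT; url is None when no URL property exists."""
    results = []
    current = None  # properties of the event being read, or None outside an event
    for line in unfold(ics_text):
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT" and current is not None:
            results.append((current.get("SUMMARY"), current.get("URL")))
            current = None
        elif current is not None and ":" in line:
            name, value = line.split(":", 1)
            name = name.split(";", 1)[0].upper()  # drop parameters like ;VALUE=URI
            current.setdefault(name, value)  # keep the first occurrence only
    return results


# Hypothetical sample data, standing in for a real exported calendar.
sample = "\r\n".join([
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "BEGIN:VEVENT",
    "SUMMARY:Film night",
    "URL;VALUE=URI:http://www.example.org/film-",
    " night",
    "END:VEVENT",
    "BEGIN:VEVENT",
    "SUMMARY:Full moon",
    "END:VEVENT",
    "END:VCALENDAR",
])

for summary, url in events_with_urls(sample):
    print(summary, "->", url or "(no URL)")
```

Running something like this over the file exported from iCal and over the feed that iCal subscribes to would show whether the URL lines survive the round trip, even when the web interface does not display them.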


The Google Social Graph API looks like it could become very useful.

Search Engine Watch has been around a long time and is still a useful site.


I have posted a lot of stuff about Google on my website but until now I have not had a dedicated page - this is it. I use Google a lot, and not just for search. I have a Google account which allows me to view my "Web History" (at least for periods when I search while logged in, which is most of the time). I also have a Blogger account which is basically a Google facility. I am subscribed to a few Google Groups (but not Usenet groups - for that I use slrn and News.Individual.NET) and I use their Webmaster Tools. I have also created a Google Custom Search Engine and a map using their Maps API (for which I had to sign up for a Maps API key). One notable thing I don't use is Google Mail.

Google is coming up with new stuff all the time and they obviously have a huge influence on the Internet (according to Market Share Google currently have 77.11% of the search market, with Yahoo! coming in a distant second at 12.23% and all the others hardly registering). A lot of people are concerned about their dominance, despite (or perhaps because of) their informal corporate motto "Don't be evil"! Although I don't have any personal privacy concerns (which is why I am happy to let them keep my Web History) I do understand the general privacy concerns. Some people won't use Google for search at all, and either use alternative search engines or the anonymised Google scraper called Scroogle.

OK, now for the reason I created this page. I am not happy with how long it takes Google to update its index of pages on this site. As I mentioned, I use their Webmaster Tools and I went to the trouble of building in automated sitemap generation to the simple CMS I wrote to maintain this site. According to Google:

The Sitemap Protocol allows you to inform search engines about URLs on your websites that are available for crawling. In its simplest form, a Sitemap that uses the Sitemap Protocol is an XML file that lists URLs for a site. The protocol was written to be highly scalable so it can accommodate sites of any size. It also enables webmasters to include additional information about each URL (when it was last updated; how often it changes; how important it is in relation to other URLs in the site) so that search engines can more intelligently crawl the site.

It sounds good in theory, but although they frequently download my sitemap it takes ages for them to pick up changes to my pages and incorporate them in their index. That might be partly due to the fact that my pages are served as application/xhtml+xml rather than the normal text/html, but they get indexed eventually so I feel that can't be the sole reason. This wouldn't concern me so much if it were the same for everyone, but some people's sites seem to get indexed almost in real time and I feel discriminated against!
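For anyone wondering what automated sitemap generation involves, here is a minimal sketch in Python. It is not the code from my CMS: the function name, the page list and the example.org URLs are invented for illustration, and it only covers the four per-URL fields mentioned in the quote above (location, last modification date, change frequency and priority).

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org version of the Sitemap Protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from dicts with 'loc' and optional
    'lastmod', 'changefreq' and 'priority' keys."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages; a real CMS would produce this list during a site rebuild.
pages = [
    {"loc": "http://www.example.org/", "lastmod": "2007-12-08",
     "changefreq": "weekly", "priority": 0.8},
    {"loc": "http://www.example.org/google.html", "lastmod": "2007-12-08"},
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The string returned here has no XML declaration, so a real generator would probably write one at the top of the file before saving it.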

Well just recently I read something which makes me wonder whether I wasted my time creating a sitemap and that I would have been better off just creating an RSS feed. The article is called Google Taking Blog Comments Searching Real-Time, from which:

The other night, I started doing some investigating and found something that seems amazing to me. Google seems to now be full-text indexing not only RSS Feeds but the entire contents of all of the pages listed the feed at a refresh rate of less than 2 hours, not just for big RSS feeds like Slashdot, but for many small ones as well.

Two hours? It might take two months to refresh my pages! I would like to investigate further but I have too many other things to be getting on with. By the way, I have my own little trick for monitoring Google's coverage of my website. My CMS adds a footer to all my pages which includes a random-looking string of 16 lower case characters which is the same on every page. Occasionally I switch to a new tag string and do a site rebuild which updates the "lastmod" date for all URLs in my sitemap. Immediately after doing that I will see no search results for the new tag string, but as Google picks up the updated pages the number of search results increases. At least it should increase. I last changed the tag string on 2007-12-08 and the number of results gradually increased to about 150 (out of about 200 pages). However, it then started decreasing, down to about 50 results as of a few days ago. It still knew about my pages but it actually seemed to be reverting to older versions. Searching today it was suddenly back up to 119 results - very strange!

Ian Gregory 2010