Google improved Flash crawling… but why?

Author: seven July 2, 2008

Google has been developing a new algorithm for indexing textual content in Flash files. They teamed up with Adobe and improved the performance of the Flash indexing algorithm. Stop cheering, calm down and continue reading. :)

This is great news for Flash beginners, but the real problem lies a bit deeper, inside complex Flash architecture that can't be indexed that easily. No, this won't magically bring SEO to your Flash website. No, this won't lift your PageRank to new heights. The algorithm itself has a few rather nasty caveats which, in my opinion, can bring more confusion than good search results.

Even with the new improvements, the Google spider:

  • is not able to index text in additional Flash content you may be loading (through a bootstrap loader with XML or something similar), and even if by some miracle that text does get indexed, it won't be connected in any way to the other crawled text content in the Flash files you may be using on your website.
  • can't really cope with the various crazy JavaScript Flash embedding techniques (e.g. swfobject and the like)
  • doesn't understand graphics and FLVs

Knowing that, and being the hardcore Flash fanatic that I am, I thought a bit about the Flash crawling concept. I can only speculate about how Google crawls Flash these days, but I remember a tool from a couple of years ago called swf2html.exe, shipped with the Flash Search Engine SDK. I suppose search engines used it to retrieve text from Flash files. The tool has not been updated since Macromedia wrote it in 2002, but you can check it here.

Indexing Flash content by extracting strings of text from Flash files completely defeats the purpose of Flash as a complex and feature-rich presentation technology. Many parts of a website are basically graphics with static text, or are dynamically loaded or generated. If you ignore that graphical-textual content (and you have to, since the crawler doesn't have OCR) and index just the text content, the search engine user will get confusing, or even very dangerous, results. Partially indexed, misleading information is more dangerous than no information at all. Flash RIA applications use dynamically populated lists in 99% of cases, so I am not sure that Google will be able to crawl those sites either.
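To make the "extracting strings of text" idea concrete, here is a deliberately naive sketch in Python. It is not how Google or swf2html actually work (I can only speculate about that); it just pulls runs of printable ASCII out of a blob of bytes. The function name and the sample bytes are made up for illustration, and the sample is not a real SWF file.

```python
import re


def extract_text_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    # Printable ASCII spans 0x20 (space) through 0x7e (~).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Hypothetical stand-in for binary file content: two text labels
# surrounded by non-text bytes, plus a short run that gets dropped.
sample = (
    b"\x00\x01" + b"Welcome to our site"
    + b"\x00\x02\x03" + b"Products"
    + b"\x00\xff\x10" + b"ab" + b"\x00"
)

print(extract_text_runs(sample))
```

An extractor like this only ever sees text that is physically present in the bytes it was handed. Text drawn as graphics, and anything loaded at runtime from XML or from another file, never shows up, which is exactly the partial picture described above.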

This is like having a tour guide lead you through a city he has never been to before, in a country whose language he doesn't speak fluently. Thank you, but I think I'll pass.

Correct me if I am wrong, but the solution is pretty simple and it works great: if you are building a Flash website, build an XHTML fallback version of the website. Google knows and loves valid markup structure, and there are still many corporate users who can't install (or upgrade) Flash because their system administrator is a noob. With an XHTML version you can control even better which content gets indexed, and non-Flash users will thank you too.

CEO/CTO at Nivas®
Neven Jacmenović has been passionately involved with computers since the late 80s, the age of Atari and Commodore Amiga. As one of the internet industry pioneers in Croatia, he has been involved since the 90s in the making of many award-winning, innovative and successful online projects. He is an experienced full-stack web developer, analyst and system engineer. In his spare time, Neven channels his retro-futuristic passion into various golang, Adobe Flash and JavaScript/WebGL projects.
