<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Ed Anuff</title>
    <link rel="alternate" type="text/html" href="http://www.anuff.com/" />
    <link rel="self" type="application/atom+xml" href="http://www.anuff.com/atom.xml" />
    <id>tag:www.anuff.com,2008-08-08://1</id>
    <updated>2012-02-05T22:31:34Z</updated>
    <subtitle>an occasional thought or opinion I don&apos;t keep to myself</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.25rc3-en</generator>

<entry>
    <title>Discoverability and the Dynamic Web</title>
    <link rel="alternate" type="text/html" href="http://www.anuff.com/2012/02/discoverability-and-the-dynamic-web.html" />
    <id>tag:www.anuff.com,2012://1.38</id>

    <published>2012-02-05T22:25:02Z</published>
    <updated>2012-02-05T22:31:34Z</updated>

    <summary><![CDATA[It's kind of amusing to see this back and forth about the open web and people suddenly being concerned about whether challenges to Google's position of dominance are good for the "open web".&nbsp; I do think it's very possible that...]]></summary>
    <author>
        <name>Ed Anuff</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.anuff.com/">
        <![CDATA[It's kind of amusing to see this back and forth about the open web and people suddenly being concerned about whether challenges to Google's position of dominance are good for the "open web".&nbsp; I do think it's very possible that Google is entering a phase similar to that of Microsoft circa the late 90's but, if so, we won't really know for sure for another 10 years, and a lot can change.&nbsp; However, I do think we're seeing the decline of the search-driven page-oriented web as it has existed for the last decade.&nbsp; The reason is pretty simple and I'll try to summarize it as best as I can, although I'm a long-winded kind of guy, so bear with me.<br /><br /> ]]>
        <![CDATA[When dynamic websites first started to become the preferred method of publishing (via everything from high-end CMS to blogging systems), the challenge was how to make sure that your content was search-engine optimized.&nbsp; This had an unfortunate side effect of killing one of the original reasons people started building dynamic websites, which was personalization.&nbsp; If you wanted Google to find your pages, you needed to show the Google crawler pretty much the same pages as everyone who visits your page would see.&nbsp; There are a lot of ways around this, but since the simplest approach is usually the best approach on the web, the result is that people just didn't invest as much in dynamic personalization as they otherwise would.&nbsp; The second casualty of this was client-side interactivity, and Javascript on web pages was relegated to small enhancements, because if you do everything dynamically in Javascript, Google is again not going to see it.<br /><br />Now, if the enemies of the search-driven web are personalization (user-specific dynamic content) and client-side interactivity (highly programmatic user-driven content), then in the age of social and mobile, we can see that things are going to change in some significant ways.&nbsp; First of all, search engines are only really good at retrieving content that they have in their own indexes.&nbsp; The metacrawlers and federated search systems were only ever really good for specialized searches.&nbsp; For things like Google+ to work the way the user wants and to include Facebook and Twitter, Google would need to effectively copy the entire content set and the social graph and access control policies of any social content it wishes to use in its results.&nbsp; I'm sure some sort of escrow agreement could be negotiated between all the parties involved to allow this, and I'm sure such efforts have been made, but I could see it not being very simple, and it's certainly not scalable and not very 
"open web".&nbsp; Still, it wouldn't surprise me if Google came up with some sort of standards or service for letting social apps upload their data to Google, so that Google could get the next Facebook or Twitter to already be feeding their social content into Google.&nbsp; If you think no one sane developer would trust Google with this, I'd counter that plenty of developers who compete with Google already trust Google with even more sensitive information such as their email.<br /><br />More of a challenge is that applications have leapfrogged in interactivity to now becoming full client applications, most specifically in mobile.&nbsp; The content that the apps are accessing just isn't being designed for web page use, it isn't being rendered in a way that a search engine can use it in any useful way.&nbsp; Apps are powered by web services and databases full of small snippets of content tied together by relationships and queries that are changing based on any number of dynamic inputs, such as your location.&nbsp; The semantics are opaque to anyone but the programmer who builds them and the designer who crafts the experience for the user.&nbsp; Android has, unsurprisingly, a nice search mechanism for apps to be searchable, but this moves most of the search experience to the client, so I'm not sure that it benefits Google to the same degrees that web search does.<br /><br />Small but relevant digression - I get a lot of people responding with "HTML5 is going to replace native apps, blah, blah, blah" when I talk about this.&nbsp; This is mostly from non-programmers.&nbsp; I've observed that the only technologists who really think HTML5 for mobile means anything to the "open web" are programmers in their 30's, who can't break out of a web page mindset.&nbsp; Programmers who got started in the late 80's and early 90's building client-side apps have a very different perspective that they share with the new programmers in their 20's, who are now doing client-side apps but 
for mobile rather than the desktop (but ironically using the exact same tools and languages we did in the 90's).&nbsp; There's nothing "open web" (i.e. semantically rich) about an HTML5 app that contains a single script tag and then 20K lines of Javascript.&nbsp; There's nothing there for a search engine to crawl.&nbsp; So, HTML5 and Javascript will very likely replace native code because they're easier, but if you think that it's a return to the SEO and link-driven world, and that app stores are going away, I'm not sure you're necessarily envisioning HTML5 the way that the guys actually coding apps are going to use it.<br /><br />But, back to the point at hand, the issue (as always) comes down to vicious and virtuous circles.&nbsp; Social network optimization appears to deliver better ROI than search engine optimization.&nbsp; Facebook is optimized for social dynamic content and for apps and it generates proven results (traffic) for those.&nbsp; Twitter, to some extent, does as well.&nbsp; So, the effort that you could have put into mirroring all your dynamic content into static search SEO landing pages will be instead put into pushing status updates to Twitter and Facebook and using Twitter and Facebook's APIs.&nbsp; For the user, Facebook increasingly becomes where you learn about apps and personally relevant content, rather than Google.&nbsp; For developers and publishers, the result is that Facebook delivers even more to you and Google delivers even less.&nbsp; Since Google and Facebook are essentially discovery marketplaces, you end up with both consumers and suppliers moving from one marketplace to the other.<br /><br />Now, this really has nothing inherently to do with "open" versus "closed," but it has everything to do with whether we can continue on a model where one or two companies deep mirror the entire web into a database and use it as the mechanism for discovery and whether we expect that the trend towards deeply personalized and social, 
highly-interactive and primarily client-side executed applications continues. The jury is still out on the former, but I'd certainly bet on the latter.]]>
    </content>
</entry>

<entry>
    <title>Usergrid is now part of Apigee</title>
    <link rel="alternate" type="text/html" href="http://www.anuff.com/2012/01/usergrid-is-now-part-of-apigee.html" />
    <id>tag:www.anuff.com,2012://1.37</id>

    <published>2012-01-18T15:08:06Z</published>
    <updated>2012-01-18T15:24:12Z</updated>

    <summary><![CDATA[I'm happy to announce that Apigee is acquiring Usergrid, the startup I've worked on for the last 18 months since leaving Six Apart.&nbsp; I'm very excited to be joining Apigee and taking Usergrid to the next level with their help.&nbsp;...]]></summary>
    <author>
        <name>Ed Anuff</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.anuff.com/">
        <![CDATA[<span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="UG_Apigee.png" src="http://www.anuff.com/images/UG_Apigee.png" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" height="119" width="460" /></span><br />I'm happy to announce that Apigee is acquiring Usergrid, the startup I've worked on for the last 18 months since leaving Six Apart.&nbsp; I'm very excited to be joining Apigee and taking Usergrid to the 
next level with their help.&nbsp; This has been a very cool project based on 
amazing technology for a market that's still in its earliest stages of 
growth.&nbsp; Thanks to all my friends who helped me get it this far.&nbsp; <br /><div><br /></div>Here are a few links:<br /><br /><a href="http://blog.apigee.com/detail/usergrid_is_now_part_of_apigee">Apigee Blog - Usergrid is now part of Apigee</a><br /> <div><br /><a href="http://techcrunch.com/2012/01/18/api-management-service-apigee-acquires-mobile-data-platform-usergrid/">TechCrunch - API Management Service Apigee Acquires Mobile Data Platform Usergrid</a><br /></div><div><br /><a href="http://blog.programmableweb.com/2012/01/18/more-mobile-apis-coming-with-usergrid-acquisition/">ProgrammableWeb - More Mobile APIs Coming With Usergrid Acquisition</a><br /></div><div><br /></div>]]>
        
    </content>
</entry>

<entry>
    <title>Announcing Usergrid Open Source Mobile Stack</title>
    <link rel="alternate" type="text/html" href="http://www.anuff.com/2011/10/announcing-usergrid-open-source-mobile-stack.html" />
    <id>tag:www.anuff.com,2011://1.36</id>

    <published>2011-10-03T18:22:40Z</published>
    <updated>2011-10-03T18:44:09Z</updated>

    <summary><![CDATA[I've been working on this for some time now and we've finally released the source code.&nbsp; Usergrid is a comprehensive platform stack for mobile and rich client applications. The entire codebase is now available on GitHub at https://github.com/usergrid/stack.&nbsp; There's a full...]]></summary>
    <author>
        <name>Ed Anuff</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.anuff.com/">
        <![CDATA[I've been working on this for some time now and we've finally released the source code.&nbsp; Usergrid is a comprehensive platform stack for mobile and rich client applications. The entire codebase is now available on GitHub at <a href="https://github.com/usergrid/stack">https://github.com/usergrid/stack</a>.&nbsp; There's a full blog post up on the <a href="http://blog.usergrid.com/">Usergrid Blog</a>.<br /><br />One thing that's pretty interesting is that although this was initially envisioned as purely a cloud platform-as-a-service, a lot of the users I talked to were very interested in self-hosting.&nbsp; So, we decided to adopt the WordPress model by making the source completely open and then following that with a cloud version.&nbsp; One difference is that, unlike WordPress, which kept the source for WordPress.com and WordPress.org separate, we're releasing the entire multi-tenant architecture.&nbsp; This means that, if this takes off, anyone can run their own private grid.&nbsp; This is the sort of thing we used to debate at Six Apart, but we have the luxury of starting with a clean slate here.&nbsp; It will be interesting to play the role of WordPress this time against the folks that are trying to offer similar functionality purely as closed source hosted-only options.<br /><br />Key to making this possible was the development of a <a href="https://usergrid.s3.amazonaws.com/usergrid-launcher-0.0.1-SNAPSHOT.jar">double-clickable app</a> that fires up the complete stack, including an embedded Cassandra installation, right on your desktop.&nbsp; This means that anyone can get started right away playing with Usergrid.&nbsp; This is kind of an old-school approach, but then again, everything related to mobile is fundamentally about rethinking and in many ways turning back the clock on the relation between the client and the server.&nbsp; For us, though, it means we don't have to raise money to start getting developer traction.&nbsp; And frankly, 
that's a good thing.<br /><br />Here's a presentation that explains what Usergrid is all about:<br /><br /><br />
<div style="width:425px; margin: auto" id="__ss_9476483"> <strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/usergrid/usergrid-overview-9476483" title="Usergrid Overview" target="_blank">Usergrid Overview</a></strong> <iframe src="http://www.slideshare.net/slideshow/embed_code/9476483" marginwidth="0" marginheight="0" frameborder="0" height="355" scrolling="no" width="425"></iframe> <div style="padding:5px 0 12px"> View more <a href="http://www.slideshare.net/" target="_blank">presentations</a> from <a href="http://www.slideshare.net/usergrid" target="_blank">usergrid</a> </div> </div>

]]>
        
    </content>
</entry>

<entry>
    <title>Cassandra Summit SF 2011 Presentation</title>
    <link rel="alternate" type="text/html" href="http://www.anuff.com/2011/07/cassandra-summit-sf-2011-presentation.html" />
    <id>tag:www.anuff.com,2011://1.35</id>

    <published>2011-07-11T19:29:46Z</published>
    <updated>2011-07-11T19:33:51Z</updated>

    <summary> Indexing in Cassandra View more presentations from Ed Anuff...</summary>
    <author>
        <name>Ed Anuff</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.anuff.com/">
        <![CDATA[<div style="width:425px; margin-left: auto; margin-right: auto;" id="__ss_8566623"> <strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/edanuff/indexing-in-cassandra" title="Indexing in Cassandra" target="_blank">Indexing in Cassandra</a></strong> <iframe src="http://www.slideshare.net/slideshow/embed_code/8566623" marginwidth="0" marginheight="0" frameborder="0" height="355" scrolling="no" width="425"></iframe> <div style="padding:5px 0 12px"> View more <a href="http://www.slideshare.net/" target="_blank">presentations</a> from <a href="http://www.slideshare.net/edanuff" target="_blank">Ed Anuff</a> </div> </div>]]>
        
    </content>
</entry>

<entry>
    <title>Speaking at Cassandra Summit 2011</title>
    <link rel="alternate" type="text/html" href="http://www.anuff.com/2011/06/speaking-at-cassandra-summit-2011.html" />
    <id>tag:www.anuff.com,2011://1.34</id>

    <published>2011-06-07T00:38:38Z</published>
    <updated>2011-06-16T00:34:39Z</updated>

    <summary><![CDATA[I'm going to be speaking on indexing in Cassandra at the upcoming Cassandra Summit 2011.&nbsp; It'll cover some of the material from my previous blog posts on the subject with some new examples, and should be interesting.&nbsp; I've been a...]]></summary>
    <author>
        <name>Ed Anuff</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.anuff.com/">
        <![CDATA[I'm going to be speaking on indexing in Cassandra at the upcoming <a href="http://www.datastax.com/events/cassandrasf2011">Cassandra Summit 2011</a>.&nbsp; It'll cover some of the material from my <a href="http://www.anuff.com/2011/02/indexing-in-cassandra.html">previous blog posts</a> on the subject with some new examples, and should be interesting.&nbsp; I've been a big fan of Cassandra but it provides a much lower level data model than most people are used to with conventional databases.&nbsp; It compensates for this by being much more scalable than any of the other NoSQL databases.&nbsp; However, it pushes a lot of the more advanced data modelling up to the application layer, in particular building your own relationship models and the queries against those.&nbsp; Hopefully I can shed some light on how to do that.<br /><br /><a href="http://www.datastax.com/2011/06/ed-anuff-to-speak-at-cassandra-sf-2011">Ed Anuff to speak at Cassandra SF 2011</a><br /><div><br /></div>]]>
        
    </content>
</entry>

<entry>
    <title>How To Get Traction With Products For Developers</title>
    <link rel="alternate" type="text/html" href="http://www.anuff.com/2011/06/how-to-get-traction-with-products-for-developers.html" />
    <id>tag:www.anuff.com,2011://1.33</id>

    <published>2011-06-02T18:40:50Z</published>
    <updated>2011-06-03T23:21:19Z</updated>

    <summary><![CDATA[I often talk to people who are grappling with the question of how to get their products, particularly cloud services, adopted by developers.&nbsp; If you ask people to name companies that are really good at getting developers to use their...]]></summary>
    <author>
        <name>Ed Anuff</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.anuff.com/">
        <![CDATA[I often talk to people who are grappling with the question of how to get
 their products, particularly cloud services, adopted by developers.&nbsp; If
 you ask people to name companies that are really good at getting 
developers to use their products, you typically hear companies like 
Facebook, Google, or Apple listed.&nbsp; These companies, as successful as 
they are, don't really provide a lot of useful hints on how to do this, 
because, in truth, it's not that they're good at getting developers, 
it's that they're "not not good" at it.&nbsp; What I mean by that is that the 
reason people are interested in learning the API of Facebook or figuring
 out how to develop for iPhone has everything to do with the market that
 can be tapped into by creating products for that market.&nbsp; So, people 
would have made the effort if those were the most difficult platforms to
 learn, and, in fact, at the beginning, they were none too easy, although
 the situation has changed considerably over time. &nbsp;<br />
<br />
<a href="http://www.demo.com/alumni/demo2006fall/79986.html">When we started Widgetbox back at the end of 2005, the idea was that 
we'd provide a platform for developers to build and deliver widgets and 
that we'd also create a destination site, essentially an "app store", for users to find widgets</a>.&nbsp; We
 didn't initially succeed as well as we'd hoped in being a consumer destination, but we 
did manage to get an impressive number of widgets built on the platform,
 literally thousands of them in the first six months.&nbsp; We didn't have 
any market clout to make this happen.&nbsp; Although we'd hoped to have 
partnerships with social networks to aid in distribution of widgets built 
using our service, these didn't really kick in until much later.<br />
<br />
So, what accounted for the early developer traction at Widgetbox?&nbsp; ]]>
        <![CDATA[There were really three key things that moved the needle, all of which, surprisingly, started with product functionality:<br /><br /><ol><li>Maximally Minimal Sign-up</li><li>"Magic" Point Features</li><li>Learning their API, not making them learn ours</li></ol><br /><b>Getting Self-Service Right</b><br /><br />This is such a "Mom and Apple Pie" thing on the Internet that it seems silly to even bring it up.&nbsp; However, developer or professional-oriented products often have implicit or explicit business terms associated with them that are beyond the usual no-commitments consumer terms of service.&nbsp; It probably starts with the goal to maximize the value of the "lead gen" at the expense of making it too easy.&nbsp; In fact, some schools of thought actually want to introduce friction here to make the lead more "qualified" - just don't forget the maxim "all attempts to limit the size of the market for your product will ultimately succeed".&nbsp; There's also the desire to impress the customer with a range of account options, show off that you have corporate accounts and pricing available, and so on.&nbsp; And once you get through all that, you want to awe them with your massively powerful dashboard or command console.&nbsp; Yes, the developer is essentially a "business customer", but business customers are just like any other user on the net: put too much process and pomp and circumstance into getting started and you've lost them.&nbsp; We instrumented this whole thing as goals in Google Analytics and we could see it first hand.<br /><br />So, even though it should be obvious, it seems like it always bears repeating, and most companies only get it half right.&nbsp; For example, Apple is both bad and good about this with their Developer Program.&nbsp; Bad in that they're really a pain in doing things like setting up a business account: you have to be prepared to fax documents proving your company status, and getting your account set up can be a week-long 
process.&nbsp; Good in that they do have a lot of things in their developer portal that you can ignore until you need to, and they'll remind you when you do.&nbsp; But, overall, it's a safe assumption that if you were anyone but Apple, you wouldn't succeed with this approach unless you made it a whole lot easier for your developers.&nbsp; GitHub is a much more pleasing experience.&nbsp; It's extremely easy to get started and only starts to get a little more complicated when you try to figure out whether you should go to a business account or not.<br /><br />With Widgetbox, at the beginning, we made it as easy as pasting in the URL to a Flash or Javascript widget you'd already built.&nbsp; Over time, this was expanded to letting you paste in an RSS feed, then Flickr, Twitter, and anything else that could kick off the process.&nbsp; There are a few interesting things to this that I'll expand on in my third point about "learning their API" later on.<br /><br /><b>"Magic" Point Features</b><br /><br />Sometimes this gets confused with Minimum Viable Product, and although it's a similar concept, there are a few differences.&nbsp; The idea here is that there's going to be a small set of "quick win" standalone or "point" features that a developer will make use of because they're quick to integrate in relation to the value they provide and which serve as a hook to getting more usage from the developer over time.<br /><br />At Widgetbox, we had three things that immediately added value with a couple of minutes of work on the part of a widget developer.&nbsp; These were:<br /><br /><ol><li>One-click Installation</li><li>Personalization</li><li>Analytics</li></ol><br />Installation refers to the fact that we made it very easy for you to make your widget one-click installable into any website, blog, or social network.&nbsp; Even if you got no other value from us, this was probably of enough use that you'd at least sign up for our free tier.<br /><br />Personalization referred 
to the fact that building a widget, in Javascript or Flash, represented one set of challenges, but what was really hard, for any but the largest widget publishers, was enabling the user of the widget, meaning the blogger or social network user who installed it on their page, to personalize it before or after installing it.&nbsp; What made this hard was that it meant the widget developer suddenly needed to build and manage a big data infrastructure for delivering the personalization data to the installed widgets at runtime.&nbsp; Keep in mind that a successful widget could get millions of views a day, since it might be installed on a number of pages, and each of those pages would get a number of visitors, resulting in a very large number of web service requests from the widget server.&nbsp; This meant that many widget developers were either forgoing or severely limiting the personalization options they provided to make it more manageable for themselves.&nbsp; Widgetbox essentially provided a high-scale key-value datastore coupled to a UI form-builder, so that the widget developer could provide their user with a slick way to configure their widget and then the developer could leave the high-scale serving of the configuration data to the widgets to us.<br /><br />Analytics was the final piece of the equation, and its value isn't too hard to see.&nbsp; Since we were managing the installs and the per-user configuration information, we were able to provide very useful analytics of all key metrics for widget use, such as where it was installed, how many pages on a site were carrying it, how many people were seeing it, and how many people who saw the widget decided to install it themselves on their own site (virality).<br /><br />Any of these three features would be reason enough to make use of Widgetbox, and we knew that the key to success wasn't in getting developers to make use of every single feature of the platform on day one, it was in getting a toehold that we could 
then build on over time.&nbsp; So, with each of these features, the challenge was to really focus on making sure the features were found, utilized, and benefited from immediately. <br /><br /><b>Learning Their API, Not Making Them Learn Ours</b><br /><br />This last point was really the most interesting from a technical standpoint, and was something that we tried to make as invisible to the developer as possible.&nbsp; The goal was that the system was just smoother and easier, but it was critical in making all of the point features we listed above actually work.&nbsp; What "learning their API" means is that we spent a lot of time investigating how existing widgets actually were installed, configured, and measured, and we designed our service to couple seamlessly to these, to the degree that a lot of developers who made use of Widgetbox might not have fully appreciated what we had done (for the hardcore programmers, think of it as "<a href="http://en.wikipedia.org/wiki/Dependency_injection">dependency injection</a> for web widgets", for the rest of us, just think "magic").<br /><br />For installation of widgets, this meant creating our own wrappers that could generate the necessary Javascript or Embed codes to load a Javascript or Flash widget into an existing page in as seamless a way as possible, including automating some of the tricks necessary to get these widgets into MySpace, which at the time had a habit of banning third-party widgets.&nbsp; We'd later expand this to couple directly to the install APIs of TypePad and Facebook.<br /><br />For personalization and configuration, it got even more interesting.&nbsp; We found that widgets were typically configured via either parameters in script tag URLs, flashvars in Flash embed tags, or in local Javascript variables.&nbsp; We designed our configuration system to make it easy to substitute in values from a configuration form into these at runtime, so the developer didn't even have to recode their widget to our 
API, they could just take their existing widget, that perhaps had hard-coded configuration options or required the user to hand-edit a URL to configure, and then, using our system, make their widgets dynamically configurable by the end-user.&nbsp;&nbsp; We later expanded this to make the configuration happen contextually based on the content of the page, so that any widget could suddenly be like Google's AdSense widgets that display ads related to the page content, and someone could use Widgetbox to build an Amazon book widget that had similar behavior.<br /><br />Finally, for analytics, while we would encourage the widget developer to more deeply instrument the widget to get better data, our efforts into understanding the usage lifecycle of widgets, from installation to usage to viral distribution, allowed us to automatically know what events in usage to pay attention to, log, and analyze to make the analytics reporting as immediately valuable as possible to the widget developer, without them having to make any modifications to their widgets to take advantage of our system.<br /><br />So, it wasn't just knowing what the high-value point features were, it was learning enough about how these needs were already being addressed that we could be easily substituted for the existing makeshift solutions and immediately start proving our value.<br /><br />Now, of course, it's not to say that we didn't make use of buying AdWords to make people aware of our service and that we didn't have PR and marketing efforts going on all the time.&nbsp; We also spent a lot of time in evangelism and outreach, figuring out how to get to widget developers both one by one and en masse. 
However, these could only go so far as to deliver the developer to our doorstep.&nbsp; From that point on, we had to figure out how to get the widget developer to start realizing value from the platform as quickly as possible.<br /><br />Hope this is of help to people who are thinking about bringing a developer-oriented product to market.&nbsp; There are a lot of folks who get very intimidated by this type of business and tend to shy away from it, but, as I'm fond of saying, the largest software companies in the world today got their start in products that almost exclusively served developers.<br /><br />]]>
    </content>
</entry>

<entry>
    <title>Some thoughts on the new wave of PaaS startups</title>
    <link rel="alternate" type="text/html" href="http://www.anuff.com/2011/05/some-thoughts-on-the-new-wave-of-paas-startups.html" />
    <id>tag:www.anuff.com,2011://1.32</id>

    <published>2011-05-24T22:41:35Z</published>
    <updated>2011-05-25T16:20:55Z</updated>

    <summary><![CDATA[There's been a lot of activity in the PaaS space lately.&nbsp; This is largely fueled by the successful exit of Heroku, and since a lot of investment tends to be made by looking in the rear-view mirror, a lot of...]]></summary>
    <author>
        <name>Ed Anuff</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.anuff.com/">
        <![CDATA[There's been a lot of activity in the PaaS space lately.&nbsp; This is 
largely fueled by the successful exit of Heroku, and since a lot of 
investment tends to be made by looking in the rear-view mirror, a lot of people 
are turning their eyes to infrastructure and platforms.&nbsp; These sorts of 
businesses haven't been as much in favor lately, and there are more than a
 few startup "experts" who have been very critical of 
platforms and infrastructure.&nbsp; While it's true that the business models 
for these have taken some time to adapt to the era of open source and the cloud, it's 
silly to ignore just how much long term value and ROI has come from 
these types of companies.&nbsp; Oracle now owns MySQL; there are a lot of ways to look at that, but I tend to view it as both companies having won, and won big (as did both platforms and closed source and open source, this stuff is nowhere near as mutually exclusive as the pundits would have you believe). ]]>
        <![CDATA[Now we're seeing a number of other companies trying to follow the Heroku model.&nbsp; Unfortunately, these ignore some of the underpinnings of what made companies like Heroku and Engine Yard feasible and then what made them good businesses.&nbsp; Most PaaS vendors are basically giving you a place to run your code.&nbsp; Despite conventional wisdom, there truly are just a handful of languages that are really easy to run as parts of web stacks.&nbsp; Java and .NET have very robust web application packaging and deployment capabilities.&nbsp; The P in LAMP was always PHP, not Python or Perl.&nbsp; Java, C#, and PHP are the three most popular web application languages in the world.&nbsp; If you think otherwise, you work at a Web 2.0 company.&nbsp; Hosting an app written in one of these languages in the last decade hasn't been all that hard, and it's gotten progressively easier.&nbsp; You can deploy, run, and debug a Java web application on literally thousands of hosting providers without leaving your IDE.&nbsp; Amazon with Elastic Beanstalk has added some nice features on top of that to not just deploy but automatically scale Java apps.&nbsp; PHP already has a Heroku, it's called GoDaddy.<br /><br />Back to Heroku, what a lot of people didn't expect was that Ruby on Rails would become as popular as it has, and what they didn't fully appreciate was that Rails had a lot of capabilities that really became a lot more powerful when used in an integrated stack.&nbsp; So, basically, there weren't a lot of good options for Rails hosting, and further, people who went with Heroku or Engine Yard were able to see a very significant increase in productivity because their Rails environments were fully provisioned with a set of stack services that could be readily leveraged.<br /><br />Now, if you're trying to do a PaaS in the "place to run your code" model, you're likely going to miss the point, and you're going to think that adding more languages 
is going to make the difference.&nbsp; The problem is that you're moving down the long tail of languages, and any business that starts going down the long tail tends to have more than a few problems.&nbsp; I'll stick to the product and technology issues in this post because they're big enough challenges in and of themselves.<br /><br />Let's get into the weeds on this.&nbsp; If I want to build a Java app, there are a bunch of services, all defined by standard APIs, for everything you can imagine.&nbsp; What I want from a Java hosting service is a place I can upload my Java WAR file and all the jars for the apps I want to run.&nbsp; Not always that easy, but nowadays, it's one-click to Amazon Elastic Beanstalk and, because Java is the most popular web programming language in the world, there are a lot of other hosting companies that will handle it too.&nbsp; There are a lot of high-end companies that will run it enterprise-grade and low-end companies that just do the basics, but it's a robust ecosystem.<br /><br />If I build a Ruby app, I want the hosting service that knows Ruby and is super-optimized for it.&nbsp; I really don't care if they support PHP too.&nbsp; I don't want them to be spreading themselves thin on WSGI or Plack, or whatever else.&nbsp; I just want to be sure I can be the most productive on Ruby that I can possibly be, that the additional services that are provided in the stack are tested and optimized for use with Ruby, etc.&nbsp; That's the raison d'etre for Heroku and Engine Yard.<br /><br />Similarly, if I'm writing Python, the host had better be awesome at Python.&nbsp; Hence the love/hate relationship Python people have with App Engine.&nbsp; It's not the best Python hosting in the world, but there still aren't a lot of options, and someone may yet do it right.&nbsp; Competing with Google probably deters the competition, but consider that App Engine now supports Java too. 
Most Java people who value their time or sanity seek other options, of which, unlike with Python, there are many. <br /><br />If you're in the multi-language PaaS business, and you want to be world-class, which you have to be if you want to compete against either the low-margin low-touch hosting companies or the really great high-touch customer-service intensive ones, you'd better be budgeting $250K a year per language you list on your website, or you should just pack it in.&nbsp; Same thing for every database, every search engine, every message queue.&nbsp; Everything you purport to make "turnkey" or "dead simple".&nbsp; Back at Six Apart, people would complain that we didn't support some specific database with Movable Type.&nbsp; The reason we didn't was that we just couldn't allocate the resources to even pass marginal QA acceptance, let alone claim we had world-class support for that database.&nbsp; Really learning this stuff deeply takes time and effort, like a dedicated full-time engineer.&nbsp; That's why most tech startups have the "expert guy", who's the end-to-end guru of whatever language or technology they're using.&nbsp; Like the person who wrote the ORM driver for Perl, or created memcached, or invented OAuth, or contributed some major piece of code to Django, or figured out indexing in Cassandra, or wrote a book on Java.&nbsp; Customers choose PaaS so that they don't have to have the "expert guy" at their company, because the PaaS vendor has them at theirs.&nbsp; Now you might begin to understand why these PaaS vendors have to raise so much money.&nbsp; If you're a PaaS vendor, and you don't want to, or can't afford to, have more than one "expert guy" at your PaaS company, you have no choice but to just focus on one language stack, and the three or four most widely-used languages already have a lot of good world-class options for supporting apps in the respective web stacks for those languages, which means you've got a bit of a challenge in 
figuring out where to focus your efforts. <br /><br />Just in case you're wondering, yes, I am working on a PaaS startup.&nbsp; But I'm not getting into the "place to run your code" business.<br />]]>
    </content>
</entry>

<entry>
    <title>java.util.UUID.compareTo() considered harmful</title>
    <link rel="alternate" type="text/html" href="http://www.anuff.com/2011/04/javautiluuidcompareto-considered-harmful.html" />
    <id>tag:www.anuff.com,2011://1.31</id>

    <published>2011-04-02T19:28:24Z</published>
    <updated>2011-04-03T00:49:54Z</updated>

    <summary><![CDATA[Java's UUID class compares UUID's using signed comparisons, in a way that will provide opposite results than you might expect and incompatible with other languages.&nbsp; If you're writing an application that compares and sorts UUIDs, you should use an alternate...]]></summary>
    <author>
        <name>Ed Anuff</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.anuff.com/">
        <![CDATA[Java's UUID class <a href="http://download.oracle.com/javase/6/docs/api/java/util/UUID.html#compareTo%28java.util.UUID%29">compares UUIDs</a> using signed comparisons, which can produce the opposite of the result you might expect and is incompatible with how other languages compare them.&nbsp; If you're writing an application that compares and sorts UUIDs, you should use an alternate UUID library for the comparison function or roll your own. ]]>
        <![CDATA[<style>
.example {
    border: 1px solid black;
    font-family: monospace;
    padding: 8px;
    font-size: 8pt;
}
</style>
This came up in looking at how UUIDs are compared by Cassandra's <a href="http://wiki.apache.org/cassandra/UUID">LexicalUUIDType</a> versus its <a href="http://wiki.apache.org/cassandra/UUID">TimeUUIDType</a> to see if they could be combined.&nbsp; This led to an examination of RFC 4122, the IETF's standard for UUIDs, which formalized conventions and specifications from previous standards.&nbsp; According to <a href="http://www.ietf.org/rfc/rfc4122.txt">RFC 4122</a>, the component values of a UUID are compared as unsigned hex values.&nbsp; During testing, it became clear that Java's UUID class differed in its comparison results for certain randomly generated UUIDs.&nbsp; <a href="http://www.java2s.com/Open-Source/Java-Document/6.0-JDK-Core/Collections-Jar-Zip-Logging-regex/java/util/UUID.java.htm#compareToUUID">Looking at the source</a>, it was immediately clear that the UUID class was performing a signed comparison of the UUID as two long values.&nbsp; Does this really matter?&nbsp; For many applications, it probably doesn't, since most people are not relying on the sorting of UUIDs.&nbsp; The sorting operation is usually performed solely inside a database, for example, and as long as the sort order in that database is consistent, it doesn't matter if it differs from how a client accessing that database might sort the UUID values in whatever language it's written in.<br /><br />Here are some simple examples that demonstrate how Python and Perl compare UUIDs versus how Java does it:<br /><br />Python:<br /><br />
<div class="example">
&gt;&gt;&gt; from uuid import UUID<br />&gt;&gt;&gt; uuid1 = UUID('20000000-0000-4000-8000-000000000000')<br />&gt;&gt;&gt; uuid1<br />UUID('20000000-0000-4000-8000-000000000000')<br />&gt;&gt;&gt; uuid2 = UUID('E0000000-0000-4000-8000-000000000000')<br />&gt;&gt;&gt; uuid2<br />UUID('e0000000-0000-4000-8000-000000000000')<br />&gt;&gt;&gt; uuids = [uuid2,uuid1]<br />&gt;&gt;&gt; uuids<br />[UUID('e0000000-0000-4000-8000-000000000000'), UUID('20000000-0000-4000-8000-000000000000')]<br />&gt;&gt;&gt; uuids.sort()<br />&gt;&gt;&gt; uuids<br />[UUID('20000000-0000-4000-8000-000000000000'), UUID('e0000000-0000-4000-8000-000000000000')]<br />&gt;&gt;&gt;<br /> 
</div>
<br />Perl:<br /><br />
<div class="example">
#!perl<br />
use Data::UUID;<br /><br />$ug&nbsp;&nbsp;&nbsp; = new Data::UUID;<br />$uuid1 = $ug-&gt;from_string("20000000-0000-4000-8000-000000000000");<br />$uuid2 = $ug-&gt;from_string("E0000000-0000-4000-8000-000000000000");<br /><br />print $ug-&gt;to_string( $uuid1 ) , "\n";<br />print $ug-&gt;to_string( $uuid2 ) , "\n";<br /><br />$res&nbsp;&nbsp; = $ug-&gt;compare($uuid1, $uuid2);<br />print "$res\n";<br /></div>
<br />This example outputs:<br /><br />
<div class="example">
$ perl ./uuidtest.pl <br />20000000-0000-4000-8000-000000000000<br />E0000000-0000-4000-8000-000000000000<br />-1<br />
</div>
<br />Java:<br /><br />
<div class="example">
package test;<br /><br />import java.util.UUID;<br /><br />public class UUIDTest {<br /><br />&nbsp;&nbsp;&nbsp; public static void main(String[] args) {<br />&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; UUID uuid1 = UUID.fromString("20000000-0000-4000-8000-000000000000");<br />&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; UUID uuid2 = UUID.fromString("E0000000-0000-4000-8000-000000000000");<br />&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; System.out.println(uuid1);<br />&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; System.out.println(uuid2);<br />&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; System.out.println(uuid1.compareTo(uuid2));<br />&nbsp;&nbsp;&nbsp; }<br /><br />}<br />
</div>
<br />This example outputs:<br /><br />
<div class="example">
$ java test.UUIDTest<br />20000000-0000-4000-8000-000000000000<br />e0000000-0000-4000-8000-000000000000<br />1<br />
</div>
<br />In the Perl and Java examples, a comparison value of '1' means uuid1 is greater than uuid2, and '-1' means uuid1 is less than uuid2.<br /><br />These examples use simple version 4 (random) UUIDs where the most significant byte values are chosen to be values that will compare differently signed and unsigned: 0x20 has its high bit clear, while 0xE0 has it set, so the second UUID's most significant long is negative when read as a signed value.<br /><br />This is now <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7025832">Bug ID 7025832</a> in the Java Bug Database, but it is marked Will Not Fix, because this behavior has been present for over a decade and changing it would break countless applications.
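<br /><br />If you do decide to roll your own comparison, one possible approach is sketched below.&nbsp; This is a minimal sketch, not code from the JDK or from Cassandra: the class name UnsignedUuidComparator and its compareUnsigned helper are hypothetical names made up for this illustration.&nbsp; It treats each 64-bit half of the UUID as unsigned by flipping the sign bit before comparing, which should order the two sample UUIDs the same way the Python and Perl examples do.<br /><br />

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

// Hypothetical helper for illustration: orders UUIDs as unsigned 128-bit values.
final class UnsignedUuidComparator implements Comparator<UUID> {

    // Compare two longs as unsigned 64-bit values by flipping the sign bit
    // and then doing an ordinary signed comparison.
    static int compareUnsigned(long a, long b) {
        long x = a ^ Long.MIN_VALUE;
        long y = b ^ Long.MIN_VALUE;
        return (x < y) ? -1 : ((x == y) ? 0 : 1);
    }

    @Override
    public int compare(UUID u1, UUID u2) {
        // The most significant half decides unless both halves are equal there.
        int high = compareUnsigned(u1.getMostSignificantBits(), u2.getMostSignificantBits());
        if (high != 0) {
            return high;
        }
        return compareUnsigned(u1.getLeastSignificantBits(), u2.getLeastSignificantBits());
    }

    public static void main(String[] args) {
        List<UUID> uuids = new ArrayList<UUID>();
        uuids.add(UUID.fromString("E0000000-0000-4000-8000-000000000000"));
        uuids.add(UUID.fromString("20000000-0000-4000-8000-000000000000"));

        // With the unsigned comparator, 20000000-... sorts before e0000000-...
        Collections.sort(uuids, new UnsignedUuidComparator());
        System.out.println(uuids);
    }
}
```

<br />On Java 8 or later, the sign-bit trick in the helper could be replaced with a call to Long.compareUnsigned, which performs the same unsigned comparison.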
]]>
    </content>
</entry>

<entry>
    <title>Indexing in Cassandra</title>
    <link rel="alternate" type="text/html" href="http://www.anuff.com/2011/02/indexing-in-cassandra.html" />
    <id>tag:www.anuff.com,2011://1.30</id>

    <published>2011-02-25T23:10:19Z</published>
    <updated>2011-02-26T19:08:19Z</updated>

    <summary><![CDATA[I'm writing this up because there's always quite a bit of discussion on both the Cassandra and Hector mailing lists about indexes and the best ways to use them.&nbsp; I'd written a previous post about Secondary indexes in Cassandra last...]]></summary>
    <author>
        <name>Ed Anuff</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.anuff.com/">
        <![CDATA[I'm writing this up because there's always quite a bit of discussion on both the Cassandra and Hector mailing lists about indexes and the best ways to use them.&nbsp; I'd written a previous post about <a href="http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html">Secondary indexes in Cassandra</a> last July, but there are a few more options and considerations today.&nbsp; I'm going to do a quick run-through of the different approaches for doing indexes in Cassandra so that you can more easily navigate these and determine the best approach for your application.<br /><br /><b>The Primary Index</b><br /><br />Most conversations about indexing in Cassandra are about secondary indexes.&nbsp; This raises the question: what is the primary index?&nbsp; Your primary index is the index of your row keys.&nbsp; There isn't a central master index of all the keys in the database; each node in the cluster maintains an index of the rows it contains.&nbsp; This is what the Partitioner in Cassandra manages, as it decides where in a cluster of nodes to store your row.&nbsp; Because of this, the index typically only enables basic looking up of rows by key, much like a hashtable.&nbsp; The discussions that break out about the OrderPreservingPartitioner versus the RandomPartitioner are really about how literally the primary index behaves like a hashtable versus an ordered map (i.e. 
something you could do a "select * from foo order by id" against).&nbsp; This is because, in the case of the RandomPartitioner, you can't easily traverse your set of row keys in meaningful ways since the sorted order of those keys is assigned by Cassandra based on a hashing algorithm.&nbsp; The OrderPreservingPartitioner, as the name implies, orders the keys in string-sort order, so you can not only look up a row via a specific key, but can also traverse your set of keys in ways that are directly related to the values you are using as your keys.&nbsp; In other words, if your row key was a "lastname,firstname,ss#" string, you could iterate through your keys in alphabetical order by lastname.&nbsp; Generally, though, people try to use the RandomPartitioner because, in exchange for the convenience of the OrderPreservingPartitioner, you lose the even distribution of your data across the set of nodes in your overall system, which impedes the scalability of Cassandra.&nbsp; For more understanding of this, I'd recommend reading <a href="http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/">Cassandra: RandomPartitioner vs OrderPreservingPartitioner</a>.<br /><br /><b>Alternate Indexes</b><br /><br />By definition, any way of finding your row other than by its row key makes use of a secondary index.&nbsp; Cassandra uses the term "secondary index" to refer to the specific built-in functionality that was added to version 0.7 for specifying columns for Cassandra to index upon, so we're going to use the broader term "alternate index" to refer to both Cassandra's native secondary indexes as well as other techniques for creating indexes in Cassandra.<br /><br /> ]]>
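        <![CDATA[To make the hashtable-versus-ordered-map distinction from the primary index discussion more concrete, here is a rough sketch in plain Java.&nbsp; Nothing in it is Cassandra code: the class PartitionerOrderSketch, the token method, and the sample row keys are all made up for illustration, and using an MD5 digest as the hash is an assumption standing in for whatever hashing the partitioner applies.&nbsp; It only shows that keys sorted by a hash come back in an order unrelated to their string order.<br /><br />

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.TreeMap;

// Hypothetical illustration only; not how Cassandra is implemented.
final class PartitionerOrderSketch {

    // Stand-in for a hash-based token: the MD5 digest read as an unsigned integer.
    static BigInteger token(String rowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(rowKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up keys in the "lastname,firstname,ss#" shape used in the text.
        String[] keys = {"adams,ann,001", "baker,bob,002", "smith,sue,003"};

        // Ordered by the key itself, like an order-preserving partitioner.
        TreeMap<String, String> byKey = new TreeMap<String, String>();
        // Ordered by the hash of the key, like a hash-based partitioner.
        TreeMap<BigInteger, String> byToken = new TreeMap<BigInteger, String>();

        for (String key : keys) {
            byKey.put(key, key);
            byToken.put(token(key), key);
        }

        System.out.println("string order: " + byKey.keySet());
        System.out.println("token order:  " + byToken.values());
    }
}
```

<br />Walking the first map visits last names alphabetically, while walking the second visits them in whatever order the hash values happen to fall, which is why range scans over row keys are only meaningful in the order-preserving case.<br /><br /> ]]>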
        <![CDATA[<b>Cassandra's Native Secondary Indexes</b><br /><br />For people new to Cassandra, this should be your starting point.&nbsp; Often, in discussions on the mailing list, people get deep into other forms of alternate indexes and the newcomer to Cassandra quickly becomes bewildered about what they should be doing.&nbsp; So, if you're 30 days into using Cassandra, and you need to quickly find a way to index your data, get started with the native secondary indexes.&nbsp; You might want to read the next sections, especially to understand what "wide rows" are and how they relate to indexing, but don't start your implementation with these techniques.&nbsp; You can learn more about native secondary indexes on the <a href="http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes">blog page where they were initially announced</a> and on the <a href="http://www.datastax.com/docs/0.7/data_model/secondary_indexes">DataStax documentation page</a> for them.<br /><br /><b>"Wide rows" and CF-based Indexes</b><br /><br />The basic property of Cassandra that a lot of new users just blow past is the fact that it's a column-oriented database.&nbsp; They quickly see that they can create a column family (CF) as an analogue to a traditional database table and so create, for example, a CF called "Users", and the columns in each row in that CF are effectively used the same way that they'd use any database column, storing fields like "name", "location", etc.&nbsp; The fact that you don't have to specify the columns ahead of time is regarded as a convenience feature: no more "alter table" statements.&nbsp; These are certainly a big part of how you'll use Cassandra.&nbsp; However, the point of Cassandra, at least in the context of discussing indexing techniques, isn't whether you can fit 10 columns or 20 in a row.&nbsp; Cassandra can fit 2 billion columns in a single row.&nbsp; Why do you need 2 billion columns?&nbsp; One of the primary reasons is to have 
columns that point to other rows, and in that context, you might not need 2 billion columns, but you could end up using quite a few.<br /><br />The most basic form of CF index isn't truly an index at all; it's a list of keys.&nbsp; It's very common to store a set of row keys as column names in a way that's called a "wide row", because it consists of a large number of columns that each contain a small piece of data (i.e. a row key).&nbsp; Here's a simple example.&nbsp; Let's suppose you've created the aforementioned Users CF.&nbsp; Now you want to organize those users into groups.&nbsp; You could go and create a classic join table, but the more efficient Cassandra way is to have a Groups CF where every row contains the set of keys for the users in that group.&nbsp; So, if you wanted to quickly retrieve all the users in a group, you could just load the contents of that row.&nbsp; Unlike the primary index, which is only ordered if the partitioner supports it, the column names in a row are always ordered, according to the criteria you used when you created the CF.&nbsp; If the user's key was their "lastname,firstname,ss#" as mentioned previously, and your Groups CF was sorting columns as strings, then when you retrieved your columns from the row, they would be in that sort order.&nbsp; If you wanted to get a range, say "last names starting with the letter 'a'", that would easily be accomplished as well.&nbsp; As another example, suppose that for each user you have a set of tweets, and the key for each tweet is a time-based UUID.&nbsp; You'd then create a CF called UserTweets where the key to that CF is the user's key, and each row in the UserTweets CF contains the time-ordered set of keys to the tweets in your Tweet CF.&nbsp; When you look around the various sample projects, like <a href="https://github.com/ericflo/twissandra">Twissandra</a>, you see variants of this used quite a bit.&nbsp; It's the basic way relationships are typically modelled in the system.<br /><br 
/>Now, the challenge becomes what if you want to use a CF row to contain a sorted set of something other than the target row's keys, such as the "lastname" column of the rows in the User CF.&nbsp; In that case, several users might have the same last name.&nbsp; What if, out of necessity, your User row keys are random UUID's?&nbsp; Obviously, the easiest answer is to go use native secondary indexing.&nbsp; However, there are a couple of good reasons why you might find that approach wanting.&nbsp; First of all, there are still limitations on the types of searching you can do with native indexes, particularly in regards to range queries, although these will be worked through over time.&nbsp; Indexes aren't just used for searching, though, they're used for sorting as well, and native secondary indexes don't yet help with that either.  The final issue is the idea of an indexed collection, which I'm going to talk about a bit later, but is important when you want to do indexing across many-to-many relationships.<br /><br /><b>Inverted-indexes Using Composite Column Names</b><br /><br />Building an inverted index inside a row is not that hard.&nbsp; Essentially you want to use the column sort ordering of the CF and make each column name in a row start with the value you want to search against.&nbsp; For example, let's suppose each column name was the user's last name and the column value was their user id.&nbsp; It would be very easy to search by last name by using Cassandra's get_slice method.&nbsp; Where it gets more complicated is that typically you have multiple target entities that have the same column value.&nbsp; For example, multiple users with the same last name.&nbsp; In this case, you have two options, either composite column names or supercolumns.&nbsp; A composite column name is where you essentially combine the indexed value with the target entity id.&nbsp; For example, the lastname and the user id as a single value that's used as the column name.&nbsp; Since 
the combination of the last name and the user id is unique, you can have multiple index entries for the same last name.&nbsp; To construct that composite column name, there are several techniques.&nbsp; The simplest, although not the most efficient or elegant approach, is string concatenation.&nbsp; I'm going to use this approach for my examples because it's the easiest to illustrate.&nbsp; If you take the string concatenation approach then, ideally, you'd want the search term component of every column name to be the same length, so perhaps you'd decide that you were only indexing on the first 10 characters of the user's last name, and you'd truncate last names longer than that, or pad last names shorter than that (the padding is shown as underscores below), before concatenating the user id to form the index entry.&nbsp; So, you'd end up with, for example, a column name like "smith_____:0001" for user "Bob Smith" and "smith_____:0005" for "John Smith".&nbsp; Doing a get_slice against these column names would let you find all the users whose last name was "smith" pretty quickly.&nbsp; Your row would end up looking like this:<br /><br /><code>"indexkey":<br />&nbsp;&nbsp; &nbsp;"smith_____:0001" : null<br />&nbsp;&nbsp; &nbsp;"smith_____:0005" : null<br />&nbsp;&nbsp; &nbsp;"wilson____:0003" : null<br /></code><br />In this case, I'm storing null as the value for each column, but you could put whatever you wanted in there, for example, the user's full name, so that you didn't later have to grab that from the user's row.&nbsp; Note that "indexkey" is the row key, since this entire index sits inside a single row in the CF.&nbsp; You'll undoubtedly have at least a few different indexes in your app, each one in a separate row.<br /><br />If you choose to take the composite index approach, you'll want to <a href="http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html">read my previous post</a> and look at the <a href="https://github.com/edanuff/CassandraCompositeType">CassandraCompositeType</a> 
project that I've put on GitHub as a more elegant and flexible alternative to string concatenation, allowing any number of different value types, such as strings, longs, or UUIDs, to be combined together into a single comparable value.&nbsp; Even if you decide not to use it, you'll probably want to consider some similar method of encoding your column names.<br /><br /><b>Inverted-indexes Using Supercolumns</b><br /><br />The other approach to inverted indexes is to use super columns.&nbsp; This is the approach that <a href="https://github.com/tjake/Lucandra/">Solandra</a> takes, for example.&nbsp; In this case, you'd use a super column where the super column name was the search term and the sub columns were the target entity ids.&nbsp; Something like this:<br /><br /><code>"indexkey":<br />&nbsp;&nbsp; &nbsp;"smith":<br />&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;"0001" : null<br />&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;"0005" : null<br />&nbsp;&nbsp; &nbsp;"wilson":<br />&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;"0003" : null<br /></code><br />Again, note that you can put whatever you want in the subcolumn values.<br /><br />Using super columns is a fairly clean approach and is pretty much what they were designed for.&nbsp; There are a couple of reasons why people avoid supercolumns: the subcolumns in a supercolumn aren't sorted, and there's some question about how well they're supported in general, although that's a subjective concern.&nbsp; Also, if you wanted to create an index of something like "lastname,city", you'd still end up falling back to using composite column names (i.e. 
"smith_____new york__" as the supercolumn name).<br /><br /><b>Indexed Collections</b><br /><br />One interesting thing to note is that when building CF indexes, the index itself has a key.&nbsp; This makes it possible to create indexed collections.&nbsp; This is where you're indexing an entity in the context of another entity that owns a collection of which the first entity is a member.&nbsp; For example, searching for all the users in group A by their last name.&nbsp; If it's a one-to-many relationship and a user can only be in one group, then that's as easy as adding an indexed "group id" column to your Users CF if you're using native secondary indexes.&nbsp; However, if that's not the case, if users can be in multiple groups, for example, then you might want to be able to maintain multiple "micro-indexes" rather than the global index that native secondary indexes manage.&nbsp; Yes, you'll need to maintain those multiple indexes whenever the value in the target entity changes, for example, updating the separate index of user last names that each group maintains, but that's actually easier than it sounds.&nbsp; You can see one technique for managing that in my <a href="https://github.com/edanuff/CassandraIndexedCollections">CassandraIndexedCollections</a> project, and it's something I discussed in my <a href="http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html">original blog post on Cassandra indexes</a>.<br /><br /><b>That Annoying Transactional Gotcha</b><br /><br />With all of these indexing approaches, you start to raise questions about how transactions fit into this.&nbsp; I'd recommend reading <a href="http://www.cidrdb.org/cidr2007/papers/cidr07p15.pdf">Life Beyond Distributed Transactions</a> for a good discussion of some of the specific issues related to alternate indexes and transactions.<br /><br />Hopefully this round-up provides some guidance on thinking about indexing within Cassandra-based applications.<br /><br 
/><br />]]>
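        <![CDATA[Going back to the string-concatenation approach to composite column names described earlier, the padding, truncation, and slicing steps can be sketched in a few lines of plain Java.&nbsp; This is only an illustration under some assumptions: the class CompositeColumnNameSketch and its methods are hypothetical names, a sorted TreeMap stands in for a single index row whose column names are compared as strings, and subMap stands in for a get_slice call.&nbsp; It doesn't use Cassandra or Hector at all.<br /><br />

```java
import java.util.Locale;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration of fixed-width composite column names.
final class CompositeColumnNameSketch {

    static final int TERM_WIDTH = 10; // index only the first 10 characters
    static final char PAD = '_';      // visible padding character, as in the example row

    // Lower-case the last name, then truncate or pad it to exactly TERM_WIDTH characters.
    static String paddedTerm(String lastName) {
        StringBuilder sb = new StringBuilder(lastName.toLowerCase(Locale.ROOT));
        sb.setLength(Math.min(sb.length(), TERM_WIDTH));
        while (sb.length() < TERM_WIDTH) {
            sb.append(PAD);
        }
        return sb.toString();
    }

    // Builds a name such as "smith_____:0001" from ("Smith", "0001").
    static String columnName(String lastName, String userId) {
        return paddedTerm(lastName) + ":" + userId;
    }

    // Returns every column whose name starts with the padded term followed by ':'.
    // The end bound uses ';', the character immediately after ':' in ASCII order.
    static SortedMap<String, String> slice(TreeMap<String, String> row, String lastName) {
        String term = paddedTerm(lastName);
        return row.subMap(term + ":", term + ";");
    }

    public static void main(String[] args) {
        // One index row; values are null, as in the example row in the text.
        TreeMap<String, String> row = new TreeMap<String, String>();
        row.put(columnName("Smith", "0001"), null);
        row.put(columnName("Smith", "0005"), null);
        row.put(columnName("Wilson", "0003"), null);

        // Prints [smith_____:0001, smith_____:0005]
        System.out.println(slice(row, "smith").keySet());
    }
}
```

<br />The fixed width matters because the column names are compared as whole strings: without it, a longer last name that merely starts with "smith" would sort in among the "smith" entries.<br />]]>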
    </content>
</entry>

<entry>
    <title>New patent</title>
    <link rel="alternate" type="text/html" href="http://www.anuff.com/2010/09/new-patent.html" />
    <id>tag:www.anuff.com,2010://1.29</id>

    <published>2010-09-27T17:22:59Z</published>
    <updated>2010-09-27T17:34:53Z</updated>

    <summary>Patent 7,801,990 from back in 2001, finally granted nearly 10 years later....</summary>
    <author>
        <name>Ed Anuff</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.anuff.com/">
        <![CDATA[<a href="http://www.freepatentsonline.com/7801990.html">Patent 7,801,990</a> from back in 2001, finally granted nearly 10 years later.<br /><br /><br /><form class="mt-enclosure mt-enclosure-image" style="display: inline;" contenteditable="false"><img alt="7801990.png" src="http://www.anuff.com/images/7801990.png" class="mt-image-center" style="text-align: center; display: block; margin: 0pt auto 20px;" height="399" width="472" /></form><br /><div><br /></div>]]>
        
    </content>
</entry>

<entry>
    <title>The alphabet according to Google Instant Search</title>
    <link rel="alternate" type="text/html" href="http://www.anuff.com/2010/09/the-alphabet-according-to-google-instant-search.html" />
    <id>tag:www.anuff.com,2010://1.28</id>

    <published>2010-09-08T22:23:03Z</published>
    <updated>2010-09-08T22:24:34Z</updated>

    <summary>Done with a clear cache and VPN&apos;d into a couple of different locations, but hardly an exhaustive test:A is for Amazon, B is for Bank of America, C is for Craigslist, D is for DMV, E is for eBay, F...</summary>
    <author>
        <name>Ed Anuff</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.anuff.com/">
        <![CDATA[Done with a clear cache and VPN'd into a couple of different locations, but hardly an exhaustive test:<br /><br />A is for Amazon, B is for Bank of America, C is for Craigslist, D is for DMV, E is for eBay, F is for Facebook, G is for GMail, H is for Hotmail, I is for Ikea, J is for Jet Blue, K is for Kaiser, L is for Lowes, M is for Mapquest, N is for Netflix, O is for Outside Lands, P is for Pandora, Q is for Quotes, R is for REI, S is for Skype, T is for Target, U is for USPS, W is for Weather, X is for XBox, Y is for Yahoo, and Z is for Zillow.<br /><br />Some of these appear to be location dependent: VPN'ing into a server in Los Angeles gives me KTLA instead of Kaiser, Lakers instead of Lowes, Myspace instead of Mapquest, and OC Fair instead of Outside Lands.<br /> ]]>
        
    </content>
</entry>

<entry>
    <title>Why I&apos;m messing around with NoSQL</title>
    <link rel="alternate" type="text/html" href="http://www.anuff.com/2010/09/why-im-messing-around-with-nosql.html" />
    <id>tag:www.anuff.com,2010://1.27</id>

    <published>2010-09-08T00:35:49Z</published>
    <updated>2011-03-02T06:44:02Z</updated>

    <summary>It seems like most of my friends and colleagues have heard I&apos;ve been using Cassandra in my current project and they forward on to me every blog post or tweet where someone has something negative to say about the Apache...</summary>
    <author>
        <name>Ed Anuff</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.anuff.com/">
        <![CDATA[It seems like most of my friends and colleagues have heard I've been using <a href="http://cassandra.apache.org/">Cassandra</a> in my current project, and they forward me every blog post or tweet where someone has something negative to say about the Apache NoSQL open source database.&nbsp; To be clear, it's debatable whether you should use Cassandra in production today, although at the recent <a href="http://www.riptano.com/blog/cassandra-summit-recap">Cassandra Summit</a>, it was clear a lot of people were, and were having success with it, and of course, as we all know from the blog headlines, a <a href="http://techcrunch.com/2010/09/07/digg-struggles-vp-engineering-door/">few people are not</a>.&nbsp; But I also think that it's probably useful to keep in mind how many of the basic building blocks of the web started on very shaky ground and went through many iterations before getting to where they are today.&nbsp; I'm not exactly talking about the <a href="http://en.wikipedia.org/wiki/Hype_cycle">Gartner Hype Cycle</a>, because it's very hard to apply and very hard to actually determine where you are on that curve until years after the fact.&nbsp; Having ridden that curve on a number of web technologies, I can say it's not as simple as drawing a sloppy sideways S-curve and saying "here we are".&nbsp; The reality is that there are a number of little peaks and valleys inside the overall process. <br /> ]]>
        <![CDATA[ I remember in 1995 being part of a consulting project at a Silicon 
Valley software company, implementing an intranet for their salesforce, 
where at the start of the project, the top IT people at the company were
 scoffing at the idea of using web technologies over Lotus Notes, 
because it was inconceivable to use something that didn't have 
store-and-forward for offline access.&nbsp; Six months later, the debate was 
whether to use Netscape Enterprise Server or *sneer* Apache, which was 
just <a href="http://en.wikipedia.org/wiki/Apache_HTTP_Server#History_and_name">"a patchy"</a> version of the NCSA web server.&nbsp; In 1997, at 
Wired/HotWired, the idea of building server applications in Java was 
something to be scoffed at.&nbsp; We were <a href="http://news.cnet.com/Wired-dreams-of-500-channels/2100-1023_3-269968.html">working with Marimba</a> at the time, building our "<a href="http://www.wired.com/wired/archive/5.03/ff_push.html">push media strategy</a>", as well as trying to roll out a chat service built on Java.&nbsp; Our best engineers 
were highly skeptical, and eventually, for a variety of reasons, both technical and business related, we discontinued most of our Java projects.&nbsp; There wasn't a tech blog scene like there is 
today, but if it had existed, the headlines and tweets would have been 
"Wired abandons Java for its sites, all web apps will be in Perl".&nbsp; Subsequently, our push content would go exclusively through, um, Pointcast.&nbsp; Oh, 
and, by the way, all using Sybase.<br /><br />Around the same time, "crazy" people were 
starting to talk about XML for content syndication.&nbsp; I exchanged an 
email with Dave Winer where he said XML was too heavyweight and "BigCo".&nbsp; 
Luckily, he <a href="http://www.scripting.com/davenet/1997/12/13/xml.html">changed his mind</a> on that, to the subsequent benefit of the 
blogosphere.&nbsp; Two years later, at Epicentric, we decided to make a very 
risky decision to abandon a bunch of work we'd done on a Microsoft IIS 
ASP-based web app and throw everything at Java and JSP.&nbsp; The JSP 
specification wasn't even close to being finalized and the only 
implementation of it was something called GnuJSP, that had been written by a lone developer in Europe.&nbsp; Our friends who were 
at other Internet companies, busily keeping sites in production (this 
was the first Internet boom), were predictably skeptical.&nbsp; A few years 
after that, J2EE was in high gear, all enterprise web apps were in JSP, and we were doing tens of millions a year in business.&nbsp; Now, at the time, for the most part, everything ran on top of an Oracle database.&nbsp; We were
 hearing from really good engineers and IT people (not the stodgy IT 
late adopters we all love to hate) about how big a piece of s*** MySQL
 was, and that we should steer clear of it.&nbsp; That attitude did a 180 pretty quickly, mostly because of price, not performance.&nbsp; We got 
bought by Vignette, where they were making the risky move to serving web
 content dynamically, direct from the database.&nbsp; Existing customers 
*hated* this idea, much too risky.&nbsp; Conventional wisdom said you had to 
generate static pages and write them to files, which could be served much faster and didn't depend on a flaky database connection.&nbsp; Vignette never really made that transition; they were too closely associated with static publishing.&nbsp; At WidgetBox we used 
Spring and Hibernate instead of J2EE.&nbsp; This was well before Spring went 
mainstream, making SpringSource worth close to half a billion dollars and signaling the final death knell of EJB.<br /><br />Now, hopefully you haven't been bored by this trip down memory lane, or 
decided that I'm stuck in the past, which is quite possible.&nbsp; The thing 
is, with every one of these choices, by the time we were hitting our 
stride in the market, that which was risky had become mainstream.&nbsp; In 
the portal market, our main competitor, Plumtree, had stuck to building 
on Microsoft ASP and found that just when the market was taking off, they had to 
start figuring out how to move over to Java.&nbsp; Now, they did a pretty 
good job with what they had, and they were always better at sales and 
marketing, but if we hadn't had the leading pure Java portal server, 
we'd have been out of the game rather than competing hard with BEA and 
IBM.&nbsp; These decisions are still all about placing a bet, and you can lose
 that bet, for example by choosing Perl, Tcl, or Ruby when the market goes to 
PHP or whatever.&nbsp; So I'm not saying it's easy.&nbsp; It takes a very objective but 
forward-looking outlook, not just picking your favorite pet language.&nbsp; In fact, *especially* not just picking your favorite pet language, because there are some scary expense curves involved in that, which I'll talk about some other time.<br /><br />Now, and this is very important, if you're 
building a consumer Internet web app, you probably should be cobbling it
 together with throwaway technology, because that's not your key value, 
but if you're trying to deliver some form of infrastructure, if you're 
not at the raw bleeding edge when you start, then it's almost guaranteed
 that you're not going to have enough of a technology runway to still be
 ahead of the pack when you go to market.&nbsp; And that's why I'm tinkering 
with Cassandra and NoSQL, and, when the people who are the most 
involved in keeping today's technology up and running sneer the loudest,
 why I know I'm on the right track.<br /><br />POSTSCRIPT: Great quote, found in O'Reilly's "<a href="http://oreilly.com/catalog/0636920010852">Cassandra: The Definitive Guide</a>", is Ray Kurzweil's statement "an invention has to make sense in the world in which it is finished, not the world in which it is started".<br /><br />]]>
    </content>
</entry>

<entry>
    <title>Thoughts on building mobile apps with web technologies</title>
    <link rel="alternate" type="text/html" href="http://www.anuff.com/2010/09/thoughts-on-building-mobile-apps-with-web-technologies.html" />
    <id>tag:www.anuff.com,2010://1.26</id>

    <published>2010-09-03T20:36:24Z</published>
    <updated>2010-09-04T23:49:30Z</updated>

    <summary><![CDATA[A friend of mine recently showed me his Facebook iPad application that he'd built and which was selling quite well in the App Store.&nbsp; Besides being a pretty cool app, one of the interesting things about it was that it...]]></summary>
    <author>
        <name>Ed Anuff</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.anuff.com/">
        <![CDATA[A friend of mine recently showed me his <a href="http://www.facebook.com/friendly.iPad">Facebook iPad application</a> that he'd built and which was selling quite well in the App Store.&nbsp; Besides being a pretty cool app, one of the interesting things about it was that it had been written primarily in JavaScript and HTML5, with a small amount of native code basically wrapping it in a <a href="http://developer.apple.com/iphone/library/documentation/uikit/reference/UIWebView_Class/Reference/Reference.html">UIWebView</a>.&nbsp; It seems like this is a recent but growing trend among iPhone developers, since it's easier to write crash-free code in JavaScript than in Objective-C, and the iPhone's WebKit browser has a lot of mechanisms for supporting things like touch interactions.&nbsp; There are also a growing number of companies (<a href="http://www.appcelerator.com/">Appcelerator</a>, <a href="http://www.sencha.com/">Sencha</a>) and projects (<a href="http://www.jqtouch.com/">jQTouch</a>) trying to make this easier.&nbsp; This is all good news, and if this trend continues, it means the number of mobile applications will continue to explode.&nbsp; I'm a lot more skeptical about any of this making the App Store any less important; building apps was only ever half the battle...<br /><br /><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="free_as_in_paid.png" src="http://www.anuff.com/images/free_as_in_paid.png" class="mt-image-center" style="text-align: center; display: block; margin: 0pt auto 20px;" height="346" width="194" /><div style="text-align: center;"><em>This joke never gets old</em></div></span><br />It all comes down to packaging and distribution, of course.&nbsp; App stores never go away, even if they ultimately become, under the hood, a way to sell password-protected, pay-to-open, web bookmarks.&nbsp; And that's not a bad thing, because for all the headaches of dealing with opaque approval processes and such, at 
least they've figured out how people get paid.]]>
        
    </content>
</entry>

<entry>
    <title>&quot;Meet the new boss, same as the old boss&quot;</title>
    <link rel="alternate" type="text/html" href="http://www.anuff.com/2010/08/meet-the-new-boss-same-as-the-old-boss.html" />
    <id>tag:www.anuff.com,2010://1.25</id>

    <published>2010-08-13T23:03:35Z</published>
    <updated>2010-08-13T23:05:36Z</updated>

    <summary><![CDATA[Lots of people are getting into the weeds of this Oracle/Google/Java spat, it really is little more than a thinly veiled shakedown gambit.&nbsp; But when I look at it as the latest in a string of well publicized disputes between...]]></summary>
    <author>
        <name>Ed Anuff</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.anuff.com/">
        <![CDATA[Lots of people are getting into the weeds of this Oracle/Google/Java spat, it really is little more than a thinly veiled shakedown gambit.&nbsp; But when I look at it as the latest in a string of well publicized disputes between virtually every single major platform owner today and the developers trying to build on those platforms, as well as the major conflicts between potentially competitive platforms, I'm more concerned with the fact that we've recently moved into a new era of aggressiveness and heavy-handed behavior by platform owners that we haven't seen since the early 90's.&nbsp; I used to suspect that many of the companies that were the most vocal in decrying Microsoft's dominance back in the day would have behaved no differently than Microsoft if they'd had the ability to do so.&nbsp; Now, when I take a look at the way that every single platform owner of any significance is behaving, I realize that I was wrong: most of them would have behaved far worse.<br /><br />Note: I'm not using the term "platform" in the way that every company with an API puffs up their chest and tries to claim, but to mean that the company and its technology have a meaningful ecosystem with a large base of third-party vendors, partners, developers, and other participants, all of whom are earning a living (or at least trying to) on top of it.&nbsp; Platforms are ultimately markets, not technologies. ]]>
        
    </content>
</entry>

<entry>
    <title>Death Grip</title>
    <link rel="alternate" type="text/html" href="http://www.anuff.com/2010/07/death-grip.html" />
    <id>tag:www.anuff.com,2010://1.24</id>

    <published>2010-07-19T01:45:13Z</published>
    <updated>2010-07-19T01:50:58Z</updated>

    <summary><![CDATA[ Should I return it and get a new one or just use the bumper?&nbsp; Still need to install iOS 4.1......]]></summary>
    <author>
        <name>Ed Anuff</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.anuff.com/">
        <![CDATA[<span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="death_grip.png" src="http://www.anuff.com/images/death_grip.png" class="mt-image-center" style="text-align: center; display: block; margin: 0pt auto 20px;" height="320" width="643" /></span> <div>Should I return it and get a new one or just use the bumper?&nbsp; Still need to install iOS 4.1...<br /><br /></div>]]>
        
    </content>
</entry>

</feed>

