Socializing

Thoughts on “Big Data”

I came across a recent Quora thread I thought I would share and comment on here: Q: Why the Current Obsession with Big Data? There are quite a few thoughtful responses from the data community, including LinkedIn’s Peter Skomoroch and Cloudera’s Jeff Hammerbacher (formerly a Facebooker). Peter and Jeff think big data is interesting because we now have larger and more complete samples to work with and, thanks to declining storage costs, we’ll see the data go further back in time. Roger Ehrenberg, who recently closed a new fund to invest in data startups, thinks that big data is only interesting if you can analyze it and make actionable decisions from it in a cost-effective manner. This analysis is being made easier by open source technologies like Hadoop. We at Infochimps agree with Peter, Jeff and Roger. Our marketplace exists to show us where demand is, so that we can pre-compute or otherwise create data products from the massive amount of data we have and distribute them at a low cost to our customers. We believe the key to staying ahead of big data is understanding what questions are being asked of the data. When you know the questions, you can figure out the answers. Otherwise, big data is just a liability and not an asset.

The Access Economy

I acknowledge that I am in the minority of those who think we’re not in a bubble. I was reminded of this yesterday when AOL bought about.me 1 year and 16 days after the company was formed and 4 days after it was launched. Today I am sure I will catch more flak after the announcement that Twitter acquired Fluther. What is Fluther, you might ask? Well, it is a Q&A site, though admittedly one I had not heard of. The similarity between about.me and Fluther is that both sites facilitate access to information. I think the theme for the next few years will be access. This is the fundamental thesis of Infochimps. We help users find the information they are looking for and access it in a way that makes sense for them. Some folks want access through an API and others want to pull up a spreadsheet in Excel. Similarly, about.me makes it easy to find information about people (in whatever way, within bounds, that a person wants to make such information available), and Fluther makes it easy to find answers to questions (ask anything and someone will answer you). That’s my take on it, and I reaffirm my stance that we’re not in a bubble.

We’re Not in a Tech Bubble

Following the recent buzz around the Google/Groupon rumor, the NY Times is shouting fire in a crowded theater, but I agree with the SF Chronicle, which says nish nish. Being based in Austin and spending a lot of time on both coasts, I feel like I’ve heard a fairly representative sample of opinions on this topic. The folks who think we’re in a tech bubble don’t get what’s going on with the distributed web.

Technologies like Hadoop and Cassandra will help enterprises move their data to the cloud. Once it is there, that will open up an entirely new data economy, which won’t just benefit Infochimps, but will empower every web-enabled business to make informed BI decisions using actionable data.

Additionally, there is a huge opportunity in syncing devices and seamlessly sharing content. Until now it has been a pipe dream to have your phone talk to your laptop, which talks to your TV. Chrome OS is a big bet by Google that this gap is narrowing. Netbooks with Chrome OS will not have a hard drive. Read this article for what this means for the music industry.

AWS, Salesforce (+ Heroku) and other cloud platforms that share infrastructure make it easy and cheap to build a web startup. We run a big data stack on AWS for the cost of hiring a single DBA (which, thanks to AWS, we don’t need). This means that innovation at the early stages is going to occur at an unprecedented pace.

I’ve placed my bet…

DFJ CEO Summit

I brought home some hardware from this year’s DFJ CEO Summit in Half Moon Bay. Over 125 CEOs from all of the DFJ portfolios were in attendance, including our new friends at Loongstore from DFJ China, who plan to give Hadoop a run for its money. The conference started with an impassioned speech by DFJ’s fearless leader, Tim Draper. He encouraged us to ask people stuff, not run out of money, eliminate human friction, focus, solve a big problem, have fun and partner whenever possible. Most importantly, he reminded each of us in the room to remember that we are a meme and to lead by example. We also learned that “over time things move towards commoditization” from the Director of AWS, that “it’s all about the follow” from the VP Business and Corporate Development of Twitter, and that “there will be 53 zettabytes of data on the web by 2020” from the Director of New Business Development of Google. It was a first-class event in every way and another reminder of the value that bringing in the right venture partner adds to a startup. Thanks DFJ!

Mo Money Mo Problems

Since our funding news broke today (so much for embargoes…) I thought I’d share a few best practices I learned about fundraising along the way.

1) Leverage your network. It is easy to get face time with VCs but hard to get mind share. The best intros typically come from execs of their portfolio companies and other VCs. Initially ask for advice not money.

2) Start local. The west coast is the best coast for raising capital, but you should start by seeding the local market. If you’re from Austin and you march down Sand Hill Road before getting interest from the local funds, you’ll be embarrassed when they ask you, “What does [AV] think?” It helped a lot being able to say, “Ask them.”

3) Appear bigger than you are. I’m hesitant to say “fake it till you make it” but you need to plausibly believe you’re the next Google. Highlight the metrics that put you in the best light. For us that was traffic, page views and downloads, not revenue.

4) Sell your stock, not your product. Having worked on a few public offerings, I learned that you draft the prospectus with a focus on selling stock, not product. Investors need to understand whatever you are selling and how you’ll make money, but it is more important to convince them that the opportunity is to invest in your company.

5) Get to the right partner at the right time. An associate who loves you can be your best champion but make sure you move the conversation up to the partnership before you need the money. When a decision needs to be made in a compressed timeframe, it is often a “no”.

A haiku to our angel investors

You believed in us

Thank you thank you thank you please

Send us your sig page

It’s an exciting time in the Austin startup community. A lot of companies are deservedly getting buzz and/or funding, including Smooth-Stone, Riptano, and, ahem, Infochimps. A lot of these teams include folks who have been active in the Austin startup community for a while. There are also a number of new faces: transplants such as Susan Strausberg, and some who have repatriated, such as Noah Kagan. I am excited to see what some experienced CEOs, such as Ed Roman and Gary Cowsert, do next. Jason Cohen recently “grabbed his balls” and launched WP Engine. Carla Thompson and her Sharp Skirt meetups prove that balls are not necessary.

Many new startups are coming out of UT through Rob Adams’s class in the MBA program, such as Ordoro, and through 3 Day Startup, such as FamiGo. Some of the increased startup activity is probably a product of UT’s marketing spend on encouraging entrepreneurship. UT has some great resources available to entrepreneurs, such as Gary Hoover, the school’s first EIR, and Texas Venture Labs.

Meanwhile, downtown, Damon Clinkscales has picked up what Bryan Jones and I started a few years ago, OpenCoffee Club. There is no shortage of startup events. There are some rumblings that there might even be too many. I don’t share that sentiment. Choice is good. Attend what you want.

As far as the “ditch the valley for the hills” thing goes, I was bummed when CheapTweet left for the Bay Area, but I’m happy for them. You’re kidding yourself if you think the opportunities in Austin are equivalent to those in the Bay Area, especially if you’re raising early stage institutional financing. That being said, it’s not Austin’s fault you can’t raise capital. Do you really have a venture-backable business and team? It’s OK if you don’t. That’s what revenues are for.

I am proud of the community that I’ve now been a part of for the better part of 5 years, and I challenge my colleagues to dream big and make Austin even better. Yes, we need some more exits, but it’s no small feat that we’ve had a handful of companies go public here recently. IPOs are great for Austin, even if they aren’t huge wins for their founders or investors. They create more jobs and more startups. Your startup probably isn’t next (unless you’re Bazaarvoice or HomeAway), but if you stop blaming Austin and start working harder, you might make it happen some day. I’m rooting for you.


Wolfram Data Summit

Our friends at Wolfram|Alpha were kind enough to invite us to their invite-only data love fest in Washington, D.C., the inaugural Wolfram Data Summit. It was a veritable who’s who of the big data world, including heavyweights from big companies (e.g. Microsoft, D&B), government agencies (e.g. NASA, the Federal Reserve), and research institutions (e.g. Stanford).

The conference kicked off with a keynote from my colleague and host, Stephen Wolfram. His ambition with Wolfram|Alpha is to make all of the world’s data computable. Ours at Infochimps is to make all the world’s data accessible. You think there might be some synergies there? The most interesting part of Stephen’s speech was his announcement of a new file format, CDF (Computable Document Format), which allows data to be computed (read: interacted with) on a web page. Stephen got some laughs when he told the audience he was a data enthusiast, as evidenced by his having logged every keystroke for the last 20 years. Another interesting bit was that Wolfram|Alpha aggregates source information at the bottom of every report generated instead of detailing it, because almost every computation is made across multiple data sets.

In the next breakout session I learned about the openlibrary.org project (scanning the world’s books) and the Borgmann project (compiling a list of all words in the English language). Incredibly, the most difficult aspect is not the technology but defining the parameters of the projects. What is a book (vs another form of publication)? What is a word (vs another verbal expression)? Erin McKean, CEO of our partner Wordnik, thinks a word is anything that can be played in Scrabble and is unlikely to be challenged. We have a list of 350,000 words on Infochimps, but the Borgmann project will yield millions. I imagine we will get it on our site.

I wasn’t the only Austinite in attendance. Byron Reese, Chief Innovation Officer of Demand Media (and the guy who recruited my friend David Yehaskel to the company), spoke about the difference between data, knowledge and wisdom. According to Byron, data are observable and measurable facts, knowledge is the interpretation of data, and wisdom is the application of values to knowledge. Infochimps is making data accessible so the world can interpret it and become more knowledgeable.

I spent most of lunch chatting with Derek Willis from the New York Times. He manages their APIs and joked that he prefers interviewing data as opposed to people because data doesn’t lie to his face. He got some laughs, but I’m not sure I agree. When the attendees at the conference were asked how many people read product reviews online, everyone raised a hand. But when we were asked who writes product reviews online, only a handful of hands went up. The data is a product of how it’s collected, and that is the problem with crowdsourcing.

US News talked about how searching data needs to get simpler, especially for their customers who are making once-in-a-lifetime decisions (where to attend college). There was some debate as to how they calculate rankings. Some attendees believed US News should make the raw data available and build a widget that lets users create their own rankings based on their own weightings of the various inputs.
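To make the attendees' suggestion concrete, here is a minimal sketch of what such a widget might compute: a weighted average of each school's inputs, re-sorted whenever the user changes the weights. The school names, metric names and numbers below are made up for illustration, and this is not how US News actually calculates its rankings.

```python
from typing import Dict, List, Tuple

# Hypothetical raw data: every metric is assumed to be pre-scaled to 0-100.
RAW_DATA: Dict[str, Dict[str, float]] = {
    "College A": {"graduation_rate": 90.0, "affordability": 40.0},
    "College B": {"graduation_rate": 70.0, "affordability": 80.0},
}


def rank_by_weights(
    raw: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (school, score) pairs sorted from best to worst.

    The score is a weighted average of the metrics named in `weights`,
    so the user's weightings alone decide the ordering.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    scores = {
        school: sum(w * metrics[name] for name, w in weights.items()) / total
        for school, metrics in raw.items()
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# A user who cares mostly about graduation rate...
outcome_focused = rank_by_weights(RAW_DATA, {"graduation_rate": 3, "affordability": 1})
# ...and one who cares mostly about cost.
cost_focused = rank_by_weights(RAW_DATA, {"graduation_rate": 1, "affordability": 3})
```

With the same raw data, the two users get different first-place schools, which is the point the attendees were making: the ranking depends on the weightings as much as on the data.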

The BBC expounded on using ontologies, as opposed to taxonomies, for organizing data; an ontology is a method of organizing linked data in structured ways. The speaker recommended the following as a guide: lexical analysis -> classification -> disambiguation -> relationship extraction. This has allowed the BBC to build dynamic web pages that don’t require human content managers. The key: keep your ontology simple.

Ed. note: I will add hyperlinks and my thoughts on day 2 soon.