The new newness

Been a while since I posted here. I spent 2014-2018 running data science and machine learning R&D programs at D2DCRC. Now I've founded a company, tyto.ai, with backing from H2 Ventures. I'm working on a number of interesting things, so watch that website for updates.

February 20, 2018 · 1 min · admin

Why question answering is hard

IBM recently semi-released their Watson APIs. Judging from the comments on the HN thread, many don't realize just how hard question answering (QA) is, nor just how good Watson is. I've spent a fair bit of time building my own QA system. As is often the way, that's given me some insight into what a big problem space this is. To be clear: I think QA is an AI-complete problem. Any system that works on a subset of QA is a good achievement. ...

May 11, 2015 · 7 min · admin

Interesting papers from NIPS 2014

NIPS is the premier conference on Deep Learning. Given the accelerating state of the art, it's interesting to see what is new. The accepted papers from the NIPS 2014 Deep Learning workshop are listed at http://www.dlworkshop.org/accepted-papers. These are the papers that stood out to me (or at least matched my interests).

cuDNN: Efficient Primitives for Deep Learning: a library from nVidia for deep learning on GPUs. ~36% speedup on training using a K40. It has Caffe integration (Caffe has quickly become the standard deep learning library). ...

December 23, 2014 · 3 min · admin

5 Quick Links

I haven't tried this yet, but the examples are very impressive: "We introduce a recursive neural network model that is able to correctly answer paragraph-length factoid questions from a trivia competition called quiz bowl. Our model is able to succeed where traditional approaches fail, particularly when questions contain very few words (e.g., named entities) indicative of the answer." http://cs.umd.edu/~miyyer/qblearn/

IBM is finally opening up the Watson system with an API. Allegedly the way to get access is via the BlueMix PaaS. https://developer.ibm.com/watson/docs/developing-watson-apis/ (I've tried this now. The Question Answering API is pretty much untrained and mostly gives bad results. However, its confidence scoring is very good, i.e., if it gives a bad answer it will have a low confidence score, whereas answers with a 90%+ score are almost always right.)

The paper on Google's Knowledge Vault. I thought I'd posted this already: http://www.cs.cmu.edu/~nlao/publication/2014.kdd.pdf

DeepDive from Stanford. This used to be Wisci from UW-Madison. It does probabilistic inference on unstructured data. http://deepdive.stanford.edu/index.html

SEMPRE is a toolkit for training semantic parsers, which map natural language utterances to denotations (answers) via intermediate logical forms. https://github.com/percyliang/sempre
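To make the "utterance to logical form to denotation" idea in the SEMPRE description concrete, here is a toy sketch for simple arithmetic questions. It is not SEMPRE's API or grammar format: the lexicon, the `parse` and `execute` functions, and the tuple-based logical forms are all hypothetical. A real semantic parser learns which parse to pick from data rather than matching a fixed word pattern.

```python
import operator

# Hypothetical toy lexicon; SEMPRE learns grammars rather than using a fixed table like this.
NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
PREDICATES = {"plus": "add", "minus": "sub", "times": "mul"}

# Executor: maps each logical predicate to the function that computes its denotation.
EXECUTORS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def parse(utterance):
    """Map an utterance such as 'what is three plus four?' to a logical form ('add', 3, 4)."""
    words = utterance.lower().rstrip("?").split()
    # Keep only the words the lexicon knows; everything else ('what', 'is') is ignored.
    tokens = [w for w in words if w in NUMBERS or w in PREDICATES]
    if len(tokens) != 3:
        raise ValueError(f"cannot parse: {utterance!r}")
    left, pred, right = tokens
    if left not in NUMBERS or pred not in PREDICATES or right not in NUMBERS:
        raise ValueError(f"cannot parse: {utterance!r}")
    return (PREDICATES[pred], NUMBERS[left], NUMBERS[right])


def execute(logical_form):
    """Evaluate a logical form to its denotation, i.e. the answer."""
    predicate, a, b = logical_form
    return EXECUTORS[predicate](a, b)


if __name__ == "__main__":
    form = parse("What is three plus four?")
    print(form, "->", execute(form))
```

Keeping the logical form as an explicit intermediate value is the point: the same executor can answer any question the parser maps into that form, and a learned parser only has to get the form right.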

December 22, 2014 · 1 min · admin

5 Quick Links

Actual, real guidance on how to secure Docker containers: what is possible and what isn't. http://www.slideshare.net/jpetazzo/is-it-safe-to-run-applications-in-linux-containers

Google building a fact base by extracting facts from the broad web: http://www.newscientist.com/article/mg22329832.700-googles-factchecking-bots-build-vast-knowledge-bank.html#.U_ctvbySxmM

MIT Information Extraction (MITIE): state-of-the-art information extraction tools. The current release includes tools for performing named entity extraction and binary relation detection, as well as tools for training custom extractors and relation detectors. https://github.com/mit-nlp/MITIE

Labelling parts of images. The examples are pretty impressive. http://googleresearch.blogspot.com/2014/09/building-deeper-understanding-of-images.html

IBM whitepaper on their vision for using blockchains to "power" the Internet of Things: http://public.dhe.ibm.com/common/ssi/ecm/en/gbe03620usen/GBE03620USEN.PDF See also https://gigaom.com/2014/09/09/check-out-ibms-proposal-for-an-internet-of-things-architecture-using-bitcoins-block-chain-tech/

September 23, 2014 · 1 min · admin

Chrome Extension: Chrome Reload

I wrote my first Chrome extension. The APIs are pretty nice. From the project page: Chrome-Reload is an extension for the Chrome web browser to allow automatic periodic reloading of a page. It allows you to configure how often each page reloads and see a countdown until the next load. This is useful for scenarios such as monitoring constantly changing pages (e.g., search results), or for keeping sessions alive in web applications. ...
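The behaviour described above (a per-page reload interval plus a visible countdown) comes down to a little bookkeeping. The sketch below models it in Python rather than the extension's JavaScript; `ReloadSchedule` and `tick` are hypothetical names, not the extension's actual code, and the browser reload call itself is left out. Timestamps are plain seconds so the logic can be exercised without a browser.

```python
from dataclasses import dataclass


@dataclass
class ReloadSchedule:
    """Hypothetical per-page reload timer."""
    interval_s: float   # how often this page should reload, in seconds
    last_reload: float  # timestamp of the most recent reload, in seconds

    def seconds_remaining(self, now):
        """Countdown to show the user; never negative."""
        return max(0.0, self.last_reload + self.interval_s - now)

    def due(self, now):
        return self.seconds_remaining(now) == 0.0

    def mark_reloaded(self, now):
        self.last_reload = now


def tick(schedules, now):
    """Return the URLs whose reload is due at `now` and restart their timers."""
    due_urls = [url for url, schedule in schedules.items() if schedule.due(now)]
    for url in due_urls:
        schedules[url].mark_reloaded(now)
    return due_urls
```

Storing the last reload time rather than a decrementing counter means the countdown can be recomputed from the clock at any moment, so a missed tick does not make the displayed time drift.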

April 16, 2014 · 1 min · admin

4 quick links

Penn Treebank II tags: https://gist.github.com/nlothian/9240750. Because they aren't actually documented anywhere except one person's thesis, and now that is offline.

Knowledge extraction from text. Looks good, pity about the license: http://knowitall.github.io/openie/

Airbnb service discovery, including autoconfig for Docker/HAProxy: https://github.com/airbnb/synapse#docker

DNS with an API: https://github.com/skynetservices/skydns
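Treebank II node labels combine a syntactic tag with optional function tags and a numeric co-index, as in `NP-SBJ-1`. A minimal sketch of splitting and describing such labels follows. The tag tables are a small hand-picked subset (the gist above has the full list), and `split_label` and `describe` are hypothetical helpers, not part of any parser library.

```python
import re

# A small subset of Penn Treebank II tags (clause, phrase and word level).
TAGS = {
    "S": "Simple declarative clause",
    "SBAR": "Clause introduced by a (possibly empty) subordinating conjunction",
    "NP": "Noun phrase",
    "VP": "Verb phrase",
    "PP": "Prepositional phrase",
    "NN": "Noun, singular or mass",
    "NNS": "Noun, plural",
    "VBD": "Verb, past tense",
    "DT": "Determiner",
    "-NONE-": "Empty element (e.g. a trace)",
}

# A few of the function tags that can be appended to a label.
FUNCTION_TAGS = {"SBJ": "subject", "TMP": "temporal", "LOC": "locative", "PRD": "predicate"}


def split_label(label):
    """Split a label like 'NP-SBJ-1' into (base tag, function tags, co-index)."""
    if label.startswith("-"):  # special tags such as -NONE- or -LRB- contain hyphens themselves
        return label, [], None
    base, *rest = re.split(r"[-=]", label)
    # A trailing all-digit part is a co-index linking the node to a trace or gap.
    index = int(rest.pop()) if rest and rest[-1].isdigit() else None
    return base, rest, index


def describe(label):
    """Human-readable description of a Treebank II node label."""
    base, functions, index = split_label(label)
    text = TAGS.get(base, "Unknown tag")
    if functions:
        text += " [" + ", ".join(FUNCTION_TAGS.get(f, f) for f in functions) + "]"
    if index is not None:
        text += f" (co-index {index})"
    return text
```

For example, `describe("NP-SBJ-1")` gives `Noun phrase [subject] (co-index 1)`.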

April 16, 2014 · 1 min · admin

5 quick links

BTSync on Ubuntu 12.04. Interesting; too bad BTSync isn't open source.

Dashing, from Shopify. A framework for attractive dashboards.

Gridster. Gridster is a jQuery plugin that allows building intuitive draggable layouts from elements spanning multiple columns.

Prediction.io. PredictionIO is an open source machine learning server for software developers to create predictive features, such as personalization, recommendation and content discovery.

MBox. Mbox is a lightweight sandboxing mechanism that any user can use without special privileges in commodity operating systems.

February 11, 2014 · 1 min · admin

4 quick links

Currently working on the most stupid idea I've ever had. It's so dumb that it is pretty much guaranteed to fail.

Publish & discover Docker services: https://npmjs.org/package/docker-discover

A slide deck on an implementation of IBM's Watson.

Relation Extraction with Relation Topics, from the Watson DeepQA team.

ConvNetJS: Deep Learning in JavaScript.

January 9, 2014 · 1 min · admin

5 Quick Links

BayesDB. Query the probable implications of your data as easily as a SQL database lets you query the data itself, e.g.: INFER salary FROM mytable WHERE age > 30; I think I just saw the future… ALPS is somewhat related, for PostgreSQL.

Why Cognition-as-a-Service is the next operating system battlefield: something I'm interested in.

A Programmer's Guide to Data Mining: looks pretty good.

BaseKB: cleaned-up Freebase data.

Mission Control is here. Java profiling gets even better.
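BayesDB answers INFER queries using a probabilistic model of the whole table. The sketch below only illustrates the shape of a query like `INFER salary FROM mytable WHERE age > 30`: it fills in missing values of one column for the rows matching a condition, using the plain mean of the observed values in those rows. `infer` is a hypothetical helper, the table is made-up example data, and the mean is a deliberately crude stand-in for BayesDB's model.

```python
from statistics import mean


def infer(rows, column, where):
    """Toy analogue of 'INFER <column> FROM rows WHERE <where>'.

    Returns the rows that satisfy `where`, with missing (None) values of `column`
    replaced by the mean of the observed values among those rows.
    """
    matching = [row for row in rows if where(row)]
    observed = [row[column] for row in matching if row.get(column) is not None]
    if not observed:
        raise ValueError(f"no observed values of {column!r} to infer from")
    estimate = mean(observed)
    result = []
    for row in matching:
        filled = dict(row)  # copy so the input table is not modified
        if filled.get(column) is None:
            filled[column] = estimate
        result.append(filled)
    return result


# Made-up example table; row "c" has a missing salary.
mytable = [
    {"name": "a", "age": 25, "salary": 50000},
    {"name": "b", "age": 35, "salary": 80000},
    {"name": "c", "age": 40, "salary": None},
    {"name": "d", "age": 45, "salary": 100000},
]

# Roughly: INFER salary FROM mytable WHERE age > 30;
for row in infer(mytable, "salary", lambda r: r["age"] > 30):
    print(row)
```

Here row "c" gets 90000, the mean of the two observed salaries among people over 30. What makes BayesDB interesting is replacing that single average with a model that accounts for the other columns.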

December 14, 2013 · 1 min · admin