Bhaskar Karambelkar's Blog

  • Re-plotting Russian AirStrikes In Syria

    My cartography mentor Bob Rudis pointed me to a blog post visualizing Russian air strikes in Syria and commanded me to redo the static maps as something more interactive and easier to explore. TL;DR version: an interactive map at RPubs, created using Leaflet after scraping the data with RSelenium + PhantomJS + dplyr. You can use the layer selector at the top right to toggle various base tiles. Clicking on any marker will show details about that air strike.

  • Shiny in a SmartOS zone

    My last post showed you how to install R inside a SmartOS zone. This post is about installing Shiny Server in that zone. While setting up R was relatively straightforward, for Shiny Server I had to patch some C++ code to make it work on Solaris. That means you don’t have to; just follow along. First install R in a zone as shown in my earlier post.

  • Setting up R on a SmartOS Zone.

    Recently I converted a spare beefy laptop (8 cores, 16 GB RAM, 750 GB HD) to a SmartOS hypervisor. I wanted to play with some bare-metal hypervisor / container stuff and ESXi was just not cutting it. I’m not a Solaris nerd, but I know enough Unix to find my way around in Linux/*BSDs/Solaris/HP-UX, so it was not a big pain. In fact ZFS is really nice. Anyway, this post is about setting up R in a zone.

  • Redoing some Bad Data Viz.

    I saw the above graph in my Twitter feed. This beauty comes from Business Insider and was part of this article describing the misery in the world. There are so many wrong visualization elements here. So let’s see what they are and if we can fix them. Stacked bar charts are not useful when you have to compare a category that doesn’t align on an axis. In this case you can’t really compare the inflation values of each country because they don’t have a common baseline.

  • Introduction to NoSQL Databases

    Recently I was asked to give a small presentation on NoSQL databases to a graduate-level course on databases. Here are the slides. They give a high-level introduction to NoSQL databases: what they are, what some of their characteristics are, how they differ from traditional relational databases, their pros and cons, and finally some examples of different types of NoSQL DBs.

  • Video of my talk on Elasticsearch at Elastic{ON} 2015

    Back in March 2015 I gave a talk at Elastic{ON} 2015 on how to scale Elasticsearch for production-scale data. Here’s a blog post on it and here’s the video of it. I got a lot of positive feedback from the community on the talk, and it was personally a wonderful experience to share our story with the ever-growing Elasticsearch community. The opportunity to speak at a large user conference was beneficial for me too, as it allowed me to sharpen my public speaking skills.

  • Book Review : Data Driven Security

    Disclosure: I work with the two authors of this book. In fact one of them is my manager. But a) I don’t like to suck up to my colleagues and b) I’m sure they don’t like being sucked up to either. If, despite this, you think my review will be biased, then stop reading now. Go watch some cat videos. Data Driven Security is a first-of-its-kind book that aims to achieve the impossible: to be a book that integrates all 3 dimensions of ‘Data Science’, a) Math and Statistical Knowledge, b) Coding/Hacking skills, and c) Domain Knowledge.

  • The 10 commandments for hiring Data Scientists

    As a Data Scientist (whatever it means), I get a lot of job offers over LinkedIn and other channels. Although I’m not actively looking for a job, I still go through them: first, because I’m curious to find out what exactly organizations look for in a Data Scientist, and second, to amuse myself. This post is about the latter part. It amuses me to no end what some people want in a Data Scientist, and I’ve made a consolidated list for all the recruiters and organizations who are looking to hire one (or more).

  • Visualizing India v/s Pakistan One Day International Results

    This is my small effort to pick up the streamgraph support in R developed by Bob Rudis (described here). What you see is per-year aggregations of results of all India v/s Pakistan One Day Internationals. I pulled the records from Wikipedia and used rvest by Hadley Wickham for extracting the results. After that, a little data munging using dplyr and lubridate, and voilà. Blues are India and greens are Pakistan, in accordance with their team colors.

  • How to use Twitter’s Search REST API most effectively.

    This blog post will discuss various techniques to use Twitter’s search REST API most effectively, given the constraints and limits of that API. I’ll be using Python for demonstration, but any native API that supports the Twitter REST API will do. Introduction: Twitter provides the REST search API for searching tweets from Twitter’s search index. This is different from using the streaming filter API, in that the latter is real-time and starts giving you results from the point of query, while the former is retrospective and will give you results from the past, up to as far back as the search index goes (usually the last 7 days).
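The difference between the two APIs described in the excerpt above can be sketched as a toy model. This makes no calls to Twitter: the function names `search_results` and `stream_results`, the `created_at` field, and the sample data are all hypothetical, and the 7-day window is simply the "usual" figure quoted in the post.

```python
from datetime import datetime, timedelta

# Assumption: the search index covers roughly the last 7 days, as the post states.
SEARCH_INDEX_WINDOW = timedelta(days=7)


def search_results(tweets, query_time, window=SEARCH_INDEX_WINDOW):
    """Retrospective view: tweets created at or before the query time,
    no older than the search index window."""
    oldest = query_time - window
    return [t for t in tweets if oldest <= t["created_at"] <= query_time]


def stream_results(tweets, query_time):
    """Real-time view: only tweets created after the query was issued."""
    return [t for t in tweets if t["created_at"] > query_time]


# Hypothetical sample data: one tweet too old for the index,
# one inside the window, and one posted after the query.
now = datetime(2015, 1, 10, 12, 0, 0)
tweets = [
    {"id": 1, "created_at": now - timedelta(days=10)},
    {"id": 2, "created_at": now - timedelta(days=3)},
    {"id": 3, "created_at": now + timedelta(hours=1)},
]

print([t["id"] for t in search_results(tweets, now)])
print([t["id"] for t in stream_results(tweets, now)])
```

In this model the search view only ever sees the tweet inside the window, and the stream view only sees the tweet posted after the query, so the two are complementary rather than interchangeable.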

© 2015 Bhaskar V. Karambelkar. All rights reserved.