
Posts Tagged: time-series


11
Dec 15

Super Powers and Their Mutable Friends

After releasing my bulletproof time series database, most of the world’s high-frequency companies started converting to it. In less than a day, major Fortune 7.3 billion players adopted their solutions and embraced the simplicity and greatness of what my Clojure time series database delivered.

So what now? When all the money is made and the adoption rate is higher than I could ever have predicted... What now? Well, now it’s time to fix it, because it’s, um, broken.

Keys to Time


Here is the example data from the current, broken solution:

(def events
  {1449088877092 {:GOOG {:bid 762.74 :offer 762.79}}
   1449088876590 {:AAPL {:bid 116.60 :offer 116.70}}
   1449088877601 {:MSFT {:bid 55.22 :offer 55.27}}
   1449088877203 {:TSLA {:bid 232.57 :offer 232.72}}
   1449088875914 {:NFLX {:bid 128.95 :offer 129.05}}
   1449088870005 {:FB {:bid 105.96 :offer 106.6}}})

It is a map keyed by timestamp, so every key can hold only one event. Now say we have a couple of events coming in at the exact same millisecond:

(def events [
  {:ts 1449088877203 :ticker :GOOG :event-id 1}    ;; <<
  {:ts 1449088876590 :ticker :AAPL :event-id 2}
  {:ts 1449088877601 :ticker :MSFT :event-id 3}
  {:ts 1449088877203 :ticker :TSLA :event-id 4}    ;; <<
  {:ts 1449088875914 :ticker :NFLX :event-id 5}
  {:ts 1449088870005 :ticker :FB   :event-id 6}])

Notice that Tesla and Google have the same timestamp. So a (sorted-map-by) keyed by timestamp would not work here, as it would re-assoc them: the later event would replace the earlier one. Of course a custom comparator can be used that will not treat “the same keys as the same”, but then there is a problem with key collisions.
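To make the collision concrete, here is a small sketch. The name quotes is made up for illustration; the two events are the Google and Tesla ones from the vector above, keyed by their shared millisecond:

```clojure
;; Hypothetical illustration: two events keyed by the same millisecond.
(def quotes
  (-> (sorted-map)
      (assoc 1449088877203 {:ticker :GOOG :event-id 1})
      (assoc 1449088877203 {:ticker :TSLA :event-id 4})))

(count quotes)              ;; => 1
(get quotes 1449088877203)  ;; => {:ticker :TSLA, :event-id 4}
```

Google is silently gone: the second assoc replaced it.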

Natural Numbers


So here I present to you a massively refactored solution, with its codebase experiencing a twofold increase. The one and only: “The Time Series Database in One Line of Clojure 2.0”, or simply “The Time Series Database in 2.0 Lines of Clojure”.

I’ll format the first line for better readability:

(defn ts [{t1 :ts} {t2 :ts}] 
  (if-not (= t1 t2) 
    (compare t1 t2)
    1))

This is a simple comparator with a twist: when it sees two timestamps that are the same, it lies.
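A quick way to see the lie at the REPL (the comparator is repeated so the snippet runs on its own; the tiny {:ts 1} maps are just stand-ins for real events):

```clojure
;; Same comparator as above, repeated so this runs standalone.
(defn ts [{t1 :ts} {t2 :ts}]
  (if-not (= t1 t2)
    (compare t1 t2)
    1))

(ts {:ts 1} {:ts 2})  ;; => -1, the earlier event sorts first
(ts {:ts 2} {:ts 1})  ;; => 1
(ts {:ts 1} {:ts 1})  ;; => 1, never 0: equal timestamps are never "the same"
```

Since the comparator never returns 0, a sorted collection built on it never decides that two events are the same entry.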

Now on to the second line, a “database codebase conclusion”, as I call it:

(def db (sorted-set-by ts))

And.. done.

Action!


Some tools and queries from a previous 1.0 product:

;; database with data
(defn with [db data] (reduce conj db data))
 
;; find data before a timestamp
(defn before [db ts] (subseq db <= {:ts ts}))
 
;; find data after a timestamp
(defn after [db ts] (subseq db >= {:ts ts}))

Let’s look at the database with data:

=> (with db events)
 
#{{:ts 1449088870005, :ticker :FB, :event-id 6}
  {:ts 1449088875914, :ticker :NFLX, :event-id 5}
  {:ts 1449088876590, :ticker :AAPL, :event-id 2}
  {:ts 1449088877203, :ticker :GOOG, :event-id 1}  ;; << same
  {:ts 1449088877203, :ticker :TSLA, :event-id 4}  ;; << timestamp
  {:ts 1449088877601, :ticker :MSFT, :event-id 3}}

Slicing and dicing:

(before (with db events) 1449088876592)
 
({:ts 1449088870005, :ticker :FB, :event-id 6} 
 {:ts 1449088875914, :ticker :NFLX, :event-id 5} 
 {:ts 1449088876590, :ticker :AAPL, :event-id 2})

(after (with db events) 1449088876592)
 
({:ts 1449088877203, :ticker :GOOG, :event-id 1} 
 {:ts 1449088877203, :ticker :TSLA, :event-id 4} 
 {:ts 1449088877601, :ticker :MSFT, :event-id 3})

Super Hero Friends


While it is nice to be able to slice a sorted set with a lying comparator, at times, it may not be desirable to do so.
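Here is a hedged sketch of what the lie costs, as far as I can tell from how sorted sets use their comparator. The names goog, tsla and sample are made up for this example:

```clojure
;; Same comparator as above, repeated so this runs standalone.
(defn ts [{t1 :ts} {t2 :ts}]
  (if-not (= t1 t2) (compare t1 t2) 1))

(def goog {:ts 1449088877203 :ticker :GOOG :event-id 1})
(def tsla {:ts 1449088877203 :ticker :TSLA :event-id 4})
(def sample (conj (sorted-set-by ts) goog tsla))

(count sample)              ;; => 2, both events are kept
(contains? sample goog)     ;; => false, lookups never see a 0 from the comparator
(count (conj sample goog))  ;; => 3, the very same event is stored twice

;; A query at exactly that timestamp matches neither side:
(empty? (subseq sample <= {:ts 1449088877203}))  ;; => true
(empty? (subseq sample >= {:ts 1449088877203}))  ;; => true
```

So the set is no longer much of a set: it cannot find or de-duplicate its own elements, and before and after behave as strict bounds even though they are written with <= and >=.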

But every super hero has a true friend. Spider-Man, for instance, has many. So does “The Time Series Database in 2.0 Lines of Clojure”. The friend’s name is multim and it’s also a Super.


2
Dec 15

Time Series Database in One Line of Clojure

If you have ever worked in the financial sector, specifically in high-frequency trading, you know a time series database as the well-known tool that orders up all those quotes, orders, and trades for financial pleasure.

There are many of these databases available. Wall Street being Wall Street would of course primarily use proprietary ones, since, well, it’s proprietary :), but to give them credit: they do outperform the open source ones by a lot, at least presently (talking about millions per second).

Disrupting Time Series Business


So I decided to write an open source time series database that will beat them all, not necessarily on performance, but definitely on clarity and size. Get ready for this one line.

If you have read this far, that means you are ready, so let’s begin by creating a database:

(def db (sorted-map-by >))

Oh, by the way we are done. It’s the one and only: The Time Series Database.

Map is King of Data


Let’s use it. First we’ll need some data:

(def data
  {1449088877092 {:GOOG {:bid 762.74 :offer 762.79}}
   1449088876590 {:AAPL {:bid 116.60 :offer 116.70}}
   1449088877601 {:MSFT {:bid 55.22 :offer 55.27}}
   1449088877203 {:TSLA {:bid 232.57 :offer 232.72}}
   1449088875914 {:NFLX {:bid 128.95 :offer 129.05}}
   1449088870005 {:FB {:bid 105.96 :offer 106.6}}})

The format is simple: {timestamp data}.

Now a function that returns the database, as a value, with this data in it:

(defn with [db data] (merge db data))
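A small sketch of what that merge gives back. The three-entry sample map is made up and deliberately out of order; db and with are repeated so the snippet runs on its own:

```clojure
(def db (sorted-map-by >))
(defn with [db data] (merge db data))

;; Hypothetical sample, not in any particular order.
(def sample {1449088876590 :AAPL
             1449088877601 :MSFT
             1449088870005 :FB})

(keys (with db sample))
;; => (1449088877601 1449088876590 1449088870005), newest first

(sorted? (with db sample))  ;; => true, still a sorted map
(count db)                  ;; => 0, the original db value is untouched
```

Because the comparator is >, the newest timestamp comes first, and since everything is immutable, db itself stays empty.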

And finally some time based queries, like before and after:

(defn before [database ts] (into {} (subseq database > ts)))
(defn after [database ts] (into {} (subseq database < ts)))
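The tests look inverted because subseq compares through the map’s own comparator: this map sorts with >, so “greater” in its order means an earlier timestamp. Also note that (into {} ...) returns a plain map, which, as far as I know, keeps its entries in order only while it is small. A hedged variant, with the made-up names before* and after*, pours the slice into (empty database) so the result stays a sorted map:

```clojure
(def db (sorted-map-by >))

;; before* / after* are hypothetical names, not part of the original one-liner.
(defn before* [database ts] (into (empty database) (subseq database > ts)))
(defn after*  [database ts] (into (empty database) (subseq database < ts)))

;; Tiny stand-in timestamps to keep the example readable.
(def sample (into db {1 :a 2 :b 3 :c 5 :e}))

(keys (before* sample 4))     ;; => (3 2 1)
(keys (after* sample 4))      ;; => (5)
(sorted? (before* sample 4))  ;; => true
```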

done.

Action!


(before (with db data) 1449088877091)
 
{1449088876590 {:AAPL {:bid 116.6, :offer 116.7}},
 1449088875914 {:NFLX {:bid 128.95, :offer 129.05}},
 1449088870005 {:FB {:bid 105.96, :offer 106.6}}}

(after (with db data) 1449088877091)
 
{1449088877601 {:MSFT {:bid 55.22, :offer 55.27}},
 1449088877203 {:TSLA {:bid 232.57, :offer 232.72}},
 1449088877092 {:GOOG {:bid 762.74, :offer 762.79}}}

Beware, you, other time series databases!

P.S. Of course there is a possibility of events coming in at the exact same millisecond, so here is another line that solves it.