Message-ID: <4DAC329E.3040603@taht.net>
Date: Mon, 18 Apr 2011 06:46:22 -0600
From: Dave Taht
To: bismark-devel@lists.bufferbloat.net, hndr@lists.pixilab.org, abhishekjain95@gmail.com
References: <27196F99-CB9E-4963-8AA9-71A66B567B69@cc.gatech.edu> <4DAC2C03.2060502@taht.net>
In-Reply-To: <4DAC2C03.2060502@taht.net>
Subject: [Bismark-devel] Dates are really tricky!!

On 04/18/2011 06:18 AM, Dave Taht wrote:
> On 04/17/2011 11:57 PM, Nick Feamster wrote:
>
>> I think we can whip up some quick visualizations of what's in the DB
>> with a nice Web page with some concerted python hacking. For
>> starters, it would be nice to have a version of the data whereby
>> queries can actually complete in a reasonable time. :-) What's the
>> best way to do this? Split the table by months?
>>
> I did look over one of the slower (30+ seconds!) queries and suggested
> that a between clause be used instead of a pair of and-ed comparisons
> for the date range, but we never got around to testing that, and mysql
> used to not be able to do anything sane with a between range (postgres
> can).
>
> e.g.:
>
> mysql> select dstport, sum(uppkts) as up_p, sum(dwpkts) as down_p,
> sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from
> FLOWS_newformat where tsstart>unix_timestamp('2011-2-6') and
> tsstart<unix_timestamp('2011-2-20') and deviceid='NB105' group by
> dstport order by down_bytes desc limit 20;
>
> MIGHT do better with:
>
> select dstport, sum(uppkts) as up_p, sum(dwpkts) as down_p,
> sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from
> FLOWS_newformat where tsstart between unix_timestamp('2011-2-6') and
> unix_timestamp('2011-2-20') and deviceid='NB105' group by dstport
> order by down_bytes desc limit 20;
>
>
> I'd not be huge on divvying up tables by month, but creating summary
> tables by day and by hour would be a good idea. Usage patterns tend to
> be self-similar across day of week and by hour of day.
>
> I note that it is important to be able to capture hour of day AT the
> location (EDT/MDT/PDT), as that is more self-similar than hour of day
> (UTC).
>

To be clearer, what I suggest above is not solved by what I wrote below,
and the below contained an error. Coping with dates can be really tricky
and should not be undertaken before my first cup of coffee.

http://www.postgresql.org/docs/8.2/static/datatype-datetime.html

> So the above becomes a nightly cron job in a specialized summary
> table, something like (untested)
>
> drop table device_port_daily_summary;
> create table device_port_daily_summary as
> select deviceid, dstport,
>     dayofyear(from_unixtime(tsstart)) as test_day,
>     EXTRACT(YEAR FROM from_unixtime(tsstart)) as test_year,
>     sum(uppkts) as up_p, sum(dwpkts) as down_p,
>     sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes
> from FLOWS_newformat
> group by deviceid, dstport, test_day, test_year;

The way described below has a day-range of 0-365, which wasn't what you
wanted. Days since the epoch or the start of database time would
probably be better than explicitly storing the year... but I don't
remember the good way of doing that. Dates are tricky.

> select deviceid, dstport, dayofyear(from_unixtime(tsstart)) as test_day,
> sum(uppkts) as up_p, sum(dwpkts) as down_p,
> sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes
> from FLOWS_newformat group by deviceid, dstport, test_day

would be wrong.
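
For the days-since-the-epoch idea above, something like this might be the
shape of it (untested, assuming tsstart is seconds since the unix epoch,
and with "epoch_day" as a purely illustrative column name):

drop table if exists device_port_daily_summary;
create table device_port_daily_summary as
select deviceid, dstport,
    floor(tsstart/86400) as epoch_day, -- whole UTC days since 1970-01-01
    sum(uppkts) as up_p, sum(dwpkts) as down_p,
    sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes
from FLOWS_newformat
group by deviceid, dstport, epoch_day;

to_days(from_unixtime(tsstart)) would count calendar days in the server's
time zone instead, and an hourly summary could group on floor(tsstart/3600)
the same way. Getting hour-of-day in the device's local time zone would
still take convert_tz() with the mysql time zone tables loaded, or a
per-device offset.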