From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-13-ewr.dyndns.com (mxout-052-ewr.mailhop.org [216.146.33.52]) by lists.bufferbloat.net (Postfix) with ESMTP id A7FC42E07DF for ; Mon, 18 Apr 2011 06:22:33 -0700 (PDT) Received: from scan-12-ewr.mailhop.org (scan-12-ewr.local [10.0.141.230]) by mail-13-ewr.dyndns.com (Postfix) with ESMTP id 06856A5053F for ; Mon, 18 Apr 2011 13:22:33 +0000 (UTC) X-Spam-Score: 1.8 (+) X-Mail-Handler: MailHop by DynDNS X-Originating-IP: 209.85.161.171 Received: from mail-gx0-f171.google.com (mail-gx0-f171.google.com [209.85.161.171]) by mail-13-ewr.dyndns.com (Postfix) with ESMTP id 69B24A4CF60 for ; Mon, 18 Apr 2011 13:22:28 +0000 (UTC) Received: by gxk22 with SMTP id 22so2118348gxk.16 for ; Mon, 18 Apr 2011 06:22:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:references:in-reply-to:mime-version :content-type:message-id:content-transfer-encoding:cc:x-mailer:from :subject:date:to; bh=QdNnYDNjPakyKggU/RvkmGz8M5412F+JTxxDs3moKq8=; b=CVvPrTcqDoxIGvwzswWj9FOJZYY6Q6eA/qGRnz5oLCaqjcunOg5K5j30pTqBOdSLSq QGRVt6MTgedoxlSD5tce+WvBefDqKuT4C0tFic2hc9CEGd0eauAJ3pbrhk+9KUfvDfZT 8C/UHZDz5ifxtLdWGALct5fQ8OQBY4NywYpjw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=references:in-reply-to:mime-version:content-type:message-id :content-transfer-encoding:cc:x-mailer:from:subject:date:to; b=B75F/Wc/h04tZfCpci6adECYC4A/ToaZKglUcpubXnVD2hDOthzfQgq/a/1twlq3yP M0C2oueJNhig9IeS7Xisptm5lxzQGoq93yLG7/8oVGCQHYj9g9zrpjh4O7cIwIQodN/4 RtXUv99Uw2wnJCl/pEHDhO5UAE6PWrJsCoGQE= Received: by 10.101.143.1 with SMTP id v1mr2620957ann.21.1303132946111; Mon, 18 Apr 2011 06:22:26 -0700 (PDT) Received: from [172.16.0.140] (adsl-145-176-102.asm.bellsouth.net [72.145.176.102]) by mx.google.com with ESMTPS id x34sm5458550ana.36.2011.04.18.06.22.23 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 18 Apr 2011 06:22:24 -0700 (PDT) References: <27196F99-CB9E-4963-8AA9-71A66B567B69@cc.gatech.edu> <4DAC2C03.2060502@taht.net> <4DAC329E.3040603@taht.net> In-Reply-To: <4DAC329E.3040603@taht.net> Mime-Version: 1.0 (iPhone Mail 8G4) Content-Type: text/plain; charset=us-ascii Message-Id: <276F1A97-392F-43AC-935E-22DD3A9C6D10@gmail.com> Content-Transfer-Encoding: quoted-printable X-Mailer: iPhone Mail (8G4) From: Marshini Chetty Date: Mon, 18 Apr 2011 09:22:19 -0400 To: Dave Taht X-Mailman-Approved-At: Mon, 18 Apr 2011 08:09:19 -0700 Cc: "hndr@lists.pixilab.org" , "bismark-devel@lists.bufferbloat.net" , "abhishekjain95@gmail.com" Subject: Re: [Bismark-devel] [Hndr] Dates are really tricky!! X-BeenThere: bismark-devel@lists.bufferbloat.net X-Mailman-Version: 2.1.13 Precedence: list List-Id: BISMark related software development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Apr 2011 13:22:34 -0000 Hi all This is awesome. At this rate, getting something together for the FCC compet= ition seems very doable.=20 M.=20 Sent from my iPhone On Apr 18, 2011, at 8:46 AM, Dave Taht wrote: > On 04/18/2011 06:18 AM, Dave Taht wrote: >> On 04/17/2011 11:57 PM, Nick Feamster wrote: >>=20 >>> I think we can whip up some quick visualizations of what's in the DB wit= h a nice Web page with some concerted python hacking. For starters, it woul= d be nice to have a version of the data whereby queries can actually complet= e in a reasonable time. :-) What's the best way to do this? Split the tab= le by months? >>>=20 >> I did look over one of the slower (30+ seconds!) queries and suggested th= at a between statement be used instead of an and for the date range, but we n= ever got around to testing that and mysql used to not be able to do anything= sane with a between range (postgres can). >>=20 >> e.g: >>=20 >> mysql> select dstport, sum(uppkts) as up_p, sum(dwpkts) as down_p, sum(u= pbytes) as up_bytes, sum(dwbytes) as down_bytes from FLOWS_newformat where t= sstart>unix_timestamp('2011-2-6') and tsstart>=20 >> MIGHT do better with: >>=20 >> select dstport, sum(uppkts) as up_p, sum(dwpkts) as down_p, sum(upbytes) a= s up_bytes, sum(dwbytes) as down_bytes from FLOWS_newformat where tsstart be= tween unix_timestamp('2011-2-6') and unix_timestamp('2011-2-20') and devicei= d=3D'NB105' group by dstport order by down_bytes desc limit 20; >>=20 >>=20 >> I'd not be huge on divvying up tables by month, but creating summary tabl= es by day and by hour would be a good idea. Usage patterns tend to be self-s= imilar across day of week and by hour of day. >>=20 >> I note that it is important to be able to capture hour of day AT the loca= tion (EDT/MDT/PDT), as that is more self-similar that hour of day (UTC) >>=20 > To be clearer, what I suggest above is not solved by what I wrote below, a= nd the below contained an error. >=20 > Coping with dates can be really tricky and should not be undertaken before= my first cup of coffee. >=20 > http://www.postgresql.org/docs/8.2/static/datatype-datetime.html >=20 >> So the above becomes a nightly cron job in a specialized summary table, s= omething like (untested) >>=20 >> drop table device_port_daily_summary; >> create table device_port_daily_summary as >>=20 > select deviceid, dstport, day(tsstart) as test_day, EXTRACT(YEAR FROM TIME= STAMP tsstart) as test_year , sum(uppkts) as up_p, sum(dwpkts) as down_p, su= m(upbytes) as up_bytes, sum(dwbytes) as down_bytes from FLOWS_newformat grou= p by deviceid, dstport, test_day, test_year; >=20 > The way described below has a day-range of 0-365, which wasn't what you wa= nted. Days since the epoch or the start of database time would probably be b= etter than explicitly storing the year... but I don't remember the good way o= f doing that. >=20 > Dates are tricky. >=20 >> select deviceid, dstport, day(tsstart) as test_day, sum(uppkts) as up_p, s= um(dwpkts) as down_p, sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes f= rom FLOWS_newformat group by deviceid, dstport, test_day >=20 > would be wrong. >=20 > _______________________________________________ > HNDR mailing list > HNDR@lists.pixilab.org > http://lists.pixilab.org/listinfo.cgi/hndr-pixilab.org