[Bismark-devel] Dates are really tricky!!

Dave Taht d at taht.net
Mon Apr 18 08:46:22 EDT 2011


On 04/18/2011 06:18 AM, Dave Taht wrote:
> On 04/17/2011 11:57 PM, Nick Feamster wrote:
>
>> I think we can whip up some quick visualizations of what's in the DB 
>> with a nice Web page with some concerted python hacking.  For 
>> starters, it would be nice to have a version of the data whereby 
>> queries can actually complete in a reasonable time.  :-)  What's the 
>> best way to do this?  Split the table by months?
>>
> I did look over one of the slower (30+ seconds!) queries and suggested 
> that a BETWEEN be used instead of two ANDed comparisons for the date 
> range, but we never got around to testing that, and MySQL historically 
> has not been able to do anything sane with a BETWEEN range (Postgres 
> can).
>
> e.g:
>
> mysql>  select dstport, sum(uppkts) as up_p, sum(dwpkts) as down_p, 
> sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from 
> FLOWS_newformat where tsstart>unix_timestamp('2011-2-6') and 
> tsstart<unix_timestamp('2011-2-20') and deviceid='NB105' group by 
> dstport order by down_bytes desc limit 20;
>
> MIGHT do better with:
>
> select dstport, sum(uppkts) as up_p, sum(dwpkts) as down_p, 
> sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from 
> FLOWS_newformat where tsstart between unix_timestamp('2011-2-6') and 
> unix_timestamp('2011-2-20') and deviceid='NB105' group by dstport 
> order by down_bytes desc limit 20;
>
>
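Whether BETWEEN helps or not, a 30-second query usually means the range
scan has no index to use. A quick way to check (untested sketch; the
index name here is made up):

create index idx_device_ts on FLOWS_newformat (deviceid, tsstart);

explain select dstport, sum(dwbytes) as down_bytes from
FLOWS_newformat where deviceid='NB105' and tsstart between
unix_timestamp('2011-2-6') and unix_timestamp('2011-2-20') group by
dstport order by down_bytes desc limit 20;

If EXPLAIN reports "range" in the type column the index is being used;
"ALL" means it is still doing a full table scan.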
> I'd not be huge on divvying up tables by month, but creating summary 
> tables by day and by hour would be a good idea. Usage patterns tend to 
> be self-similar across day of week and by hour of day.
>
> I note that it is important to be able to capture hour of day AT the 
> location (EDT/MDT/PDT), as that is more self-similar than hour of day 
> (UTC).
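Bucketing by local hour might look something like this (untested
sketch; 'America/New_York' is just an example zone for EDT, CONVERT_TZ
needs the server's time zone tables to be populated, and FROM_UNIXTIME
yields a datetime in the session time zone, so the 'UTC' argument
assumes the session runs in UTC):

select hour(convert_tz(from_unixtime(tsstart), 'UTC',
'America/New_York')) as local_hour, sum(dwbytes) as down_bytes from
FLOWS_newformat where deviceid='NB105' group by local_hour;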
>
To be clearer, what I suggest above is not solved by what I wrote below, 
and the below contained an error.

Coping with dates can be really tricky and should not be undertaken 
before my first cup of coffee.

http://www.postgresql.org/docs/8.2/static/datatype-datetime.html

> So the above becomes a nightly cron job in a specialized summary 
> table, something like (untested)
>
> drop table device_port_daily_summary;
> create table device_port_daily_summary as
>
select deviceid, dstport, dayofyear(from_unixtime(tsstart)) as 
test_day, year(from_unixtime(tsstart)) as test_year, sum(uppkts) as 
up_p, sum(dwpkts) as down_p, sum(upbytes) as up_bytes, sum(dwbytes) as 
down_bytes from FLOWS_newformat group by deviceid, dstport, test_day, 
test_year;

(tsstart is a unix timestamp, so it has to go through from_unixtime 
before the calendar functions will do anything useful with it.)
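Rather than dropping and rebuilding the whole summary nightly, the
cron job could append just the previous day's rows (untested sketch):

insert into device_port_daily_summary
select deviceid, dstport, dayofyear(from_unixtime(tsstart)) as
test_day, year(from_unixtime(tsstart)) as test_year, sum(uppkts),
sum(dwpkts), sum(upbytes), sum(dwbytes) from FLOWS_newformat
where tsstart >= unix_timestamp(curdate() - interval 1 day)
and tsstart < unix_timestamp(curdate())
group by deviceid, dstport, test_day, test_year;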

The version below yields only a day number in the 0-365 range, so the 
same calendar day in different years collapses into one row, which 
wasn't what you wanted. Days since the epoch or since the start of 
database time would probably be better than explicitly storing the 
year... but I don't remember the good way of doing that.
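For a unix-timestamp column the day number is just integer division
(untested sketch; note these are UTC days, which runs into the
local-time point above):

select deviceid, dstport, floor(tsstart/86400) as epoch_day,
sum(dwbytes) as down_bytes from FLOWS_newformat
group by deviceid, dstport, epoch_day;

MySQL's to_days(from_unixtime(tsstart)) gives an equivalent running
day number (days since year 0) if a calendar-flavored one is
preferred.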

Dates are tricky.

> select deviceid, dstport, day(tsstart) as test_day, sum(uppkts) as 
> up_p, sum(dwpkts) as down_p, sum(upbytes) as up_bytes, sum(dwbytes) as 
> down_bytes from FLOWS_newformat group by deviceid, dstport, test_day

would be wrong.




More information about the Bismark-devel mailing list