[Bismark-devel] [Hndr] FCC app progress: django

Dave Taht d at taht.net
Mon Apr 18 08:18:11 EDT 2011


On 04/17/2011 11:57 PM, Nick Feamster wrote:
> (sorry for list cross-posting)
>
> Srikanth, Abhishek, Walter,
Taking Walter and Sri off the explicit cc list, as both are already on 
bismark-devel.

> Small, slow progress on the FCC app---
>
> I have a written a very simple server with django.  I've got the django server hooked up to our bismark mysql database.  It's basically a slick python wrapper for doing Web development with a database back-end.   The setup is ugly right now, but it's a proof of concept, and it works (django is talking to our back-end).  For example:
> http://networkdashboard.org:8000/summary/
>
The way we have configured the web servers here has let us survive 
several slashdottings.

http://www.bufferbloat.net/projects/bloat/wiki/Dogfood_Principle

Serving Django over FastCGI rather than WSGI would probably bring some 
security benefits and more predictable behavior, though it's a tossup.

http://docs.djangoproject.com/en/dev/howto/deployment/fastcgi/
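As a deployment sketch from the docs above (host, port, and pidfile path 
are placeholders, untested here): the app runs as a long-lived FastCGI 
process that the front-end web server proxies to.

```
# Spawn a prefork FastCGI server on a local port; lighttpd/nginx/apache
# mod_fastcgi then forwards requests to it.
./manage.py runfcgi method=prefork host=127.0.0.1 port=8081 \
    pidfile=/var/run/bismark-django.pid
```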

> Django has templates that will allow us to make this slick and pretty, but it's getting a little late for that tonight.
>
> I think we can whip up some quick visualizations of what's in the DB with a nice Web page with some concerted python hacking.  For starters, it would be nice to have a version of the data whereby queries can actually complete in a reasonable time.  :-)  What's the best way to do this?  Split the table by months?
>

Your mysql configuration was an out-of-the-box default, designed for 
very small databases. Yours is quite large. Increasing several buffer 
sizes by several orders of magnitude would help a lot.
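As a sketch of the kind of my.cnf changes I mean (values are 
illustrative only, not tuned for your hardware or dataset):

```
# /etc/mysql/my.cnf -- illustrative values; size these to fit your RAM
[mysqld]
key_buffer_size         = 256M   # MyISAM index cache (default is a few MB)
innodb_buffer_pool_size = 1G     # InnoDB data+index cache, the big win
sort_buffer_size        = 4M     # per-connection sort buffer
tmp_table_size          = 256M   # let GROUP BY temp tables stay in memory
max_heap_table_size     = 256M   # must match tmp_table_size to take effect
```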

Sri had made significant inroads into porting the database to postgres 
when I was there. Postgres is not only more performant and flexible 
than mysql, with a more capable query planner, but also has good 
analysis tools. It was my hope to benchmark some of your slowest 
existing queries using postgres's EXPLAIN ANALYZE to see how the design 
could be improved overall.
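For the record, EXPLAIN ANALYZE runs the query for real and prints the 
plan with actual row counts and timings per node. Something like this, 
against a ported copy of the flows table (table and column names as in 
the mysql schema, untested on the port):

```sql
-- Executes the query and reports per-node timings and row counts,
-- which shows exactly which scan or sort dominates the 30+ seconds.
EXPLAIN ANALYZE
SELECT dstport, sum(dwbytes) AS down_bytes
  FROM FLOWS_newformat
 WHERE deviceid = 'NB105'
 GROUP BY dstport
 ORDER BY down_bytes DESC
 LIMIT 20;
```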

I did look over one of the slower (30+ seconds!) queries and suggested 
using a BETWEEN clause instead of a pair of ANDed comparisons for the 
date range, but we never got around to testing that, and mysql 
historically could not do anything sane with a BETWEEN range (postgres 
can).

e.g.:

mysql> SELECT dstport,
              sum(uppkts)  AS up_p,     sum(dwpkts)  AS down_p,
              sum(upbytes) AS up_bytes, sum(dwbytes) AS down_bytes
         FROM FLOWS_newformat
        WHERE tsstart > unix_timestamp('2011-2-6')
          AND tsstart < unix_timestamp('2011-2-20')
          AND deviceid = 'NB105'
        GROUP BY dstport
        ORDER BY down_bytes DESC
        LIMIT 20;

MIGHT do better as:

SELECT dstport,
       sum(uppkts)  AS up_p,     sum(dwpkts)  AS down_p,
       sum(upbytes) AS up_bytes, sum(dwbytes) AS down_bytes
  FROM FLOWS_newformat
 WHERE tsstart BETWEEN unix_timestamp('2011-2-6')
               AND unix_timestamp('2011-2-20')
   AND deviceid = 'NB105'
 GROUP BY dstport
 ORDER BY down_bytes DESC
 LIMIT 20;


I'd not be big on divvying up tables by month, but creating summary 
tables by day and by hour would be a good idea. Usage patterns tend to 
be self-similar across day of week and hour of day.

I note that it is important to capture hour of day AT the location 
(EDT/MDT/PDT), as local hour of day is more self-similar than hour of 
day in UTC.
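Assuming tsstart is epoch seconds, the server's session timezone is 
UTC, and the mysql timezone tables have been loaded (via 
mysql_tzinfo_to_sql), local hour can be derived at summary time. The 
'US/Eastern' literal here stands in for a per-device timezone lookup we 
don't have yet:

```sql
-- Illustrative: derive local hour of day from an epoch-seconds column.
SELECT deviceid,
       hour(convert_tz(from_unixtime(tsstart), 'UTC', 'US/Eastern'))
           AS local_hour,
       sum(dwbytes) AS down_bytes
  FROM FLOWS_newformat
 GROUP BY deviceid, local_hour;
```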

So the above becomes a nightly cron job populating a specialized 
summary table, something like (untested; note that tsstart is epoch 
seconds, so it needs from_unixtime() before extracting the day):

drop table if exists device_port_daily_summary;
create table device_port_daily_summary as
select deviceid, dstport,
       date(from_unixtime(tsstart)) as test_day,
       sum(uppkts)  as up_p,     sum(dwpkts)  as down_p,
       sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes
  from FLOWS_newformat
 group by deviceid, dstport, test_day;


(although it would be somewhat better to do this incrementally, 
creating a primary key on the first three fields, and perhaps reversing 
the order of the primary key columns. It depends on the statistics.)
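A sketch of what the incremental variant could look like (untested; the 
column types, key order, and the "only yesterday's flows" predicate are 
all assumptions):

```sql
-- One-time setup: keyed summary table.
CREATE TABLE device_port_daily_summary (
    deviceid   varchar(16) NOT NULL,
    dstport    int         NOT NULL,
    test_day   date        NOT NULL,
    up_p       bigint,
    down_p     bigint,
    up_bytes   bigint,
    down_bytes bigint,
    PRIMARY KEY (deviceid, dstport, test_day)
);

-- Nightly cron: fold in only yesterday's flows rather than
-- rescanning the whole FLOWS_newformat table every night.
INSERT INTO device_port_daily_summary
SELECT deviceid, dstport,
       date(from_unixtime(tsstart)) AS test_day,
       sum(uppkts), sum(dwpkts), sum(upbytes), sum(dwbytes)
  FROM FLOWS_newformat
 WHERE tsstart >= unix_timestamp(curdate() - interval 1 day)
   AND tsstart <  unix_timestamp(curdate())
 GROUP BY deviceid, dstport, test_day;
```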

> I will check what I have into the bismark source tree so that others can work on this.  (I'm looking at Abhishek :-)
>
While not exactly allergic to svn, I am a git fan-boy. Even when working 
with svn these days I use git-svn to cope with it (offline git logs 
rule). I'm not asking you to change your scm backend, just sayin...




