Inclusion of competition ID in domain types


grampajohn
Administrator
I've noticed that all the domain types written by Carsten and the others at KIT include the competition ID as a field. Is this necessary, or even a good idea? I assume the reasoning is that you can just stuff the data from multiple competitions into a single database and do queries.

Let's think about this a bit. I'm not convinced we should do this, because
(1) it's a fair amount of extra data to lard into the server when it's running a simulation;
(2) there's no way we are going to write to a non-empty database when running a simulation; instead, we'll start with an empty database, and take a dump when a simulation completes;
(3) anyone wanting to do cross-game analysis will likely want to do some data transformations during the process of loading data from a sim into an analysis database. The competition ID can be added at that point easily enough.
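
A minimal sketch of point (3) in plain Java, assuming hypothetical `SimRecord`, `AnalysisRecord` and `AnalysisLoader` types (none of these are Power TAC classes): the records written during a simulation carry no competition ID, and the loader attaches one while copying a dump into the analysis store.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record as written by the sim server: no competition ID.
final class SimRecord {
    final String broker;
    final double cash;

    SimRecord(String broker, double cash) {
        this.broker = broker;
        this.cash = cash;
    }
}

// Hypothetical analysis-side record: the same data plus the competition ID.
final class AnalysisRecord {
    final long competitionId;
    final String broker;
    final double cash;

    AnalysisRecord(long competitionId, SimRecord source) {
        this.competitionId = competitionId;
        this.broker = source.broker;
        this.cash = source.cash;
    }
}

// Hypothetical loader; an in-memory list stands in for the analysis database.
final class AnalysisLoader {
    private final List<AnalysisRecord> store = new ArrayList<>();

    // Copy one simulation dump into the store, tagging each row on the way in.
    void load(long competitionId, List<SimRecord> dump) {
        for (SimRecord r : dump) {
            store.add(new AnalysisRecord(competitionId, r));
        }
    }

    List<AnalysisRecord> records() {
        return store;
    }
}
```

Calling `load` once per dump, with a different ID each time, gives a combined store that can be queried across games.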

I would like to reach consensus on this before the Valentine's Day release (14 Feb). Please let me know your thoughts. I've created issue #92 to deal with the changes that will be needed once we decide which way to go.

John

Re: Inclusion of competition ID in domain types

Carsten Block
Administrator
Hi!

Thanks for starting this discussion.

> (2) there's no way we are going to write to a non-empty database when running a simulation; instead, we'll start with an empty database, and take a dump when a simulation completes;
That's the question... I really don't know. We discussed a scenario where e.g. four different competitions (one in each season) are counted as one single "powertac match", i.e. the overall winner of the four competitions wins the "match" and reaches the next round in the tournament. This is one possibility to avoid complex jumps in simulation time (which I think would be quite painful to implement). To easily determine match winners in such a scenario we should have outcomes of all matches in the database (i.e. no data dumping and purging between competition runs).

Another use case I find quite likely is that of a "local simulation series", where the competition server is not connected to our central web-app (i.e. db dumps are not pushed to our central web-app). I think this is also a likely scenario, in which powertac participants run their own local simulation studies (e.g. to evaluate their agents). In this case, too, it is likely more convenient to run the whole simulation series in a row (without dumping and purging data in between) and to do the data mining directly on the local server database, without the need to transform data.

> (3) anyone wanting to do cross-game analysis will likely want to do some data transformations during the process of loading data from a sim into an analysis database. The competition ID can be added at that point easily enough.

In the old server we started with no competition references in the domain classes and added them only later on, which was a bit painful. In the new server you can fill in the competition reference using the static method "Competition.currentCompetition()" (which is efficiently cached in ehcache).
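
As an illustration only, here is a plain-Java sketch of that pattern, not the actual domain code. `Competition.currentCompetition()` is the method named above; `setCurrent`, the `id` field and the `TariffRecord` type are hypothetical, and a static field stands in for the cached lookup.

```java
// Illustrative stand-in for the Competition domain class.
final class Competition {
    // Stands in for the cached lookup; the real caching mechanism is not shown.
    private static Competition current;

    final long id;

    Competition(long id) {
        this.id = id;
    }

    static Competition currentCompetition() {
        return current;
    }

    // Hypothetical setter, called when a simulation is set up.
    static void setCurrent(Competition competition) {
        current = competition;
    }
}

// Hypothetical domain type that fills in its own competition reference.
final class TariffRecord {
    final Competition competition;
    final String broker;

    TariffRecord(String broker) {
        Competition c = Competition.currentCompetition();
        if (c == null) {
            // The field must not be null, so fail early instead of storing null.
            throw new IllegalStateException("no current competition");
        }
        this.competition = c;
        this.broker = broker;
    }
}
```

With this shape, callers never pass the competition explicitly; each new instance picks it up at construction time.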

Overall, we can still run in dump & purge mode. But keeping competition references in the tables comes at modest overhead (both computationally and in terms of programming effort) and gives us more flexibility in potential server usage in future, which would be more limited otherwise.

Cheers,
Carsten  


Re: Inclusion of competition ID in domain types

grampajohn
Administrator
On 01/31/2011 05:13 AM, Carsten Block [via Power TAC Developers] wrote:

> Hi!
>
> Thanks for starting this discussion.
>
>  > (2) there's no way we are going to write to a non-empty database when
> running a simulation; instead, we'll start with an empty database, and
> take a dump when a simulation completes;
> That's the question... I really don't know. We discussed a scenario
> where e.g. four different competitions (one in each season) are counted
> as one single "powertac match", i.e. the overall winner of the four
> competitions wins the "match" and reaches the next round in the
> tournament. This is one possibility to avoid complex jumps in simulation
> time (which I think would be quite painful to implement). To easily
> determine match winners in such a scenario we should have outcomes of
> all matches in the database (i.e. no data dumping and purging between
> competition runs).

First of all, I'm assuming an in-memory database, not shared among
processes. Otherwise we have to deal with locking issues, and
performance will be very problematic. That makes the empty-start and
end-dump processes necessary and trivial.

>
> Another use case I find quite likely is the one of a "local simulation
> series" where the competition server is not connected to our central
> web-app (i.e. db dumps are not pushed to our central web-app). I think
> this is a likely scenario too where powertac participants run their own
> local simulation studies (e.g. to evaluate their agents). Also in this
> case it is likely more convenient to run the whole simulation series in a
> row (without dumping and purging of data in between) and to do the data
> mining on the local server database directly without the need to
> transform data.

db dumps would not be pushed to a central server in almost all uses of
the simulator. Remember that most of the time, it will be used by local
research groups; the cross-institution competitions will be the most
visible activities, but that's really a special case. Also, I have no
expectation that we would provide any sort of centralized access to data
other than individual simulation db dumps. Folks who want to do
cross-game studies of competition data will need to gather up all the db
dumps and process them locally.

Aggregating data across simulations for the purpose of winner
determination and posting game summaries does not require aggregation of
the full simulation records, only summary data. Competition servers need
to share a database with the web-app, but that's a traditional (MySQL)
database that would not be used during a simulation. Its purpose is to
hold agent registrations, competition configurations and summary data,
and (references to) db dumps from completed simulations.
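
To illustrate, here is a sketch in Java of winner determination from summary rows alone. It assumes a hypothetical `GameSummary` row (competition ID, broker, final balance) and assumes the match winner is the broker with the highest total over the match's competitions. Neither the row layout nor the scoring rule has been decided.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical summary row posted once per broker at the end of a simulation.
final class GameSummary {
    final long competitionId;
    final String broker;
    final double finalBalance;

    GameSummary(long competitionId, String broker, double finalBalance) {
        this.competitionId = competitionId;
        this.broker = broker;
        this.finalBalance = finalBalance;
    }
}

final class MatchScorer {
    // Assumed rule: sum each broker's final balance over all summaries in
    // the match and pick the highest total. Returns null for an empty list;
    // ties are broken arbitrarily.
    static String winner(List<GameSummary> summaries) {
        Map<String, Double> totals = new HashMap<>();
        for (GameSummary s : summaries) {
            totals.merge(s.broker, s.finalBalance, Double::sum);
        }
        String best = null;
        for (Map.Entry<String, Double> e : totals.entrySet()) {
            if (best == null || e.getValue() > totals.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Only these small rows need to live in the shared database; the full simulation records stay in the per-game dumps.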

>
>  > (3) anyone wanting to do cross-game analysis will likely want to do
> some data transformations during the process of loading data from a sim
> into an analysis database. The competition ID can be added at that point
> easily enough.
>
> In the old server we started with no competition references in the
> domain classes and added them only later on, which was a bit painful. In
> the new server you can fill in competition reference using the static
> method "Competition.currentCompetition()" (which is efficiently cached in
> ehcache).
>
> Overall, we can still run in dump & purge mode. But keeping competition
> references in the tables comes at modest overhead (both computationally
> and in terms of programming effort) and gives us more flexibility in
> potential server usage in future, which would be more limited otherwise.

There's no need for dump-and-purge if we start the server, load a
config, run a simulation, post summary data, dump the sim db, and quit.
This will be easy to do in cases where the web-app and sim server are
separate processes. I'm not quite sure of the mechanics if we combine
them for the research version, but it should not be difficult.

This discussion tells me that we need to spend a bit of time on
architecture again - we're running off the end of what we've worked out.
I'll give that some attention once I have gone through the subscription
type.

Cheers -

John

Re: Inclusion of competition ID in domain types

grampajohn
Administrator
In reply to this post by Carsten Block
One more question: If it's so easy to include the competition ID, why is the variable not being initialized in any of the types? It cannot be null, and yet there's no code I've found yet that sets it to non-null. What am I missing?