Communication and coordination within the server


grampajohn
Administrator

I've been trying to nail down what I think are two of the major architectural issues that stand between now and a working server.

Communication -- How do the elements of the server communicate with each other? At one time we assumed that the mechanism would be JMS messages, but that requires explicit representation of communication paths. While that might be desirable in a distributed system, it seems like overkill when the elements of the server already share a database.

There are four kinds of communication needed, it seems to me:

  1. Broker-to-Server: Brokers are remote. Communication will be by asynchronous messages, carried by Apache ActiveMQ and exposed in the Java/Groovy environment as JMS. To enable non-Java broker implementations, we will use XML payloads. When possible, these payloads will be bundled to reduce traffic. For example, the AccountingService could bundle the complete set of TariffTransactions for a timeslot in a single message.
  2. Intra-Server: We assume that the server runs as a single, possibly multi-threaded process, with a database. That means that all communication among server elements can be done either through direct method calls or through shared data in the database. Direct method calls must use interface dependencies, and not class dependencies. This requires a bit of tweaking of the normal Grails auto-wiring scheme, but seems possible. To use the database effectively for communication, we need to establish a few basic principles, see below.
  3. Server-to-Visualizer: Since the Visualizer is a remote app, communication will be through asynchronous messages. Payloads could be XML or JSON.
  4. Webapp-to-Server: In a production environment (as opposed to research and development environments), the web-app front end will be separated from the server. In all of these environments, communication will be through a disk-based database, which could be MySQL or presumably HSQLDB. This is not the same database as the game database, because it holds agent registrations, game records for an arbitrary number of past games, and game configurations for future games.
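As a rough illustration of the bundling idea in item 1, here is a sketch that serializes one broker's TariffTransactions for a timeslot into a single XML message. The field names (broker, customer, kind, charge), the element names, and the choice to bundle per broker are assumptions for illustration, not an agreed schema; a real implementation would more likely use an XML binding library than hand-built strings.

```java
import java.util.List;

// Hypothetical, simplified transaction; the real TariffTransaction carries more fields.
final class TariffTransaction {
    final String broker;
    final String customer;
    final String kind;    // e.g. "CONSUME" or "PERIODIC" (assumed labels)
    final double charge;

    TariffTransaction(String broker, String customer, String kind, double charge) {
        this.broker = broker;
        this.customer = customer;
        this.kind = kind;
        this.charge = charge;
    }
}

final class TransactionBundler {
    // Escape characters that may not appear verbatim in an XML attribute value.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Build one XML message holding all of one broker's transactions for a timeslot,
    // instead of sending one message per transaction.
    static String bundle(String broker, int timeslot, List<TariffTransaction> txs) {
        StringBuilder sb = new StringBuilder();
        sb.append("<tariff-transactions broker=\"").append(escape(broker))
          .append("\" timeslot=\"").append(timeslot).append("\">");
        for (TariffTransaction tx : txs) {
            if (!tx.broker.equals(broker)) {
                continue; // each broker sees only its own transactions
            }
            sb.append("<tx customer=\"").append(escape(tx.customer))
              .append("\" kind=\"").append(escape(tx.kind))
              .append("\" charge=\"").append(tx.charge).append("\"/>");
        }
        sb.append("</tariff-transactions>");
        return sb.toString();
    }
}
```

Because the payload is plain XML, a non-Java broker only needs an XML parser to read it.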
For incoming messages (from the server's standpoint), whether broker-to-server or visualizer-to-server, an important question is which server elements are responsible for processing them. There are two possibilities:
  1. Incoming messages are routed to various server elements by proper configuration of Apache ActiveMQ and the server. Outgoing messages are sent to Brokers from various server elements as needed.
  2. Incoming messages are received by (and outgoing messages sent by) server-side proxies for the individual brokers, or perhaps by a single proxy that serves all brokers. Similarly, a visualizer proxy would communicate with the Visualizer. Proxy implementations would, of course, have access to the server database.
I like option 2 above because it does not create unnecessary dependencies between module behavior and a configuration file, and because it does not overload any existing modules with message-distribution tasks.
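Option 2 can be sketched as a single proxy that receives every incoming broker message and hands it to whichever server element registered for that message type, through an interface rather than a concrete class. The message type names and the IncomingMessage, MessageHandler, and IncomingProxy types are hypothetical; in the real server the proxy would be fed by the JMS listener.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical incoming message: sending broker, a type tag, and the XML body.
final class IncomingMessage {
    final String broker;
    final String type;
    final String body;

    IncomingMessage(String broker, String type, String body) {
        this.broker = broker;
        this.type = type;
        this.body = body;
    }
}

// Server elements depend only on this interface, not on the proxy class.
interface MessageHandler {
    void handle(IncomingMessage msg);
}

final class IncomingProxy {
    private final Map<String, MessageHandler> handlers = new HashMap<>();

    // Called at startup, e.g. by a tariff market for "tariff-spec" messages.
    void register(String type, MessageHandler handler) {
        if (handlers.putIfAbsent(type, handler) != null) {
            throw new IllegalStateException("duplicate handler for " + type);
        }
    }

    // Called by the messaging layer for each message; returns false if nobody handles it.
    boolean receive(IncomingMessage msg) {
        MessageHandler h = handlers.get(msg.type);
        if (h == null) {
            return false; // a real proxy would log this and perhaps reply with an error
        }
        h.handle(msg);
        return true;
    }
}
```

Routing lives in code rather than in messaging configuration, which matches the stated reason for preferring option 2.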

What remains, then, is to decide exactly how to use the database for communication. While I do not yet have a complete answer, a partial approach is to use the current Timeslot as a holder for all results related to the timeslot, such as market and tariff transactions. I have implemented that for TariffTransactions, and it seems to work well.

Coordination -- How and when do the various server elements get activated? In a strictly message-based system this is typically done by incoming messages, but that requires careful thought about the threading model, and generally requires that control of the threading model be delegated to the messaging infrastructure. It also gives no effective control over event sequencing, which is essential in a simulator such as ours. The server is not primarily a collection of event-driven activities; it is a simulation of a coordinated set of real-world activities. For example, it makes no sense to try to compute overall balance (and run the balancing market) until all the customer models have computed their production and consumption for the current timeslot, and there is no event that signals that the last customer has finished.

We have a simulation clock that can run simulated time forward at a fixed rate, using an algorithm that will allow remote communicating entities (brokers, visualizer, server) to agree on the current simulation time without additional communication and associated latencies. We could potentially have each server element simply post activation triggers on the simulation clock, but this ignores sequence dependencies. For example, what if some customer model started posting TariffTransaction instances before the correct Timeslot is marked as the "current" timeslot?
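A fixed-rate clock needs no extra communication because every participant can compute simulated time locally from three shared parameters: the wall-clock start time, the simulated start time, and the rate. The sketch below assumes those parameters are distributed once at game start and that wall clocks are roughly synchronized (for example via NTP). The class and method names, the integer rate, and the one-hour timeslot length are illustrative assumptions.

```java
// Each participant builds this with the same parameters and computes simulated
// time locally: simTime = simStart + (wallNow - wallStart) * rate.
final class SimulationClock {
    static final long HOUR_MS = 60L * 60L * 1000L;

    private final long wallStartMs; // wall-clock time at which the simulation started
    private final long simStartMs;  // simulated time at that moment
    private final long rate;        // simulated ms per wall-clock ms
    private final long timeslotMs;  // length of one simulated timeslot

    SimulationClock(long wallStartMs, long simStartMs, long rate, long timeslotMs) {
        this.wallStartMs = wallStartMs;
        this.simStartMs = simStartMs;
        this.rate = rate;
        this.timeslotMs = timeslotMs;
    }

    long simTimeAt(long wallNowMs) {
        return simStartMs + (wallNowMs - wallStartMs) * rate;
    }

    // 0-based index of the timeslot containing the given wall-clock instant.
    long timeslotAt(long wallNowMs) {
        return Math.floorDiv(simTimeAt(wallNowMs) - simStartMs, timeslotMs);
    }

    // Wall-clock instant at which a timeslot begins, e.g. for scheduling activation.
    long wallTimeOfTimeslot(long index) {
        return wallStartMs + index * timeslotMs / rate;
    }
}
```

With a rate of 720, one simulated hour passes every 5 wall-clock seconds, and every party computes the same timeslot index from its own clock.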

What I propose is that the CompetitionController take responsibility for sequencing events within each timeslot, perhaps by calling an activateTimeslot() method that takes the current timeslot as an argument. The CC need not depend on all the other server elements; it can depend on a simple interface that has this one method, and a simple preconfigured list can determine the sequence. With slightly more effort, such a list could even specify which activities can be run in parallel (such as customer models). Another advantage is that this approach would make it easy to detect and deal with cases where the server fails to complete its tasks within the timeslot window.
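Here is a minimal sketch of this proposal, assuming a hypothetical TimeslotPhase interface with a single activate(timeslot) method. The controller depends only on that interface, walks a preconfigured list in order, and reports whether the phases finished before the timeslot deadline. The interface name, the injectable clock, and the boolean result are illustrative assumptions.

```java
import java.util.List;
import java.util.function.LongSupplier;

// The only thing the controller knows about the other server elements.
interface TimeslotPhase {
    void activate(long timeslot);
}

final class CompetitionController {
    private final List<TimeslotPhase> sequence; // preconfigured activation order
    private final LongSupplier wallClock;       // injectable so tests can control time

    CompetitionController(List<TimeslotPhase> sequence, LongSupplier wallClock) {
        this.sequence = sequence;
        this.wallClock = wallClock;
    }

    // Activates every phase in order; returns false if the work overran the deadline,
    // so the caller can log or otherwise handle the overrun.
    boolean runTimeslot(long timeslot, long deadlineMs) {
        for (TimeslotPhase phase : sequence) {
            phase.activate(timeslot);
        }
        return wallClock.getAsLong() <= deadlineMs;
    }
}
```

Reordering activities, or marking some as parallel, would then be a change to the list rather than to the modules themselves.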

I solicit feedback on this set of proposals. I would like to finalize and post to the wiki by this weekend at the latest.

John


Re: Communication and coordination within the server

Prashant Reddy
Hi John and all,

This is a great analysis and thanks for the opportunity to comment on it (sorry for the delayed response).  A few thoughts - 

1) Minor terminology thing... I was confused there for a bit about the difference between the "Visualizer" and the "Webapp".  If I understand them correctly, maybe we should be more explicit and call them "Game Visualizer" and "Server Console" or "Competition Manager" or something like that?  

2) The Visualizer connecting directly to the Server concerns me.  Does each Visualizer instance create its own message traffic to the Server or do they all go through a single visualization-server/proxy, so that the Server is unaffected by dozens of Visualizers constantly refreshing?  I'd prefer the latter approach.  Adis, et al, I have not been involved with the Visualizer design, so please ignore this if this has already been discussed.

3) We may need a fifth kind of communication (you say five, but only list four).  Some messages (especially historical time series, weather data, and possibly some bundled messages) may get very big.  In this case we may have to resort to binary encodings within the XML messages or a chunked XML stream.  We probably wouldn't need to address this unless we encounter messages in the tens of megabytes and we certainly shouldn't prematurely optimize for it, but I wanted to throw it out there as a possibility.  Of course, if we have to resort to this, we'll have to think about whether some of these binary encodings also end up in the database or if the database is always exploded into typed elements.

4) For incoming messages from the Brokers to the Server, I see one or more Broker proxies being beneficial only if the internal communication within the Server is entirely through the database. Otherwise, the receiving proxy will still have to be wired to deliver the messages to the appropriate Server elements, no? A third possibility for handling incoming messages is as follows. Assuming that only a small number of Server elements need to receive messages, let's call each of these a Service. Each Service then reads messages from its own ActiveMQ queue, and the Brokers put messages explicitly on the queue of the Service with which they mean to communicate. So, there wouldn't be a generic Server queue. This has the advantage that if, in the future, various Services needed to be separated into different physical processes, the Brokers would not change. The counterargument is that if a Server refactoring eliminated or changed the roles of some Services, Brokers would need to adapt to those changes; they can't be handled internally by rewiring how the Server handles messages.

5) Regarding coordination, most of my experience implementing multi-threaded online systems has been with an event loop--select() or poll() on sets of file descriptors--that can handle both I/O and timer events.  We can create dummy file descriptors for the timer events, including the simulation clock, using UNIX pipes for instance.  That way, control of the threading model doesn't necessarily need to be delegated to the messaging infrastructure; the Competition Controller can also drive the threads using pipes across threads or using timers within threads.  This type of system is very responsive and flexible but also somewhat brittle in that if the handler for a particular I/O event takes too long, it can delay timer events in that thread.  All that said, and thinking about it, I am leaning towards agreeing with you that if the Competition Controller can drive the whole simulation as a synchronous loop over each timeslot, that would make things much simpler.  However, I'm not sure how we'd effectively leverage multiple threads in your model.  
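One possible way to use multiple threads inside a synchronous per-timeslot loop is to run independent work (such as the individual customer models) in parallel within a single phase, and have the loop wait for all of it before moving on to balancing. The sketch below uses the standard ExecutorService.invokeAll, which blocks until every task completes; the CustomerModel interface and its kWh return value are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical customer model: returns net consumption (kWh) for one timeslot.
interface CustomerModel {
    double usePower(long timeslot);
}

final class ParallelCustomerPhase {
    private final ExecutorService pool;
    private final List<CustomerModel> customers;

    ParallelCustomerPhase(ExecutorService pool, List<CustomerModel> customers) {
        this.pool = pool;
        this.customers = customers;
    }

    // Runs all customer models concurrently and returns only after every one has
    // finished, so the next phase (balancing) never starts early.
    double run(long timeslot) throws InterruptedException, ExecutionException {
        List<Callable<Double>> tasks = new ArrayList<>();
        for (CustomerModel c : customers) {
            tasks.add(() -> c.usePower(timeslot));
        }
        double total = 0.0;
        for (Future<Double> f : pool.invokeAll(tasks)) { // invokeAll waits for all tasks
            total += f.get();
        }
        return total;
    }
}
```

The outer loop stays strictly sequential between phases; parallelism is confined to phases whose members do not depend on each other.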

Thanks,
Prashant



Re: Communication and coordination within the server

grampajohn
Administrator
adis wrote
"About the Visualizer, it was never intended to be connected to the server directly.

The way I always imagined it would be implemented is as a sort of publisher-subscriber module, where each instance of the Visualizer would simply subscribe to a channel to which the server pushes relevant messages. As far as I know, there was never an option of every Visualizer requesting data from the server, because that could create too much traffic. So, in a nutshell, the latter option you described is the preferred one."

That is also how I envision that it will work. But the server must somehow communicate the information needed by some number of visualizers, and that information must be routed to the visualizers. An important open question is whether there is any communication that must go the other direction. Can you think of any?
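The publish-subscribe arrangement adis describes can be sketched as a topic: the server publishes each message once, every subscribed visualizer receives its own copy, and nothing flows back to the server. The in-process VisualizerTopic class below is hypothetical; in the deployed system this role would presumably be played by a JMS topic in the message broker, so that fan-out cost falls on the broker rather than the server.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

final class VisualizerTopic {
    // CopyOnWriteArrayList lets visualizers subscribe while messages are being published.
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<String> visualizer) {
        subscribers.add(visualizer);
    }

    // The server pushes each message once; it never handles requests from visualizers.
    void publish(String message) {
        for (Consumer<String> s : subscribers) {
            s.accept(message);
        }
    }
}
```

A visualizer that subscribes late sees only subsequent messages, which is one reason the question of visualizer-to-server communication (for example, to request history) remains open.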

Re: Communication and coordination within the server

grampajohn
Administrator
In reply to this post by grampajohn

I am still a little concerned about the interactions with Customer:

  • Customer has two activities that can result in accounting transactions. One is switching tariff subscriptions, and the other is consuming/producing power. Do these have to happen at the same time?
  • We said that tariff subscriptions could be evaluated periodically, probably less often than once per timeslot; it could be 1-4 times per day.
  • When I was working on the tariff subscription and meter-reading processes (they are implemented on the tariff side, and it's the tariff that generates the transactions), I assumed that the tariff would attach the transactions to the current timeslot. In the Spring Integration approach, they would be messages on a queue.
  • The advantage of the Spring Integration approach is that work can be delegated at any time, without worrying about the details. The disadvantage is that it's somewhat complicated to set up.
  • The advantage of attaching transactions to the current timeslot is that they are "rooted" in the correct time, and can be picked up later and processed. But it requires that the "current" timeslot be _exactly_ correlated with the simulation clock (you have to access the current timeslot with a query that specifies the start time), and it overloads the responsibility of the Timeslot abstraction.
  • The current scheme unnecessarily overloads the responsibilities for financial accounting and balancing, which really should be handled by two different entities.
So I'm trying to think through the portions of the simulation scenario that must be sequenced correctly in each timeslot. It seems to me that accounting is not one of those portions. So here's a first stab at the per-timeslot sequence:
  1. Customers gather data (or have already gathered it) that might influence their consumption. This includes at least current rates for variable-rate tariffs, the amount of power they have already used for the day on a tiered-rate tariff, the current weather conditions, and any balancing actions taken in the previous timeslot on controllable sources/loads. It might also include future rates on TOU or variable-rate tariffs, and weather forecasts.
  2. Customers run their production/consumption models and report per tariff, because tariff terms can affect these decisions. This is implemented by calling the usePower() method on TariffSubscription. A side effect of this call is that TariffTransaction instances get created (and right now attached to the current timeslot) for each transaction that is triggered. For a 2-part tariff, two separate transactions are created - one for the fixed part, and one for the variable part. However, we are not currently generating separate transactions for cases where multiple Rates apply (for example, usage that crosses a usage-tier boundary), and in general there's no indication in the transaction of which Rates were applied. Brokers might like to know this.
  3. Some entity (the DU, perhaps) must gather up all the usage reports from Customers, combine them with the Broker market positions, and run the balancing process. This is described in the [start-of-timeslot scenario](https://github.com/powertac/powertac-server/wiki/Accounting-start-of-timeslot) on the server wiki, but it's not an intrinsic responsibility of the Accounting Service. The balancing process can produce balancing actions that must be communicated to Customers and Brokers no later than the beginning of the next timeslot, as well as financial transactions that must be communicated to Accounting and to Brokers.
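To make step 2 concrete: for a two-part tariff, a single usePower() call produces two transactions, one for the fixed part and one for the variable part. The sketch assumes a single flat per-kWh rate and a per-timeslot fixed charge, posts to a local list standing in for the current timeslot, and uses an assumed sign convention (negative means a charge to the customer). The class names are hypothetical, and it deliberately does not model multiple Rates or tier boundaries.

```java
import java.util.ArrayList;
import java.util.List;

final class Transaction {
    final String kind;   // "PERIODIC" for the fixed part, "CONSUME" for the variable part
    final double amount; // negative values are charges to the customer (assumed convention)

    Transaction(String kind, double amount) {
        this.kind = kind;
        this.amount = amount;
    }
}

final class TwoPartSubscription {
    private final double fixedPerTimeslot; // fixed charge per timeslot
    private final double pricePerKwh;      // single flat rate; tiers are not modeled
    private final List<Transaction> posted = new ArrayList<>(); // stand-in for the timeslot

    TwoPartSubscription(double fixedPerTimeslot, double pricePerKwh) {
        this.fixedPerTimeslot = fixedPerTimeslot;
        this.pricePerKwh = pricePerKwh;
    }

    // Records one timeslot's consumption and posts one transaction per tariff part.
    void usePower(double kwh) {
        posted.add(new Transaction("PERIODIC", -fixedPerTimeslot));
        posted.add(new Transaction("CONSUME", -kwh * pricePerKwh));
    }

    List<Transaction> transactions() {
        return posted;
    }
}
```

Recording which Rate produced each amount would need an extra field on Transaction, which is the information brokers might want.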
It's beginning to sound to me like a situation where we have a core, synchronous simulation process, surrounded by peripheral, possibly asynchronous processes like Accounting and information distribution (sending information to Brokers and Visualizers). That makes me think of a structure where we have a main simulation loop that either (a) calls methods on peripheral components, like Accounting, or (b) posts results using some sort of queuing mechanism to be picked up and processed later by peripheral services, and that's one of the things Spring Integration does. Of course, there are other, possibly simpler ways to do queuing. One is with the database, and another is with simple in-memory queues, but the latter has the possible disadvantage of being a strictly one-receiver, multiple-sender mechanism. The database method is also a one-receiver model unless you leave all the messages in the queue and have each consuming process independently keep track of which messages it has consumed. But we have potential receivers that are not part of the simulation process: the Visualizer and Brokers.
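The multi-receiver workaround mentioned for the database approach (leave every message in place and let each consumer track what it has already consumed) can be sketched as an append-only log with one cursor per consumer. The in-memory SharedLog class is hypothetical and stands in for a database table; a real table would use a monotonically increasing id column in place of the list index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SharedLog {
    private final List<String> entries = new ArrayList<>();       // never deleted
    private final Map<String, Integer> cursors = new HashMap<>(); // consumer -> next index

    synchronized void append(String message) {
        entries.add(message);
    }

    // Returns everything this consumer has not yet seen and advances only its own cursor,
    // so several independent receivers can read the same messages.
    synchronized List<String> readNew(String consumer) {
        int from = cursors.getOrDefault(consumer, 0);
        List<String> fresh = new ArrayList<>(entries.subList(from, entries.size()));
        cursors.put(consumer, entries.size());
        return Collections.unmodifiableList(fresh);
    }
}
```

Because nothing is removed, storage grows with the game; in practice old entries would be pruned once every registered consumer has passed them.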

So perhaps I'm just slow, but I think there's maybe a role for JMS in the server, because it can decouple processes that need not be coupled. However, the original design failed to recognize the need for a coordinated simulation core, and I think it severely overloaded the responsibilities of Accounting. So what's the way forward? Here's a suggestion:

  • Communication need not be confused with activation. If communication is done with method calls, then activation and communication are folded together. If it's done with JMS, then some other process controls activation.
  • We need a clear responsibility statement for each module (plugins and major Domain types). This is going to require some responsibility-assignment effort - there are some good references under "responsibility-driven design" and "software responsibility assignment".
  • One of the "overloaded" responsibilities of Accounting is to serve as the Tariff market. I suggest we add a TariffMarket plugin that's responsible for receiving TariffSpecifications and HourlyCharge updates, validating them, storing them in the database, notifying competing Brokers (and the Visualizer), and generating the necessary transactions for publication fees. This service does not necessarily need to run in every timeslot. In fact, we can manage server load somewhat by running this process one (simulated) hour ahead of the process that drives customer tariff evaluation.
  • Make the CompetitionController responsible for activation of the various plugins. An alternative is to have each non-core module run its own thread, and use wait/notify synchronization. That could work for Accounting, but not for Customer tariff evaluation.
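For comparison, the alternative in the last bullet (a non-core module on its own thread, woken with wait/notify) can be sketched with a small monitor object. The TimeslotSignal class is hypothetical and shows only the mechanism: the controller signals each new timeslot, and a peripheral thread such as Accounting blocks until a timeslot newer than the last one it processed arrives.

```java
// Minimal monitor: the controller calls signal() once per timeslot, and a
// peripheral thread blocks in awaitAfter() until a newer timeslot is signalled.
final class TimeslotSignal {
    private long latest = -1; // most recent timeslot signalled; guarded by 'this'

    synchronized void signal(long timeslot) {
        if (timeslot > latest) {
            latest = timeslot;
            notifyAll(); // wake every waiting peripheral thread
        }
    }

    // Blocks until a timeslot newer than 'lastSeen' has been signalled, then returns it.
    // The while loop guards against spurious wakeups.
    synchronized long awaitAfter(long lastSeen) throws InterruptedException {
        while (latest <= lastSeen) {
            wait();
        }
        return latest;
    }
}
```

If the peripheral thread falls behind, it skips directly to the latest timeslot, so it must be able to catch up on everything posted since its last run.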
This has turned out to be much longer than I intended, but I feel some clarity dawning, finally. Please let me know if it inspires a bit of clarity for you also.

John

Re: Communication and coordination within the server

grampajohn
Administrator
I have added an "activation" section to the high-level descriptions of the plugins to help us think about how our plugins are supposed to work. Please let me know if you think this all makes sense (or especially if you disagree).

John

Re: Communication and coordination within the server

Prashant Reddy
Hi John,

Nearly all of it makes sense to me -- the only question I have is this:

The tariff-market doesn't have an API to serve up the current tariffs?  So, customers that want to evaluate tariffs have to get them from the database directly?  How about brokers -- shouldn't they be able to request current and historical tariffs from the server (or have the tariffs published to them as they get stored in the database)?  It seems to me that the tariff-market API should be expanded to allow retrieval of all tariffs instead of just the default tariff(s).

Thanks,
Prashant


Re: Communication and coordination within the server

grampajohn
Administrator
The tariff-market plugin is implemented, tested, and committed. The only thing that's not yet working is output to brokers on tariff publication, variable-rate changes, and tariff revocations. The customer API allows retrieval of relevant tariffs, and requires action on the part of customers to switch subscriptions away from revoked tariffs. This is because customers have to know about revoked tariffs anyway. Details are in the tariff publication story.

Transactions are posted to accounting by the simple expedient of sticking them in the database. The postedTime field is the current sim time, so all Accounting has to do is periodically retrieve all the transactions with newer postedTime values than when it last ran. Presumably it will be triggered once per timeslot.
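On the Accounting side, this reduces to remembering when it last ran and selecting the transactions whose postedTime falls after that. The sketch does the selection over an in-memory list; in the server it would be a database query on postedTime. The PostedTransaction and AccountingService classes are hypothetical, and the half-open window (lastRun, now] assumes that nothing more is posted at a given sim time once Accounting has run for it.

```java
import java.util.ArrayList;
import java.util.List;

final class PostedTransaction {
    final long postedTime; // simulation time at which the transaction was stored
    final double amount;

    PostedTransaction(long postedTime, double amount) {
        this.postedTime = postedTime;
        this.amount = amount;
    }
}

final class AccountingService {
    private long lastRun = Long.MIN_VALUE; // sim time of the previous activation

    // Called once per timeslot: returns every transaction posted in (lastRun, now].
    List<PostedTransaction> activate(long now, List<PostedTransaction> store) {
        List<PostedTransaction> fresh = new ArrayList<>();
        for (PostedTransaction tx : store) {
            if (tx.postedTime > lastRun && tx.postedTime <= now) {
                fresh.add(tx);
            }
        }
        lastRun = now;
        return fresh;
    }
}
```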

Please let me know if this seems right to you.

John

Re: Communication and coordination within the server

grampajohn
Administrator
In reply to this post by grampajohn
I think it's time to jettison Spring Integration unless someone other than me can give us all a quick, clear tutorial, with examples we can use, on how to actually use it in Grails, especially in an integration test environment. So far, I cannot find any useful information on this, and I have had no useful responses from those I have asked.

So how do the pieces of the server communicate?

- The accounting-service can get all its transactions from the database, simply by including a postedTime field in the transactions. The accounting service can then wake up once per timeslot, do a simple query, and process the list.

- Communication with the broker can easily go through a broker-proxy, with just two methods in its API: sendToBroker(broker, message), and broadcast(message). Incoming messages from the brokers would show up in the proxy, and could be forwarded from there either by storing in the database or calling API methods.

- Communication with the visualizer can go through another proxy. Ideally, the visualizer would operate by making data requests, either to plugin APIs or directly to the database. That way the rest of the server need not be aware of the visualizer at all.
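The two-method broker-proxy API can be expressed as a Java interface that plugins depend on, with the transport hidden behind it. The method names sendToBroker and broadcast come from the post; the Broker type, the String payload, and RecordingBrokerProxy (an in-memory test double standing in for a JMS-backed implementation) are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal broker identity.
final class Broker {
    final String username;

    Broker(String username) {
        this.username = username;
    }
}

// Plugins depend only on this interface; a production implementation would wrap JMS.
interface BrokerProxy {
    void sendToBroker(Broker broker, String message);
    void broadcast(String message);
}

// In-memory implementation that records outgoing messages per broker, e.g. for tests.
final class RecordingBrokerProxy implements BrokerProxy {
    private final Map<String, List<String>> outbox = new LinkedHashMap<>();

    void addBroker(Broker broker) {
        outbox.putIfAbsent(broker.username, new ArrayList<>());
    }

    @Override
    public void sendToBroker(Broker broker, String message) {
        List<String> box = outbox.get(broker.username);
        if (box == null) {
            throw new IllegalArgumentException("unknown broker " + broker.username);
        }
        box.add(message);
    }

    @Override
    public void broadcast(String message) {
        for (List<String> box : outbox.values()) {
            box.add(message);
        }
    }

    List<String> sentTo(String username) {
        return outbox.getOrDefault(username, new ArrayList<>());
    }
}
```

Because plugins see only the interface, integration tests can swap in the recording version without any messaging infrastructure.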

I have started to post sequence diagrams for the stories on our development roadmap. If there are any scenarios that you think cannot be handled without internal messaging, please let me know and we'll try to work it out.

Cheers -

John