
umair (1)
I am working on a power-management problem where I control the power state of a computing board based on the occurrence of events. I am using reinforcement learning (traditional Q-learning) for power management, where the computing board works as a Service Provider (SP) that processes requests (images). The SP is connected to a smart camera, and the Power Manager (PM) algorithm runs on the camera, issuing power commands (sleep, wake-up) to the SP. The smart camera captures images (requests) when an event occurs and maintains a Service Queue (SQ) for them. I also have an ANN-based workload estimator that classifies the current workload as low or high.

The state space for the Q-learning algorithm is therefore composite: Q(s,a) with s=(SR, SQ, SP), where SR is the state of the workload, SQ is the state of the service queue, and SP is the state of the service provider. Based on the current workload, the state of the queue, and the state of the service provider, the PM issues commands (sleep, wake-up) to the SP. A decision is taken at the following stages:

1. SP is idle
2. SP just entered the sleep state and SQ>=1
3. SP is in the sleep state and SQ transitions from 0 to 1.
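The setup described above can be sketched in code as follows. Everything here is a hypothetical illustration, not the poster's actual implementation: the time-out list, the learning parameters, and the state encoding are all placeholders.

```python
import random
from collections import defaultdict

# Hypothetical discretization of the composite state s = (SR, SQ, SP):
#   SR: workload estimate from the ANN classifier ("low" or "high")
#   SQ: service-queue length (capped so the table stays finite)
#   SP: power state of the service provider ("idle", "sleep", "busy")
TIMEOUTS = [1, 5, 10, 50]          # pre-defined time-out values (placeholder)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(float)             # Q[(state, action)] -> expected cost

def choose_timeout(state):
    """Epsilon-greedy selection over the pre-defined time-out list.
    Q holds costs, so the greedy choice is the minimum."""
    if random.random() < EPSILON:
        return random.choice(TIMEOUTS)
    return min(TIMEOUTS, key=lambda a: Q[(state, a)])

def update(state, action, cost, next_state):
    """Standard tabular Q-learning update, written for cost minimization."""
    best_next = min(Q[(next_state, a)] for a in TIMEOUTS)
    Q[(state, action)] += ALPHA * (cost + GAMMA * best_next - Q[(state, action)])
```

The PM would call `choose_timeout` at each of the three decision stages above and `update` once the resulting cost c(s,a) has been observed.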

For each action, a cost is assigned consisting of a weighted sum of the average power consumption and the average latency per request caused by the action. The two terms are weighted as follows:

c(s,a)=lambda*p_avg + (1-lambda)*avg_latency

where lambda is a power/performance trade-off parameter. In both the sleep state and the idle state, an action consists of selecting a time-out value from a list of pre-defined time-outs. My problem is as follows:

Using the above-mentioned cost, learning always favors small time-out values in the sleep state, because the avg_latency for small time-out values is always smaller, and hence so is their cost. I expect that as I increase the power/performance parameter lambda, learning should go for higher power savings at the expense of higher latency, and should therefore select larger time-out values in the sleep state. How can I modify the cost function?
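The symptom described above can be reproduced with a small numeric sketch. The power and latency figures below are invented for illustration (the post gives no numbers); the point is only that when avg_latency sits on a much larger numeric scale than p_avg, even a heavily power-weighted lambda still picks the smallest time-out.

```python
# Hypothetical per-time-out statistics in the sleep state:
timeouts    = [1, 5, 10, 50]
p_avg       = {1: 0.9, 5: 0.7, 10: 0.5, 50: 0.3}   # watts (illustrative)
avg_latency = {1: 10, 5: 40, 10: 80, 50: 300}      # ms (illustrative)

def cost(a, lam):
    """The cost function exactly as given in the post."""
    return lam * p_avg[a] + (1 - lam) * avg_latency[a]

# With these scales, lambda = 0.5, 0.9, and even 0.99 all select
# time-out 1, because the raw latency term dwarfs the power term.
for lam in (0.5, 0.9, 0.99):
    best = min(timeouts, key=lambda a: cost(a, lam))
    print(lam, best)
```

This suggests the issue is one of scale rather than of the weighting scheme itself: lambda can only trade power against latency if the two terms are numerically comparable.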
peter.harrington (82)
Re: Reinforcement learning for power management
Sorry for the late response.
Did you solve this? If so, how did you solve it?

I would modify the cost function with arbitrary constants B0 and B1:

c(s,a)=B0*lambda*p_avg + B1*(1-lambda)*avg_latency
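One way to read this suggestion is that B0 and B1 should bring the two terms onto comparable scales, for example by normalizing each term by its largest observed value. The maxima below are hypothetical; this is a sketch of the idea, not the answerer's actual code.

```python
# Hypothetical observed maxima used to normalize each term to roughly [0, 1]:
P_MAX, L_MAX = 0.9, 300.0

def cost(p_avg, avg_latency, lam, B0=1.0 / P_MAX, B1=1.0 / L_MAX):
    """Suggested cost: c(s,a) = B0*lambda*p_avg + B1*(1-lambda)*avg_latency,
    with B0 and B1 chosen so both terms are on the same scale."""
    return lam * B0 * p_avg + (1 - lam) * B1 * avg_latency
```

With both terms normalized this way, lambda becomes a genuine trade-off knob: raising it makes a low-power, high-latency time-out cheaper than a low-latency, high-power one, which is the behavior the original question was after.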