Updated: 2011/4/5

Research activities


Teaching of holding-up motion by humanoid robot using force sensor information

This research proposes a manipulation acquisition framework based on off-line trials with failure/success. Two feature spaces are constructed using nonlinear mapping: a feature space of force sensor information and a feature space of the configuration space. They are used to modify the holding-up motion online. First, the robot tries to hold up an object and checks its force sensor information. The robot then predicts from the sensor information whether the task can be achieved. When it predicts that it will fail to hold up the object, it modifies its configuration using the feature space information. The proposed framework was evaluated by simulations with a humanoid robot.

Idea of modification of holding-up configuration
using force sensor information

Evaluation of holding-up motion by simulation
with a humanoid robot


  1. Yuichi Kobayashi and Masanobu Tsubota, "Hold-up motion generation based on feature extraction of force sensor information", JSME Robotics and mechatronics conference, (to be presented), 2011.

Extraction of modes related to motions of body and objects and planning space shift motion generation

To improve the flexibility of robotic learning, it is important to realize an ability to generate a hierarchical structure. This paper proposes a learning framework that can dynamically change the planning space depending on the structure of tasks. Synchronous motion information is utilized to generate 'modes', and a hierarchical structure of the controller is constructed based on the modes. This enables efficient planning and control in a low-dimensional planning space, even though the dimension of the total state space is in general very high. Three types of object manipulation tasks are tested as applications, where an object is found and used as a tool (or as a part of the body) to extend the ability of the robot. The proposed framework is expected to be a basic learning model to account for body schema acquisition including tool affordances.

Idea of motion generation based on separation of path planning of object and body

Total scheme of motion generation with multiple mode transitions which consist of modes with body motion and with body and object motions
Pushing motion generation considering multiple mode transitions
Object manipulation with an L-shaped tool, obtained by the proposed motion generation framework


  1. Yuichi Kobayashi and Shigeyuki Hosoe, Planning-Space Shift Motion Generation: Variable-space Motion Planning: Toward Flexible Extension of Body Schema, Journal of Intelligent and Robotic Systems, volume 62, issue 3-4, 2011.
  2. Yuichi Kobayashi and Shigeyuki Hosoe, ``Planning-Space Shift Learning: Variable-space Motion Planning toward Flexible Extension of Body Schema,'' Proc. of IEEE/RSJ Int. Conf. on Intelligent Robot and Systems, 373-379, 2009.

Learning of object manipulation considering stick/slip contact mode change

This research proposes the learning of whole arm manipulation with a two-link manipulator. Our proposal combines a controller obtained by reinforcement learning (actor-critic) with a classifier realized by a Support Vector Machine (SVM). The classifier learns the boundary between slip and stick modes in torque space. Using the classification result, the robot learns to move the object toward the desired position while keeping the desired contact modes. The control input (torque) is first specified by the actor. The SVM classifier judges whether the torque can maintain the desired slip or stick mode and, if not, modifies the torque so that the desired mode is maintained. Simulations verified that the proposed learning can accelerate and decelerate the object while keeping the desired mode, i.e., while avoiding undesired slipping of the object.
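
The torque correction step described above can be illustrated with a minimal sketch. It assumes a linear decision function in a two-dimensional torque space as a stand-in for the trained SVM; the weights `W`, offset `B`, safety margin `MARGIN` and the minimum-norm correction rule are all hypothetical and are not taken from the publications listed below.

```python
# Hypothetical linear decision function standing in for a trained SVM:
# f(tau) = W . tau + B, where f(tau) >= MARGIN is read as
# "this torque keeps the desired contact mode".
W = (1.0, -0.5)   # assumed normal vector of the mode boundary
B = 0.2           # assumed offset of the mode boundary
MARGIN = 0.1      # assumed safety margin on the decision value


def decision(tau):
    """Signed decision value of a torque vector."""
    return sum(w * t for w, t in zip(W, tau)) + B


def correct_torque(tau, margin=MARGIN):
    """Return tau unchanged if it keeps the desired mode, otherwise the
    closest torque (in Euclidean norm) whose decision value equals margin."""
    f = decision(tau)
    if f >= margin:
        return tuple(tau)
    # Move along the boundary normal W just far enough to reach the margin.
    scale = (margin - f) / sum(w * w for w in W)
    return tuple(t + scale * w for t, w in zip(tau, W))


actor_torque = (-1.0, 1.0)              # torque proposed by the actor
safe_torque = correct_torque(actor_torque)
print(safe_torque, decision(safe_torque))
```

With a nonlinear SVM kernel the correction would no longer be a single projection; the sketch only shows the division of roles between the actor, which proposes a torque, and the classifier, which vetoes or adjusts it.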

Examples of object manipulation with contact mode changes:
Holding up, pushing and rotating manipulations

Learning of object rotation task using SVM and model predictive control


  1. Nobuyuki Kawarai and Yuichi Kobayashi, Learning of whole arm manipulation with constraint of contact mode maintaining, Journal of Robotics and Mechatronics, Vol. 22, No.4, 542-550, 2010.
  2. Yuichi Kobayashi, Masashi Shibata, Shigeyuki Hosoe and Yoji Uno, ``Learning of Object Manipulation with Stick/Slip Mode Switching,'' IEEE/RSJ 2008 Int. Conf. on Intelligent Robot and Systems, 373-379, 2008.

Extraction of body/object information from images for robot motion generation

It is important for robots that act in human-centered environments to build image processing in a bottom-up manner. This paper proposes a method to autonomously acquire image feature extraction suitable for motion generation while moving in an unknown environment. The proposed method extracts low-level features without specifying image processing for the robot body or obstacles. The position of the body in the image is acquired by clustering SIFT features with motion information, and a state transition model is generated. Collision-relevant features are detected based on a learning model that adaptively adds state transition models. Features that emerge when the robot cannot move are acquired as collision-relevant features. The proposed framework is evaluated in obstacle avoidance with real images of the manipulator and an obstacle.

Extraction of visual features relevant to collision and motion generation using the extracted features

Extraction of body and object based on random exploration and object manipulation using the extracted information


  1. Taichi Okamoto, Yuichi Kobayashi and Masaki Onishi, Acquisition of Body and Object Representation Based on Motion Learning and Planning Framework, Proc. of the 9th Int. Conf. on Intelligent Systems Design and Applications, 737-742, 2009.
  2. Takahiro Asamizu and Yuichi Kobayashi, Acquisition of image feature on collision for robot motion generation, Proc. of the 9th Int. Conf. on Intelligent Systems Design and Applications, 1312-1317, 2009.

Motion Generation by Integration of Multiple Observation Spaces for Robots with Limited Range of Observation

Sensors of robots that act in unstructured environments sometimes do not provide complete observation, due to occlusion or limitation of sensing range. This paper presents a motion generation method for robots with multiple sensors that have limited sensing ranges. The proposed method introduces an extension of the action-observation mapping to outside the sensing range of a sensor, based on diffusion-based learning of Jacobian matrices between the control input and the observation variable. Multiple observation spaces can be integrated by finding correspondences between the virtual observation spaces. When a target observation variable is given to the robot, it can generate a motion from one observation space toward the target in another observation space using the extended observation space. The proposed framework is verified by two robot tasks: a reaching motion toward the floor with a manipulator and navigation of a mobile robot around a wall. In both cases, the observation space of a camera with limited view was extended and appropriate motion trajectories were obtained.

Figure(left): Motion generation from sensor(observable) space (i) to sensor space (j)
Figure(mid): Application (1); Observation of end-effector by camera and measurement of distance to floor with proximity sensor
Figure(right): Application (2); Navigation of mobile robot toward a wall with camera and proximity sensor


  1. Eisuke Kurita, Yuichi Kobayashi, Manabu Gouko, Motion Generation by Integration of Multiple Observation Spaces for Robots with Limited Range of Observation, 2011 International Conference on Control, Robotics and Cybernetics, 2011 (accepted).

Design of Parallel Tasks of Human-interacting Robot using Optimization and Optimal Control

Robots that interact with humans in household environments are required to achieve multiple simultaneous tasks, such as carrying objects, collision avoidance and conversation with humans, in real time. This paper presents a design framework for multiple human-interacting tasks that meets this requirement by considering the stochastic behavior of humans. The proposed design method first introduces a Petri net for parallel multiple tasks. The Petri net formulation is converted to Markov decision processes and processed in an optimal control framework. Multiple task arbitration is resolved by optimization with approximated value functions. Two mutually interacting tasks, safety confirmation and conversation, are expressed by a Petri net. Tasks that normally tend to be designed by integrating many if-then rules can be dealt with in a systematic manner in the proposed framework, that is, in a state estimation and optimization framework. The proposed arbitration method was verified by simulations and experiments using RI-MAN, which was developed to perform interactive tasks with humans.
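
The conversion from a Petri net to a decision process can be sketched as follows, under strong simplifying assumptions. Each reachable marking of the net becomes one discrete state, and each enabled transition becomes an edge between states. The place and transition names are hypothetical, the net is restricted to at most one token per place, and the two tasks are kept independent here, whereas the tasks in the paper interact. Transition probabilities and costs, which a Markov decision process also needs, are omitted.

```python
from collections import deque

# Hypothetical net: transition name -> (places consumed, places produced).
TRANSITIONS = {
    "start_talk": ({"talk_idle"}, {"talking"}),
    "end_talk": ({"talking"}, {"talk_idle"}),
    "start_check": ({"safety_idle"}, {"checking"}),
    "end_check": ({"checking"}, {"safety_idle"}),
}


def enabled(marking, name):
    """A transition is enabled when all of its input places hold a token."""
    consumed, _ = TRANSITIONS[name]
    return consumed <= marking


def fire(marking, name):
    """Remove tokens from the input places and add them to the output places."""
    consumed, produced = TRANSITIONS[name]
    return frozenset((marking - consumed) | produced)


def reachable_markings(initial):
    """Breadth-first enumeration of markings (states) and firings (edges)."""
    start = frozenset(initial)
    seen = {start}
    edges = []
    queue = deque([start])
    while queue:
        marking = queue.popleft()
        for name in sorted(TRANSITIONS):
            if enabled(marking, name):
                nxt = fire(marking, name)
                edges.append((marking, name, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen, edges


states, edges = reachable_markings({"talk_idle", "safety_idle"})
print(len(states), "states,", len(edges), "edges")
```

The resulting state set and edge list are the discrete skeleton on which value functions could then be approximated and optimized.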

Expression of parallel tasks by Petri net
Experiment with RI-MAN in human-interactive environment


  1. Yuichi Kobayashi, Masaki Onishi, Shigeyuki Hosoe, Zhiwei Luo, ``Behavior Design of A Human-interactive Robot through Parallel Tasks Optimization,'' Proc. of the 9th International Symposium on Distributed Autonomous Robotic Systems (DARS2008), 2008.

Autonomous decentralized control of capturing behavior by multiple mobile robots

This research discusses the design of decentralized capturing behavior by multiple mobile robots. The design is based on a gradient descent method with local information. The task of capturing a target is divided into two problems, enclosing behavior and grasping behavior. We analyze the convergence of the local control policy in the enclosing problem. For the grasping behavior, we consider the force-closure condition in decentralized form to design a local objective function. The proposed local control policies were evaluated in simulations, which verified the flexibility of the system that results from its decentralized nature.
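
As an illustration of gradient descent with local information, the sketch below covers only a simplified version of the enclosing problem. It assumes the robots already sit on a circle around the target, so that each robot is described by one angle, and it uses a hypothetical objective, half the sum of squared angular gaps between neighbors. The objective, gain and initial angles are assumptions and are not taken from the publications listed below.

```python
import math

TWO_PI = 2.0 * math.pi


def gaps(angles):
    """Angular gap from each robot to its counter-clockwise neighbor."""
    n = len(angles)
    return [(angles[(i + 1) % n] - angles[i]) % TWO_PI for i in range(n)]


def enclosing_step(angles, gain=0.25):
    """One synchronous gradient descent step on 0.5 * sum(gap ** 2).

    Robot i only needs the gaps to its two neighbors:
    d/d(theta_i) = gap_{i-1} - gap_i.
    """
    g = gaps(angles)
    return [angles[i] + gain * (g[i] - g[i - 1]) for i in range(len(angles))]


def enclose(angles, steps=200, gain=0.25):
    """Repeat the local update; the gaps approach a uniform spacing."""
    for _ in range(steps):
        angles = enclosing_step(angles, gain)
    return angles


initial = [0.0, 0.3, 0.5, 2.0, 2.2, 4.0]   # six robots, unevenly spread
final = enclose(initial)
print([round(g, 4) for g in gaps(final)])
```

Because every update uses only the two neighboring gaps, adding or removing a robot changes nothing in the rule itself, which is one way to read the flexibility attributed to the decentralized design.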

Enclosing behavior of a circular moving object by 6 robots

Grasping behavior of an ellipsoidal object by 4 robots

Grasping behavior by 6 robots

Experiment with mobile robots


  1. Yuichi Kobayashi, Kyouji Otsubo and Shigeyuki Hosoe, ``Design of Decentralized Capturing Behavior by Multiple Mobile Robots,'' IEEE 2006 Workshop on Distributed Intelligent Systems, 13-18, 2006.
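  2. Yuichi Kobayashi, Kyouji Otsubo and Shigeyuki Hosoe, ``Autonomous Decentralized Control of Cooperative Capturing Behavior by Multiple Mobile Robots,'' Proc. of the 6th SICE Control Division Conference, Vol.2, 463-468, 2006 (in Japanese)
  3. Yuichi Kobayashi, Kyouji Otsubo, Shigeyuki Hosoe and Yukio Noda (野田幸男), ``Control of Multiple Mobile Robots for Distributed Cooperative Capturing Behavior,'' Proc. of the 16th Intelligent System Symposium, 171-176, 2006 (in Japanese)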

Optimizing Resolution for Feature Extraction in Robotic Motion Learning

This paper presents a feature extraction method for robotic motion learning that optimizes image resolution to the task, thereby minimizing computation time. It utilizes mean-shift algorithms and principal component analysis for feature extraction, reinforcement learning for motion learning, and trial and error for finding the appropriate resolution. When applied to a manipulator pushing an object, the resolution adjustment method reduces the task time from one minute to 21 seconds.

Proposed architecture for feature extraction from image inputs

Experimental setup with camera and manipulator for object pushing task

Flow of feature extraction using image information

Processed images at each time step with different resolutions


  1. Masato Kato, Yuichi Kobayashi and Shigeyuki Hosoe, ``Optimizing Resolution for Feature Extraction in Robotic Motion Learning,'' IEEE Int. Conf. on Systems, Man & Cybernetics, Hawaii USA, 1086-1091, 2005.

Reinforcement learning for object manipulation using low-dimensional mapping

This paper proposes a reinforcement learning method for dynamic control problems with holonomic constraints. The learning method is applicable to problems where the actual motion of the system is restricted to lower-dimensional submanifolds, so long as certain conditions are satisfied. Such dynamic control problems occur in robotic manipulation, which usually includes some holonomic constraints between the object and the robot or the environment. By introducing a nonlinear mapping to a one-dimensional space and approximating the boundary of a discontinuous reward function, the proposed method achieves effective learning. The method is evaluated in a one-degree-of-freedom object rotating task with contact force considerations. The effectiveness of the proposed learning method was verified by comparison with ordinary Q-learning and Dyna without the proposed mapping method.

Constrained motion of manipulation by robot hand

Submanifold generated by constrained motion in configuration space

Object rotation task with constraint to keep contact between hand and object

An example of reward profile


  1. Yuichi Kobayashi, Hiroki Fujii and Shigeyuki Hosoe, ``Reinforcement learning for object manipulation using low-dimensional mapping,'' Transactions of the Society of Instrument and Control Engineers, Vol.42, No.7, 2006.
  2. Yuichi Kobayashi, Hiroki Fujii and Shigeyuki Hosoe, ``Reinforcement Learning for Manipulation Using Constraint between Object and Robot,'' IEEE Int. Conf. on Systems, Man & Cybernetics, Hawaii USA., 871-876, 2005.

Hyper-cubic function approximation for reinforcement learning based on autonomous-decentralized algorithm

Adaptive resolution of the function approximator is known to be important when reinforcement learning is applied to unknown problems. We propose applying a successive division and integration scheme of function approximation, based on local curvature, to Temporal Difference (TD) learning. TD learning in continuous state space is based on non-constant value function approximation, which requires a simple representation of the function approximator. We define the bases and local complexity of the function approximator in a way similar to autonomous decentralized function approximation, but they are much simpler. The simplicity of the approximator elements brings much less computation and easier analysis. The proposed function approximator is shown to be effective on a function approximation problem and on common reinforcement learning problems, namely the pendulum swing-up task and the acrobot stabilizing task.
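
The idea of dividing cells where the local curvature is high can be sketched in one dimension. The sketch assumes the target function can be evaluated directly, uses a second difference as a hypothetical curvature measure, and shows division only; the integration (merging) of cells and the coupling with TD learning described above are not included. The test function `bump`, the tolerance and the depth limit are illustrative choices.

```python
import math


def bump(x):
    """Illustrative target: a narrow peak near x = 0.3, flat elsewhere."""
    return math.exp(-50.0 * (x - 0.3) ** 2)


def curvature(f, lo, hi):
    """Second difference of f over the cell, used as a curvature estimate."""
    mid = 0.5 * (lo + hi)
    return abs(f(lo) - 2.0 * f(mid) + f(hi))


def refine(f, lo, hi, tol, max_depth):
    """Split a cell in half while its curvature estimate exceeds tol."""
    if max_depth == 0 or curvature(f, lo, hi) <= tol:
        return [(lo, hi)]
    mid = 0.5 * (lo + hi)
    return (refine(f, lo, mid, tol, max_depth - 1)
            + refine(f, mid, hi, tol, max_depth - 1))


cells = refine(bump, 0.0, 1.0, tol=0.01, max_depth=8)
for lo, hi in cells:
    print(f"[{lo:.4f}, {hi:.4f}] width {hi - lo:.4f}")
```

Cells end up narrow around the peak and wide in the flat region, which is the adaptive resolution behavior the abstract refers to. Note that a second difference can vanish on a cell that is symmetric about an inflection point, so a practical criterion would need to be more robust than this one.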

Comparison of learning performance among RBF network, fixed approximation and proposed adaptive resolution approximation

Performance of control obtained by adaptive resolution function approximation

An example of adaptive resolution in pendulum swing-up application


  1. Yuichi Kobayashi, Hideo Yuasa, Shigeyuki Hosoe, ``Hyper Cubic Function Approximation for Reinforcement Learning Based on Autonomous-Decentralized Algorithm,'' Transactions of the Society of Instrument and Control Engineers, Vol. 40, No. 8, 849-858, 2004 (in Japanese)
  2. Yuichi KOBAYASHI and Shigeyuki HOSOE, ``Adaptive Resolution Function Approximation for TD-learning: Simple Division and Integration,'' Proc. of SICE Annual Conference 2003, Fukui, Japan, 3023-3028, 2003.
  3. Yuichi Kobayashi and Shigeyuki HOSOE, ``Hyper-Cubic Discretization in Reinforcement Learning Based on Autonomous Decentralized Approach,'' IEEE Int. Conf. on Systems, Man & Cybernetics, Washington D.C. USA, 3633-3638, 2003.

Function approximation for reinforcement learning using autonomous-decentralized algorithm

The adaptability of resolution to the complexity of the approximated function has a great influence on learning performance in function approximation for reinforcement learning. We propose applying the reaction-diffusion equation on a graph to function approximation for reinforcement learning. With the proposed algorithm, the function approximator expressed by nodes can change its resolution adaptively by distributing the nodes densely in complex regions of the state space. The function is expressed piecewise by planes. The successive least squares method is adopted to approximate the function from the data. Each plane corresponds to a node, which is an element of the graph. Each node moves so as to diffuse the complexity of the approximated function within its neighborhood, based on the reaction-diffusion equation. The complexity of the function is defined by the change of gradient. The simulation shows two points: 1) The proposed algorithm provides adaptability for function approximation. 2) The function approximation improves the efficiency of reinforcement learning.
Patching of planes
Application to a graph with boundary
Adaptive resolution in 1D function approximation problem
Adaptive resolution in 2D function approximation problem

Pendulum swing-up task

Comparison of learning performance between adaptive method(red) and fixed structure(blue)
  1. Yuichi Kobayashi, Hideo Yuasa, Tamio Arai, ``Function Approximation for Reinforcement Learning Using Autonomous-Decentralized Algorithm,'' Transactions of the Society of Instrument and Control Engineers, Vol. 38, No. 2, 219-226, 2002 (in Japanese)
  2. Yuichi KOBAYASHI, Hideo YUASA and Shigeyuki HOSOE, ``Q-learning with Adaptive Resolution Function Approximation based on Graph,'' Proc. of the ICASE/SICE Workshop: Intelligent Control and Systems, Muju, Korea, 79-84, 2002.
  3. Yuichi Kobayashi, Hideo Yuasa, and Tamio Arai, ``Function Approximation for Reinforcement Learning Based on Reaction-Diffusion Equation on a Graph,'' Proc. of SICE Annual Conference 2002, Osaka, Japan, 916-921, 2002.

Design of quadruped robot soccer behavior considering observational cost

In this paper, we present a real-time decision making method for a quadruped robot whose sensors and locomotion have large errors, taking into account the observational cost and optimality. We build a State-Action Map by off-line planning with Dynamic Programming (DP), considering the uncertainty of the robot's location. Using this map, the robot can immediately decide, at any state, the optimal action that minimizes the time to reach a target state. The number of observations is also minimized. We compress this map with Vector Quantization (VQ) for implementation. The total loss of optimality through compression is minimized by using the differences in value between the optimal action and the others. In simulation, the performance of some soccer behaviors was improved in comparison with current methods. The proposed method was implemented on a real robot, and its low computational cost under the memory restriction was verified experimentally.
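
The off-line construction of a state-action lookup table by dynamic programming can be sketched on a toy problem. The sketch assumes a one-dimensional corridor of discrete cells, two actions (step left or right), and a hypothetical probability `p_move` that a step succeeds, as a crude stand-in for locomotion error. Observation actions, their cost, and the VQ compression described above are not modeled.

```python
def build_state_action_map(n_states, goal, p_move=0.8, sweeps=200):
    """Value iteration minimizing the expected number of steps to the goal.

    Returns (policy, cost): policy maps each non-goal state to its best
    action, cost[s] is the expected number of steps from state s.
    """
    actions = (-1, +1)

    def expected_cost(s, a, cost):
        # The step succeeds with probability p_move; otherwise the robot stays.
        nxt = min(max(s + a, 0), n_states - 1)
        return 1.0 + p_move * cost[nxt] + (1.0 - p_move) * cost[s]

    cost = [0.0] * n_states
    for _ in range(sweeps):
        cost = [0.0 if s == goal
                else min(expected_cost(s, a, cost) for a in actions)
                for s in range(n_states)]

    policy = {s: min(actions, key=lambda a: expected_cost(s, a, cost))
              for s in range(n_states) if s != goal}
    return policy, cost


policy, cost = build_state_action_map(n_states=7, goal=3)
print(policy)
print([round(c, 3) for c in cost])
```

Once the table is built, choosing an action at run time is a single lookup, `policy[state]`, which reflects the point made above that the planning effort is spent off-line.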

Expression of state transition with uncertainty in state space including uncertainty parameter

An example of observation strategy with real quadruped robot


  1. Yuichi Kobayashi, Takeshi Fukase, Ryuichi Ueda, Hideo Yuasa, Tamio Arai, ``Design of Quadruped Robot Soccer Behavior Considering Observational Cost,'' Journal of the Robotics Society of Japan, Vol. 21, No.7, 802-810 (in Japanese), 2003
  2. Takeshi Fukase, Yuichi Kobayashi, Ryuichi Ueda, Takanobu Kawabe and Tamio Arai, ``Real-time Decision Making under Uncertainty of Self-Localization Results,'' The 2002 International RoboCup Symposium Pre-Proceedings, 372-379, 2002.
  3. Takeshi FUKASE, Masahiro YOKOI, Yuichi KOBAYASHI, Hideo YUASA and Tamio ARAI, ``Quadruped Robot Navigation Considering the Observational Cost,'' Andreas Birk, Silvia Coradeschi and Satoshi Tadokoro (Eds.), RoboCup 2001: Robot Soccer World Cup V, Springer, 350-355, 2002.
