Tuesday, March 08, 2005

 

Acknowledgements

The information contained here has been drawn from many sources which are generally the work of some combination of these people:

Hirokazu Kato
Mark Billinghurst
Rob Blanding
Richard May
Hiroshi Ishii
Ronald Azuma
Ivan Poupyrev
Philip Lamb
and others...
 

Reliability

The most likely problem that you will encounter has to do with lighting conditions. Tracking can be affected by light from windows and reflections from overhead lights. One way to compensate is by adjusting the threshold value in the program (in AR applications, pressing the 't' key will often allow the user to adjust the threshold value). Another way to make tracking more reliable is to cut the black part of the marker out of thin self-adhesive felt (often sold for lining drawers, etc.). Black felt is almost completely non-reflective and helps the marker keep good contrast under most lighting.


 

paddleDemo.c

The ARToolKit download includes paddleDemo, which is a great introduction to Tangible AR interaction techniques. The techniques used in this demo are tilt (to drop an object) and pickup. They are implemented in the file command_sub.c by the functions check_incline and check_pickup respectively.


 

Interactions

Example: Paddle Interaction
Proximity based
When two markers are within a certain distance of each other there is an interaction between the virtual objects. The paddle may "pick up" a virtual object or "push" it.

Tilting
At a certain angle, a virtual object can be made to "slide off" the paddle.

Shaking
A shaking motion can be used to delete an object.
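
A shake could be detected in several ways. The sketch below is one simple option and is not taken from ARToolKit: it keeps a short history of the marker position along one axis and counts how often the direction of movement reverses. The function names and both thresholds are assumptions.

```c
/* Count how many times the movement along one axis changes
   direction in a history of n positions. Steps smaller than
   min_step are ignored so that tracking jitter is not counted. */
int count_reversals(const double *pos, int n, double min_step)
{
    int i, reversals = 0, last_dir = 0;

    for (i = 1; i < n; i++) {
        double d = pos[i] - pos[i-1];
        int dir = 0;
        if (d > min_step)       dir = 1;
        else if (d < -min_step) dir = -1;
        if (dir == 0) continue;              /* too small to count */
        if (last_dir != 0 && dir != last_dir) reversals++;
        last_dir = dir;
    }
    return reversals;
}

/* Returns 1 when the history contains at least min_reversals
   direction changes, otherwise 0. */
int is_shaking(const double *pos, int n, double min_step, int min_reversals)
{
    return count_reversals(pos, n, min_step) >= min_reversals;
}
```

The position history could be filled from marker_trans[0][3] on each frame in which the marker is visible.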

Hitting
Objects can be removed by hitting them with the paddle.

Local vs. Global Interactions
Local
Actions are determined by a single camera to marker transform - shaking, appearance, relative position, range.

Global
Actions are determined from two relationships: marker to camera, and camera to world coordinates. The marker transform is determined in world coordinates to identify tilt, absolute position, absolute rotation, and hitting.
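
As a sketch of the global case, assume a base marker is fixed in the world and both it and the paddle are visible in the same frame. The paddle pose in base (world) coordinates is then the inverse of the camera-to-base transform multiplied by the camera-to-paddle transform. The helper names below are hypothetical, and the inverse assumes a rigid transform (rotation plus translation only).

```c
/* d = a * b, treating each 3 x 4 matrix as a 4 x 4 matrix whose
   bottom row is 0 0 0 1. d must not alias a or b. */
void trans_mul(double a[3][4], double b[3][4], double d[3][4])
{
    int i, j;
    for (i = 0; i < 3; i++) {
        for (j = 0; j < 4; j++) {
            d[i][j] = a[i][0]*b[0][j] + a[i][1]*b[1][j] + a[i][2]*b[2][j];
        }
        d[i][3] += a[i][3];   /* contribution of the implicit bottom row */
    }
}

/* d = inverse of s, assuming s is a rigid transform: the rotation
   is transposed and the translation becomes -R^T * t. */
void trans_inv(double s[3][4], double d[3][4])
{
    int i, j;
    for (i = 0; i < 3; i++) {
        for (j = 0; j < 3; j++) d[i][j] = s[j][i];
        d[i][3] = -(s[0][i]*s[0][3] + s[1][i]*s[1][3] + s[2][i]*s[2][3]);
    }
}

/* Paddle pose in the coordinates of the base marker:
   base_from_paddle = inverse(cam_from_base) * cam_from_paddle. */
void paddle_in_world(double cam_from_base[3][4],
                     double cam_from_paddle[3][4],
                     double out[3][4])
{
    double base_from_cam[3][4];
    trans_inv(cam_from_base, base_from_cam);
    trans_mul(base_from_cam, cam_from_paddle, out);
}
```

ARToolKit's ar.h also declares arUtilMatInv and arUtilMatMul, which could be used in place of these helpers.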

All of these interactions are derived from the camera to marker transforms produced by the function arGetTransMat, which fills in a 3 x 4 matrix: the usual 4 x 4 3D transformation without its bottom row.

/* get the camera transformation */
arGetTransMat(&marker_info[k], marker_center, marker_width, marker_trans);

So the range from the camera to the marker is:

/* find the range */
Xpos = marker_trans[0][3];
Ypos = marker_trans[1][3];
Zpos = marker_trans[2][3];
range = sqrt(Xpos*Xpos+Ypos*Ypos+Zpos*Zpos);


Monday, March 07, 2005

 

How it works

ARToolKit uses computer vision techniques to calculate the real camera viewpoint relative to a real world marker. There are several steps as shown in images below. First the live video image (left) is turned into a binary (black or white) image based on a lighting threshold value (center). This image is then searched for square regions. ARToolKit finds all the squares in the binary image, many of which are not the tracking markers. For each square, the pattern inside the square is captured and matched against some pre-trained pattern templates. If there is a match, then ARToolKit has found one of the AR tracking markers. ARToolKit then uses the known square size and pattern orientation to calculate the position of the real video camera relative to the physical marker. A 3x4 matrix is filled in with the video camera real world coordinates relative to the card. This matrix is then used to set the position of the virtual camera coordinates. Since the virtual and real camera coordinates are the same, the computer graphics that are drawn precisely overlay the real marker (right). The OpenGL API is used for setting the virtual camera coordinates and drawing the virtual images.
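
The first step, thresholding, can be illustrated with a small sketch. It assumes the frame has already been converted to 8-bit grayscale and that pixels darker than the threshold are the ones of interest (the marker border is black). The function name binarize is hypothetical, and this is not ARToolKit's own implementation, which works directly on the video frame inside arDetectMarker.

```c
/* Turn an 8-bit grayscale image into a binary image.
   binary[i] is set to 1 where the pixel is darker than thresh
   and to 0 everywhere else. */
void binarize(const unsigned char *gray, unsigned char *binary,
              int n_pixels, int thresh)
{
    int i;
    for (i = 0; i < n_pixels; i++) {
        binary[i] = (gray[i] < thresh) ? 1 : 0;
    }
}
```

With a threshold of 100 (the default value of thresh in simpleTest.c), a pixel value of 99 is classified as dark and a value of 100 as light. Raising the threshold in a dim room makes more pixels count as dark.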



 

Directory Structure

Files
The executable produced when simpleTest.c is compiled is simpleTestd.exe. In the same directory is a folder called Data that contains three files required by the program.
camera_para.dat
In ARToolKit, the camera parameters consist of a perspective projection matrix and distortion parameters. The perspective projection matrix does not contain translation/rotation components. It consists of field of view (f), aspect ratio (a), skew factor (s) and image center (x, y):



f s x 0
0 af y 0
0 0 1 0

Usually s = 0, although this depends on the camera calibration method. The matrix represents the relationship between camera coordinates (X, Y, Z) and screen coordinates (x, y). The unit of (X, Y, Z) is usually [mm], which also depends on the camera calibration. The unit of (x, y) is [pixel].
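
As a worked example, the matrix above can be applied to a point in camera coordinates by multiplying it with (X, Y, Z, 1) and dividing the first two results by the third. The sketch below ignores the distortion parameters, and the struct and function names are made up for illustration; they do not correspond to ARToolKit's ARParam.

```c
/* Hypothetical container for the entries of the projection matrix. */
typedef struct {
    double f;   /* field of view / focal term, in pixels */
    double a;   /* aspect ratio */
    double s;   /* skew factor, usually 0 */
    double cx;  /* image center x, in pixels */
    double cy;  /* image center y, in pixels */
} CameraIntrinsics;

/* Project camera coordinates (X, Y, Z) to screen coordinates (x, y).
   Returns 0 on success, -1 if Z is zero (point in the camera plane).
   Lens distortion is not modelled here. */
int project_point(const CameraIntrinsics *c,
                  double X, double Y, double Z,
                  double *x, double *y)
{
    if (Z == 0.0) return -1;
    *x = (c->f * X + c->s * Y + c->cx * Z) / Z;
    *y = (c->a * c->f * Y + c->cy * Z) / Z;
    return 0;
}
```

For example, with f = 700, a = 1, s = 0 and image center (320, 240), the point (100, 50, 1000) mm projects to pixel (390, 275).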

hiroPatt
A previously recorded image of the "Hiro" pattern is included for matching with markers found in the video. The Data folder will also contain files for other markers to be identified.


object_data
This file says which virtual objects the patterns correspond to.



 

Example

simpleTest.c
In this example a virtual cube is aligned on a real marker as shown in this picture.



Read through this file and notice how it relates to the six steps of the application structure given in the previous post.


#ifdef _WIN32
#include <windows.h>
#endif
#include <stdio.h>
#include <stdlib.h>
#ifndef __APPLE__
#include <GL/gl.h>
#include <GL/glut.h>
#else
#include <OpenGL/gl.h>
#include <GLUT/glut.h>
#endif
#include <AR/gsub.h>
#include <AR/video.h>
#include <AR/param.h>
#include <AR/ar.h>
/*****************************************************************************/
// modified by Thomas Pintaric, Vienna University of Technology

#ifdef _WIN32
char *vconf = "flipV,showDlg"; // see video.h for a list of supported parameters
#else
char *vconf = "";
#endif
/*****************************************************************************/
int   xsize, ysize;
int thresh = 100;
int count = 0;
char    *cparam_name = "Data/camera_para.dat";
ARParam cparam;
char    *patt_name = "Data/patt.hiro";
int patt_id;
double patt_width = 80.0;
double patt_center[2] = {0.0, 0.0};
double patt_trans[3][4];
static void init(void);
static void cleanup(void);
static void keyEvent( unsigned char key, int x, int y);
static void mainLoop(void);
static void draw( void );
int main(int argc, char **argv)
{
init();

arVideoCapStart();
argMainLoop( NULL, keyEvent, mainLoop );

return (0);
}
static void keyEvent( unsigned char key, int x, int y)
{

/* quit if the ESC key is pressed */
if( key == 0x1b ) {
printf("*** %f (frame/sec)\n", (double)count/arUtilTimer());
cleanup();
exit(0);
}
}
/* main loop */
static void mainLoop(void)
{
ARUint8 *dataPtr;
ARMarkerInfo *marker_info;

int marker_num;
int j, k;
   /* grab a video frame */
if( (dataPtr = (ARUint8 *)arVideoGetImage()) == NULL ) {
arUtilSleep(2);

return;
}

if( count == 0 ) arUtilTimerReset();
count++;
   argDrawMode2D();
argDispImage( dataPtr, 0,0 );
   /* detect the markers in the video frame */
if( arDetectMarker(dataPtr, thresh, &marker_info, &marker_num) < 0 ) {
cleanup();
exit(0);
}
   /* check for object visibility */
k = -1;
for( j = 0; j < marker_num; j++) {
if( patt_id == marker_info[j].id ) {
if( k == -1 ) k = j;
else if( marker_info[k].cf < marker_info[j].cf) k = j;
}
}
   if( k == -1 ) {
argSwapBuffers();

return;
}
   /* get the transformation between the marker and the real camera */
arGetTransMat(&marker_info[k], patt_center, patt_width, patt_trans);
   draw();
   argSwapBuffers();
}
/* initialize the video path, read in the marker and camera parameters,
and setup the graphics window */

static void
init( void )
{
ARParam wparam;
   /* open the video path */
if( arVideoOpen( vconf ) < 0) exit(0);
/* find the size of the window */
if( arVideoInqSize(&xsize, &ysize) < 0) exit(0);
printf("Image size (x,y) = (%d,%d)\n", xsize, ysize);
   /* read and set the initial camera parameters */
if( arParamLoad(cparam_name, 1, &wparam) < 0 ) {
printf("Camera parameter load error !!\n");
exit(0);
}
arParamChangeSize( &wparam, xsize, ysize, &cparam );
arInitCparam( &cparam );
printf("*** Camera Parameter ***\n");
arParamDisp( &cparam );
   /* read the marker (pattern) file */
if( (patt_id=arLoadPatt(patt_name)) < 0 ) {
printf("pattern load error !!\n");
exit(0);
}
   /* open the graphics window */
argInit( &cparam, 1.0, 0, 0, 0, 0 );
}
/* cleanup function called when program exits */
static void cleanup(void)
{
arVideoCapStop();
arVideoClose();
argCleanup();
}


static void draw( void )
{

double gl_para[16];
GLfloat mat_ambient[] = {0.0, 0.0, 1.0, 1.0};
GLfloat mat_flash[] = {0.0, 0.0, 1.0, 1.0};
GLfloat mat_flash_shiny[] = {50.0};
GLfloat light_position[] = {100.0,-200.0,200.0,0.0};
GLfloat ambi[] = {0.1, 0.1, 0.1, 0.1};
GLfloat lightZeroColor[] = {0.9, 0.9, 0.9, 0.1};
   argDrawMode3D();
argDraw3dCamera( 0, 0 );
glClearDepth( 1.0 );
glClear(GL_DEPTH_BUFFER_BIT);
glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_LEQUAL);
   /* load the camera transformation matrix */
argConvGlpara(patt_trans, gl_para);
glMatrixMode(GL_MODELVIEW);
glLoadMatrixd( gl_para );
   glEnable(GL_LIGHTING);
glEnable(GL_LIGHT0);
glLightfv(GL_LIGHT0, GL_POSITION, light_position);
glLightfv(GL_LIGHT0, GL_AMBIENT, ambi);
glLightfv(GL_LIGHT0, GL_DIFFUSE, lightZeroColor);
glMaterialfv(GL_FRONT, GL_SPECULAR, mat_flash);
glMaterialfv(GL_FRONT, GL_SHININESS, mat_flash_shiny);
glMaterialfv(GL_FRONT, GL_AMBIENT, mat_ambient);
glMatrixMode(GL_MODELVIEW);
glTranslatef( 0.0, 0.0, 25.0 );
glutSolidCube(50.0);
glDisable( GL_LIGHTING );

glDisable( GL_DEPTH_TEST );
}


 

Basic Structure

ARToolkit Structure



There are three key libraries:

  1. libAR - tracking
  2. libARgsub - image drawing
  3. libARvideo - video capturing

Application Structure

An ARToolkit application involves the following steps:

1. Initialize – start video, read marker patterns & camera parameters.

2. Grab video frame.
3. Detect markers & recognize patterns.
4. Calculate camera transformation relative to the detected patterns.
5. Draw virtual objects on detected patterns.

6. Close video.

Steps 2 through 5 are repeated continuously until the application quits, while steps 1 and 6 are done on initialization and shutdown respectively. In addition to these steps the application may need to respond to mouse, keyboard or other application specific events.


 

Installation

ARToolkit 2.70 with VRML support

The latest version is available from http://sourceforge.net/projects/artoolkit
and includes the following files:

ARToolkit-2.70.tgz Platform independent
DsVideoLib-0.0.4-win32.zip i386
OpenVRML-0.14.3-win32.zip i386

The ARToolKit is a collection of libraries, utilities, applications, documentation and sample code. The libraries provide the user with a means to capture images from video sources, process those images to optically track markers in the images, and to allow compositing of computer-generated content with the real-world images and display the result using OpenGL (Philip Lamb, 2004). ARToolKit is designed to build on Windows, Linux, SGI Irix, and Macintosh OS X platforms.

Building on Windows
(Read the full release notes on Sourceforge for other platforms)


Prerequisites:

Build steps:
  1. Unpack the ARToolKit zip to a convenient location. This location will be referred to below as {ARToolKit}.
  2. Unpack the DSVideoLib zip into {ARToolKit}.
  3. Copy the files DSVideoLib.dll and DSVideoLibd.dll from {ARToolKit}\DSVideoLib\bin.vc70 into {ARToolKit}\bin.
  4. Run the script {ARToolKit}\DSVideoLib\bin.vc70\register_filter.bat.
  5. Install the GLUT DLL into the Windows System32 folder, and the library and headers into the VS platform SDK folders.
  6. Run the script {ARToolKit}\Configure.win32.bat to create include/AR/config.h.
  7. Open the ARToolKit.sln file (VS.NET) or ARToolkit.dsw file (VS6).
  8. Open the Visual Studio search paths settings (Tools->Options->Directories for VS6, or Tools->Options->Projects->VC++ Directories for VS.NET) and add the DirectX SDK Includes\ path and the DirectX Samples\C++\DirectShow\BaseClasses\ path to the top of the search path for headers, and the DirectX SDK Lib\ path to the top of the search path for libraries.
  9. (Optional, only if rebuilding DSVideoLib.) Build the DirectShow base classes strmbase.lib and strmbasd.lib. (More information can be found at Thomas Pintaric's homepage for DSVideoLib (http://www.ims.tuwien.ac.at/~thomas/dsvideolib.php).)
  10. Build the toolkit.

The VRML rendering library and example (libARvrml & simpleVRML) are optional builds:
  11. Unpack the OpenVRML zip into {ARToolKit}.
  12. Copy js32.dll from {ARToolKit}\OpenVRML\bin into {ARToolKit}\bin.
  13. Enable the libARvrml and simpleVRML projects in the VS configuration manager and build.

Monday, February 21, 2005

 

Introduction

Definitions

Vision: The process of converting sensory information into knowledge of the objects in the environment. The eye is just a sensor that extracts only a part of the total information available, and the brain is the primary organ of vision.

Computer Vision - 1: Mapping from pictures to an abstract description (Siggraph).

Computer Vision - 2: A branch of artificial intelligence and image processing concerned with computer processing of images from the real world. Computer vision typically requires a combination of low level image processing to enhance the image quality (e.g. remove noise, increase contrast) and higher level pattern recognition and image understanding to recognise features present in the image (Hyper Dictionary).

Augmented Reality (AR) - 1: That class of displays that consists primarily of a real environment, with graphic enhancements or augmentations (Drascic and Milgram).

Augmented Reality (AR) - 2: The use of transparent HMDs to overlay computer generated images onto the physical environment. Precisely calibrated, rapid head tracking is required to sustain the illusion (HITLab Washington).

Tangible AR Interfaces: Those in which each virtual object is registered to a physical object and the user interacts with virtual objects by manipulating the corresponding tangible objects. In the Tangible AR approach the physical objects and interactions are equally as important as the virtual imagery and provide a very intuitive way to interact with the AR interface (Billinghurst, Kato and Poupyrev).

Success with these interfaces often depends on how well the co-ordinate system of the virtual world is registered, or aligned, with that of the real world. Maintaining registration in a dynamic, changing environment is a very challenging technological issue. In this investigation tangible devices are tracked using computer vision. Optical tracking is also used to register virtual content with real world images.

Proposal

Examples of TUI devices will be implemented using optical tracking, so we are interested in image processing algorithms that pertain to this. The intention is to look at two TUI examples: image processing in general and ARToolKit in particular. The complexity of the method can depend on how many degrees of freedom are required by the task and the range of interactions supported.

(1) Image Processing: An overview of techniques will be presented along with a simple TUI example.

(2) ARToolkit: A tutorial on how to use the toolkit will be presented along with a TUI example.
