OpenCV 2.0 ships with a pedestrian detection sample based on the method that Navneet Dalal first presented at CVPR 2005.
I have been studying it recently; the notes below are my own understanding, shared in the hope of discussing and improving them together.
1. Installing OpenCV 2.0 under VC 2008 Express -- you can use 2.1 directly instead, which removes the need to build with CMake and avoids the build errors that brings.
This is the foundation for everything else; thanks to the forum moderator for the reference:
http://www.opencv.org.cn/index.php/VC_2008_Express下安装OpenCV2.0
2. Trying out the sample
At a DOS prompt, change to C:\OpenCV2.0\samples\c and run: peopledetect.exe filename.jpg
where filename.jpg is the image to be tested.
3. Building the sample
Create a console project, add peopledetect.cpp from C:\OpenCV2.0\samples\c to it, and configure it as in step 1. It builds, but strangely the EXE produced in DEBUG mode crashes at run time.
Rebuilding in RELEASE mode produces an EXE that runs correctly.
4. Brief notes on the code
1) getDefaultPeopleDetector() returns the 3780-dimensional detector coefficients (105 blocks with 4 histograms each and 9 bins per histogram gives 3780 values). Why 105 blocks? With a 64x128 window, 16x16 blocks and an 8x8 block stride there are (64-16)/8+1 = 7 block positions horizontally and (128-16)/8+1 = 15 vertically, i.e. 7x15 = 105.
2) cv::HOGDescriptor hog; constructs the object and initialises its members:
winSize(64,128), blockSize(16,16), blockStride(8,8),
cellSize(8,8), nbins(9), derivAperture(1), winSigma(-1),
histogramNormType(L2Hys), L2HysThreshold(0.2), gammaCorrection(true)
3) The call: detectMultiScale(img, found, 0, cv::Size(8,8), cv::Size(24,16), 1.05, 2);
The arguments are, in order: the input image, the returned list of detections, the threshold hitThreshold, the window stride winStride, the image padding margin, the scale factor, and the grouping threshold groupThreshold (a labelled version of the call is shown below). Experimenting with these on one particular image: changing 0 to 0.01 makes the person go undetected while 0.001 still works; 1.05 changed to 1.1 fails but 1.06 works; 2 changed to 1 still works but 0.8 or lower does not; (24,16) can be changed to (0,0) or even (32,32) and detection still succeeds.
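For reference, here is the same call written out with each argument labelled. This only restates the parameters listed above as a minimal self-contained snippet; the image name is just a placeholder.

#include "cvaux.h"
#include "highgui.h"
#include <vector>

int main()
{
    cv::Mat img = cv::imread("filename.jpg");   // any image containing upright people
    cv::HOGDescriptor hog;
    hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());

    std::vector<cv::Rect> found;
    hog.detectMultiScale(img,               // input image
                         found,             // output: rectangles around detected people
                         0,                 // hitThreshold: SVM score threshold
                         cv::Size(8,8),     // winStride: window step in pixels
                         cv::Size(24,16),   // padding added around the image
                         1.05,              // scale factor between pyramid levels
                         2);                // groupThreshold for grouping overlapping windows
    return (int)found.size();
}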
The function body works as follows.
(1) Compute the number of pyramid levels.
For a 530x402 image, for example, log(402/128)/log(1.05) = 23.4 (the height ratio 402/128 is the limiting one, being smaller than 530/64), so there are 24 levels.
(2) Loop levels times; at each level:
HOGThreadData& tdata = threadData[getThreadNum()];
Mat smallerImg(sz, img.type(), tdata.smallerImgBuf.data);
and then call the core function
detect(smallerImg, tdata.locations, hitThreshold, winStride, padding);
whose arguments are: the image at this scale, the returned list of detections, the threshold, the stride, and the padding margin.
That function works as follows.
(a) Compute the padded image size paddedImgSize.
(b) Construct the object HOGCache cache(this, img, padding, padding, nwindows == 0, cacheStride); construction runs HOGCache::init, which computes the gradients via descriptor->computeGradient and determines the number of blocks (105) and the number of values per block (36).
(c) Compute the number of windows nwindows. At the first level this is ((530+32*2-64)/8+1) x ((402+32*2-128)/8+1) = 67x43 = 2881, where (32,32) is the padding parameter (it could also be (24,16)) and 8 is the window stride.
(d) Loop over every window; for each window:
loop over the 105 blocks; for each block, compute and normalise the HOG features via getBlock and multiply the 36 values by the corresponding detector coefficients; if the accumulated sum over the 105 blocks satisfies s >= hitThreshold, the window is counted as a detection.
4) That seems to be the gist of it, but many details still need to be worked out; a rough sketch of the overall flow follows.
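To tie steps (1)-(d) together, here is a rough, unoptimised sketch of what a detector of this kind does. It is not OpenCV's actual implementation -- the real code caches block histograms via HOGCache and handles padding, while this version simply recomputes the descriptor for every window -- and the function name naiveDetectMultiScale is made up for illustration.

#include "cv.h"
#include "cvaux.h"
#include <vector>

// Assumes hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector()) was called.
std::vector<cv::Rect> naiveDetectMultiScale(const cv::HOGDescriptor& hog, const cv::Mat& img,
                                            double hitThreshold, cv::Size winStride, double scale0)
{
    std::vector<cv::Rect> found;
    const std::vector<float>& w = hog.svmDetector;   // 3780 weights (+ 1 trailing bias term)
    double scale = 1.0;

    // (1) number of levels: keep shrinking until the 64x128 window no longer fits
    while( img.cols/scale >= hog.winSize.width && img.rows/scale >= hog.winSize.height )
    {
        // (2) rescale the image for this level
        cv::Mat smaller;
        cv::resize(img, smaller, cv::Size(cvRound(img.cols/scale), cvRound(img.rows/scale)));

        // (c)/(d) slide the detection window over the rescaled image
        for( int y = 0; y + hog.winSize.height <= smaller.rows; y += winStride.height )
            for( int x = 0; x + hog.winSize.width <= smaller.cols; x += winStride.width )
            {
                cv::Mat win = smaller(cv::Rect(x, y, hog.winSize.width, hog.winSize.height)).clone();
                std::vector<float> desc;
                hog.compute(win, desc);                 // 105 blocks * 36 values = 3780

                // linear SVM score: bias term + dot(descriptor, weights)
                double s = (w.size() > desc.size()) ? w[desc.size()] : 0;
                for( size_t k = 0; k < desc.size(); k++ )
                    s += desc[k] * w[k];

                if( s >= hitThreshold )                 // step (d): s >= hitThreshold
                    found.push_back(cv::Rect(cvRound(x*scale), cvRound(y*scale),
                                             cvRound(hog.winSize.width*scale),
                                             cvRound(hog.winSize.height*scale)));
            }
        scale *= scale0;                                // e.g. 1.05 per level
    }
    return found;
}

The rectangles are mapped back to the original image by multiplying by the level's scale, which is also what makes the later grouping/non-maximum suppression stage necessary.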
5. The detection algorithm as written in the original thesis
Figure 5.5 on page 78 of NavneetDalalThesis.pdf describes the complete object detection algorithm.
The first two steps are initialisation and were essentially covered above; the last two steps are:
For each scale Si = [Ss, SsSr, . . . , Sn]
(a) Rescale the input image using bilinear interpolation
(b) Extract features (Fig. 4.12) and densely scan the scaled image with stride Ns for object/non-object detections
(c) Push all detections with t(wi) > c to a list
Non-maximum suppression
(a) Represent each detection in 3-D position and scale space yi
(b) Using (5.9), compute the uncertainty matrices Hi for each point
(c) Compute the mean shift vector (5.7) iteratively for each point in the list until it converges to a mode
(d) The list of all of the modes gives the final fused detections
(e) For each mode compute the bounding box from the final centre point and scale
The following passages are excerpted from NavneetDalalThesis.pdf, with the important parts picked out. The original section numbers are kept to make them easy to find.
4. Histogram of Oriented Gradients Based Encoding of Images
Default Detector.
As a yardstick for the purpose of comparison, throughout this section we compare results to our
default detector which has the following properties: input image in RGB colour space (without
any gamma correction); image gradient computed by applying [-1, 0, 1] filter along x- and y-axis
with no smoothing; linear gradient voting into 9 orientation bins in 0°–180°; 16×16 pixel
blocks containing 2×2 cells of 8×8 pixels; Gaussian block windowing with σ = 8 pixels; L2-Hys
(Lowe-style clipped L2 norm) block normalisation; blocks spaced with a stride of 8 pixels (hence
4-fold coverage of each cell); 64×128 detection window; and linear SVM classifier. We often
quote the performance at 10^-4 false positives per window (FPPW) -- the maximum false positive
rate that we consider to be useful for a real detector given that 10^3–10^4 windows are tested for
each image.
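These thesis defaults map almost one-to-one onto the HOGDescriptor members listed in section 4 above. As a sketch (my own mapping, not code from the thesis), the same configuration can be spelled out with OpenCV's HOGDescriptor constructor:

#include "cvaux.h"

// the default detector configuration, written out explicitly
cv::HOGDescriptor makeDefaultDetector()
{
    return cv::HOGDescriptor(cv::Size(64,128),   // detection window
                             cv::Size(16,16),    // block = 2x2 cells
                             cv::Size(8,8),      // block stride, so each cell is covered 4 times
                             cv::Size(8,8),      // cell size
                             9);                 // orientation bins over 0-180 degrees
}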
4.3.2 Gradient Computation
The simple [-1, 0, 1] masks give the best performance.
4.3.3 Spatial / Orientation Binning
Each pixel contributes a weighted vote for orientation based on the orientation of the gradient element centred on it.
The votes are accumulated into orientation bins over local spatial regions that we call cells.
To reduce aliasing, votes are interpolated trilinearly between the neighbouring bin centres in both orientation and position.
Details of the trilinear interpolation voting procedure are presented in Appendix D.
The vote is a function of the gradient magnitude at the pixel, either the magnitude itself, its square, its
square root, or a clipped form of the magnitude representing soft presence/absence of an edge at the pixel. In practice, using the magnitude itself gives the best results.
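As a small illustration of the orientation part of this voting scheme (my own sketch, not code from the thesis or from OpenCV; the spatial half of the trilinear interpolation, which additionally splits each vote between the four neighbouring cells, is omitted for brevity):

#include <cmath>
#include <vector>

// Accumulate one cell's orientation histogram: each pixel's vote (its gradient
// magnitude) is split linearly between the two orientation bins whose centres
// are nearest to the pixel's gradient orientation (unsigned, 0..180 degrees).
void voteCellOrientation(const float* mag, const float* angRad, int npixels,
                         int nbins, std::vector<float>& hist)
{
    const float PI = 3.14159265f;
    const float binWidth = PI / nbins;            // 20 degrees when nbins = 9
    hist.assign(nbins, 0.f);
    for( int i = 0; i < npixels; i++ )
    {
        float b = angRad[i] / binWidth - 0.5f;    // position relative to the bin centres
        int b0 = (int)std::floor(b);
        float frac = b - b0;                      // how far towards the next bin centre
        int bin0 = (b0 + nbins) % nbins;          // wrap around the 0/180 degree boundary
        int bin1 = (bin0 + 1) % nbins;
        hist[bin0] += mag[i] * (1.f - frac);
        hist[bin1] += mag[i] * frac;
    }
}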
4.3.4 Block Normalisation Schemes and Descriptor Overlap
Good normalisation is critical and including overlap significantly improves the performance.
Figure 4.4(d) shows that L2-Hys, L2-norm and L1-sqrt all perform equally well for the person detector.
For other object classes such as cars and motorbikes, L1-sqrt gives the best results.
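For concreteness, a minimal sketch of the L2-Hys normalisation named above, assuming a 36-value block vector and the 0.2 clipping threshold used by the default detector (the epsilon is only there to avoid division by zero):

#include <cmath>
#include <vector>

void l2hysNormalise(std::vector<float>& block, float clipAt = 0.2f, float eps = 1e-6f)
{
    // first L2 normalisation
    double norm = 0;
    for( size_t i = 0; i < block.size(); i++ ) norm += block[i]*block[i];
    norm = std::sqrt(norm) + eps;
    for( size_t i = 0; i < block.size(); i++ ) block[i] = (float)(block[i]/norm);

    // clip every component (the "hysteresis" step), then renormalise
    for( size_t i = 0; i < block.size(); i++ ) if( block[i] > clipAt ) block[i] = clipAt;
    norm = 0;
    for( size_t i = 0; i < block.size(); i++ ) norm += block[i]*block[i];
    norm = std::sqrt(norm) + eps;
    for( size_t i = 0; i < block.size(); i++ ) block[i] = (float)(block[i]/norm);
}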
4.3.5 Descriptor Blocks
R-HOG.
For human detection, 3×3 cell blocks of 6×6 pixel cells perform best with 10.4% miss-rate
at 10^-4 FPPW. Our standard 2×2 cell blocks of 8×8 pixel cells are a close second.
We find 2×2 and 3×3 cell blocks work best.
4.3.6 Detector Window and Context
Our 64×128 detection window includes about 16 pixels of margin around the person on all four
sides.
4.3.7 Classifier
By default we use a soft (C=0.01) linear SVM trained with SVMLight [Joachims 1999]. We modified
SVMLight to reduce memory usage for problems with large dense descriptor vectors.
---------------------------------
5. Multi-Scale Object Localisation
The detector scans the image with a detection window at all positions and scales, running the classifier in each window and fusing multiple overlapping detections to yield the final object detections.
We represent detections using kernel density estimation (KDE) in 3-D position and scale space. KDE is a data-driven process where continuous densities are evaluated by applying a smoothing kernel to observed data points. The bandwidth of the smoothing kernel defines the local neighbourhood. The detection scores are incorporated by weighting the observed detection points by their score values while computing the density estimate. Thus KDE naturally incorporates the first two criteria. The overlap criterion follows from the fact that detections at very different scales or positions are far off in 3-D position and scale space, and are thus not smoothed together. The modes (maxima) of the density estimate correspond to the positions and scales of final detections.
Let xi = [xi, yi] and s'i denote the detection position and scale, respectively, for the i-th detection.
The detections are represented in 3-D space as y = [x, y, s], where s = log(s').
The variable bandwidth mean shift vector is defined in (5.7).
For each of the n points the mean shift based iterative procedure is guaranteed to converge to a mode.
Detection Uncertainty Matrix Hi.
One key input to the above mode detection algorithm is the amount of uncertainty Hi to be associated with each point. We assume isosymmetric covariances, i.e. the Hi’s are diagonal matrices.
Let diag[H] represent the 3 diagonal elements of H. We use scale dependent covariance matrices such that
diag[Hi] = [(exp(si)·σx)², (exp(si)·σy)², σs²]    (5.9)
where σx, σy and σs are user supplied smoothing values.
The term t(wi) provides the weight for each detection. For linear SVMs we usually use threshold = 0.
The smoothing parameters σx, σy and σs are used in the non-maximum suppression stage. These parameters can have a significant impact on performance so proper evaluation is necessary. For all of the results here, unless otherwise noted, a scale ratio of 1.05, a stride of 8 pixels, and σx = 8, σy = 16, σs = log(1.3) are used as default values.
A scale ratio of 1.01 gives the best performance, but significantly slows the overall process.
Scale smoothing of log(1.3)–log(1.6) gives good performance for most object classes.
We group these mode candidates using a proximity measure. The final location is the mode corresponding to the highest density.
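Putting (5.9) and the mean shift iteration together, here is a hedged sketch of the diagonal-covariance case described above. It reflects my reading of the thesis rather than any code from it or from OpenCV, and the type and function names are made up:

#include <cmath>
#include <vector>

struct Detection
{
    double y[3];   // [x, y, s] with s = log(detection scale)
    double w;      // weight t(wi), e.g. the SVM score clipped at the threshold
    double h[3];   // diag(Hi), the scale-dependent uncertainties from (5.9)
};

// diag(Hi) per (5.9): [(exp(si)*sigma_x)^2, (exp(si)*sigma_y)^2, sigma_s^2]
void computeUncertainty(Detection& d, double sx, double sy, double ss)
{
    double es = std::exp(d.y[2]);
    d.h[0] = (es*sx)*(es*sx);
    d.h[1] = (es*sy)*(es*sy);
    d.h[2] = ss*ss;
}

// One mean shift iteration: move the current estimate ym[3] towards the local
// weighted mean, where each detection contributes according to its weight, its
// kernel value at ym, and its own (diagonal) covariance.  Repeat until ym stops moving.
void meanShiftStep(const std::vector<Detection>& dets, double ym[3])
{
    double num[3] = {0,0,0}, den[3] = {0,0,0};
    for( size_t i = 0; i < dets.size(); i++ )
    {
        const Detection& d = dets[i];
        double d2 = 0, detH = 1;
        for( int k = 0; k < 3; k++ )
        {
            double diff = ym[k] - d.y[k];
            d2 += diff*diff / d.h[k];          // Mahalanobis distance under Hi
            detH *= d.h[k];
        }
        double wi = d.w * std::exp(-0.5*d2) / std::sqrt(detH);
        for( int k = 0; k < 3; k++ )
        {
            num[k] += wi * d.y[k] / d.h[k];
            den[k] += wi / d.h[k];
        }
    }
    for( int k = 0; k < 3; k++ )
        if( den[k] > 0 ) ym[k] = num[k]/den[k];
}

Starting the iteration from every detection point, grouping the converged estimates, and keeping the mode with the highest density in each group gives the final fused detections; each mode's (x, y, exp(s)) then yields the bounding box, as in steps (d) and (e) above.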
----------------------------------------------------
Appendix A. INRIA Static Person Data Set
The (centred and normalised) positive windows are supplied by the user, and the initial set of negatives is created once and for all by randomly sampling negative images. A preliminary classifier is thus trained using these. Second, the preliminary detector is used to exhaustively scan the negative training images for hard examples (false positives). The classifier is then re-trained using this augmented training set (user supplied positives, initial negatives and hard examples) to produce the final detector.
INRIA Static Person Data Set
As images of people are highly variable, to learn an effective classifier, the positive training examples need to be properly normalized and centered to minimize the variance among them. For this we manually annotated all upright people in the original images.
The image regions belonging to the annotations were cropped and rescaled to 64×128 pixel image windows. On average the subjects' height is 96 pixels in these normalised windows to allow for an approximately 16 pixel margin on each side. In practice we leave a further 16 pixel margin around each side of the image window to ensure that flow and gradients can be computed without boundary effects. The margins were added by appropriately expanding the annotations on each side before cropping the image regions.
//<------------------------ the above is excerpted from Dalal's PhD thesis
For more about the INRIA Person Dataset, see the following link:
http://pascal.inrialpes.fr/data/human/
Original Images
Folders 'Train' and 'Test' correspond, respectively, to original training and test images. Both folders have three sub folders: (a) 'pos' (positive training or test images), (b) 'neg' (negative training or test images), and (c) 'annotations' (annotation files for positive images in Pascal Challenge format).
Normalized Images
Folders 'train_64x128_H96' and 'test_64x128_H96' correspond to normalized dataset as used in above referenced paper. Both folders have two sub folders: (a) 'pos' (normalized positive training or test images centered on the person with their left-right reflections), (b) 'neg' (containing original negative training or test images). Note images in folder 'train/pos' are of 96x160 pixels (a margin of 16 pixels around each side), and images in folder 'test/pos' are of 70x134 pixels (a margin of 3 pixels around each side). This has been done to avoid boundary conditions (thus to avoid any particular bias in the classifier). In both folders, use the centered 64x128 pixels window for original detection task.
Negative windows
To generate negative training windows from normalized images, a fixed set of 12180 windows (10 windows per negative image) are sampled randomly from 1218 negative training photos providing the initial negative training set. For each detector and parameter combination, a preliminary detector is trained and all negative training images are searched exhaustively (over a scale-space pyramid) for false positives (`hard examples'). All examples with score greater than zero are considered hard examples. The method is then re-trained using this augmented set (initial 12180 + hard examples) to produce the final detector. The set of hard examples is subsampled if necessary, so that the descriptors of the final training set fit into 1.7 GB of RAM for SVM training.
//------------------------------------------------------
The original author has since updated the OpenCV 2.0 peopledetect sample twice:
https://code.ros.org/trac/opencv/changeset/2314/trunk
The most recent revision is as follows:
---------------------
#include "cvaux.h"
#include "highgui.h"
#include <stdio.h>
#include <string.h>
#include <ctype.h>
using namespace cv;
using namespace std;
int main(int argc, char** argv)
{
    Mat img;
    FILE* f = 0;
    char _filename[1024];

    if( argc == 1 )
    {
        printf("Usage: peopledetect (<image_filename> | <image_list>.txt)\n");
        return 0;
    }
    img = imread(argv[1]);

    if( img.data )
    {
        strcpy(_filename, argv[1]);
    }
    else
    {
        f = fopen(argv[1], "rt");
        if(!f)
        {
            fprintf( stderr, "ERROR: the specified file could not be loaded\n");
            return -1;
        }
    }

    HOGDescriptor hog;
    hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector());

    for(;;)
    {
        char* filename = _filename;
        if(f)
        {
            // read one image name per line, skip comments, strip trailing whitespace
            if(!fgets(filename, (int)sizeof(_filename)-2, f))
                break;
            //while(*filename && isspace(*filename))
            //    ++filename;
            if(filename[0] == '#')
                continue;
            int l = strlen(filename);
            while(l > 0 && isspace(filename[l-1]))
                --l;
            filename[l] = '\0';
            img = imread(filename);
        }
        printf("%s:\n", filename);
        if(!img.data)
            continue;

        fflush(stdout);
        vector<Rect> found, found_filtered;
        double t = (double)getTickCount();
        // run the detector with default parameters. to get a higher hit-rate
        // (and more false alarms, respectively), decrease the hitThreshold and
        // groupThreshold (set groupThreshold to 0 to turn off the grouping completely).
        int can = img.channels();
        hog.detectMultiScale(img, found, 0, Size(8,8), Size(32,32), 1.05, 2);
        t = (double)getTickCount() - t;
        printf("\tdetection time = %gms\n", t*1000./cv::getTickFrequency());

        size_t i, j;
        for( i = 0; i < found.size(); i++ )
        {
            Rect r = found[i];
            // keep r only if it is not fully contained inside another detection
            for( j = 0; j < found.size(); j++ )
                if( j != i && (r & found[j]) == r)
                    break;
            if( j == found.size() )
                found_filtered.push_back(r);
        }
        for( i = 0; i < found_filtered.size(); i++ )
        {
            Rect r = found_filtered[i];
            // the HOG detector returns slightly larger rectangles than the real objects.
            // so we slightly shrink the rectangles to get a nicer output.
            r.x += cvRound(r.width*0.1);
            r.width = cvRound(r.width*0.8);
            r.y += cvRound(r.height*0.07);
            r.height = cvRound(r.height*0.8);
            rectangle(img, r.tl(), r.br(), cv::Scalar(0,255,0), 3);
        }
        imshow("people detector", img);
        int c = waitKey(0) & 255;
        if( c == 'q' || c == 'Q' || !f)
            break;
    }
    if(f)
        fclose(f);
    return 0;
}
With this update the program can process a batch of images.
Put the images to be checked into a text file, say filename.txt, whose contents look like this:
1.jpg
2.jpg
......
Then, at the DOS prompt, run peopledetect filename.txt and every image in the list will be processed automatically.
//////////////////////////////////////////////////////////////////------------------------------ Description of Navneet Dalal's OLT workflow
Navneet Dalal provides the INRIA Object Detection and Localization Toolkit at
http://pascal.inrialpes.fr/soft/olt/
Wilson Suryajaya Leoputra provides a Windows port of it:
http://www.computing.edu.au/~12482661/hog.html
It requires copying all the DLLs (boost_1.34.1*.dll, blitz_0.9.dll, opencv*.dll) into "<ROOT_PROJECT_DIR>/debug/".
Navneet Dalal also provides Linux executables; I borrowed someone's Linux machine to run them and get a feel for the overall workflow first.
The workflow below is pieced together from the files OLTbinaries\readme and OLTbinaries\HOG\record.
1. Download the INRIA person detection database and unpack it into OLTbinaries\; rename 'train_64x128_H96' to 'train' and 'test_64x128_H96' to 'test'.
2. On Linux, run the 'runall.sh' script.
Once the results are out, open MATLAB and run plotdet.m to plot the DET curve.
------ That is the one-shot route --------------------------------------------------
------ The toolkit also supports running the steps one at a time, as follows -------------------------------------
1. Compute R-HOG features for the positive samples listed in pos.lst; the pos.lst format is as follows:
train/pos/crop_000010a.png
train/pos/crop_000010b.png
train/pos/crop_000011a.png
------ the lines below are the commands executed on Linux (likewise for the rest) ------
./bin//dump_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -s 1 train/pos.lst HOG/train_pos.RHOG
2. Compute R-HOG features for the negative samples.
./bin//dump_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -s 10 train/neg.lst HOG/train_neg.RHOG
3. Train.
./bin//dump4svmlearn -p HOG/train_pos.RHOG -n HOG/train_neg.RHOG HOG/train_BiSVMLight.blt -v
4. Create the model file HOG/model_4BiSVMLight.alt.
./bin//svm_learn -j 3 -B 1 -z c -v 1 -t 0 HOG/train_BiSVMLight.blt HOG/model_4BiSVMLight.alt
5. Create a folder.
mkdir -p HOG/hard
6. Classify.
./bin//classify_rhog train/neg.lst HOG/hard/list.txt HOG/model_4BiSVMLight.alt -d HOG/hard/hard_neg.txt -c HOG/hard/hist.txt -m 0 -t 0 --no_nonmax 1 --avsize 0 --margin 0 --scaleratio 1.2 -l N -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys
--------
The false +/- classification results are written to HOG/hard/hard_neg.txt.
7. Add the hard examples to the negatives and compute the R-HOG features again.
./bin//dump_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -s 0 HOG/hard/hard_neg.txt HOG/train_hard_neg.RHOG --poscases 2416 --negcases 12180 --dumphard 1 --hardscore 0 --memorylimit 1700
8. Train again.
./bin//dump4svmlearn -p HOG/train_pos.RHOG -n HOG/train_neg.RHOG -n HOG/train_hard_neg.RHOG HOG/train_BiSVMLight.blt -v 4
9. Obtain the final model.
./bin//svm_learn -j 3 -B 1 -z c -v 1 -t 0 HOG/train_BiSVMLight.blt HOG/model_4BiSVMLight.alt
The 3780 values used in OpenCV should be contained in this model file, model_4BiSVMLight.alt. Its format is unknown, so it cannot be read directly, but one approach is to study how svm_learn generates it; the model is also loaded by classify_rhog, so studying how that program parses it is another route to understanding the format.
10. Create a folder.
mkdir -p HOG/WindowTest_Negative
11. Run detection on the negative test samples.
./bin//classify_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -p 1 --no_nonmax 1 --nopyramid 0 --scaleratio 1.2 -t 0 -m 0 --avsize 0 --margin 0 test/neg.lst HOG/WindowTest_Negative/list.txt HOG/model_4BiSVMLight.alt -c HOG/WindowTest_Negative/histogram.txt
12. Create a folder.
mkdir -p HOG/WindowTest_Positive
13. Run detection on the positive test samples.
./bin//classify_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -p 1 --no_nonmax 1 --nopyramid 1 -t 0 -m 0 --avsize 0 --margin 0 test/pos.lst HOG/WindowTest_Positive/list.txt HOG/model_4BiSVMLight.alt -c HOG/WindowTest_Positive/histogram.txt
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Having analysed the original author's dataset and combined that with material found online, here is how training samples can be prepared.
1. How samples are generated from the original images
Comparing INRIAPerson\INRIAPerson\Train\pos (the original images) with INRIAPerson\train_64x128_H96\pos (the generated samples) shows that the author cropped standing, unoccluded people from the original images and then left-right reflected the crops. Take the first image, crop001001, as an example: two unoccluded people were cropped from it, which together with the original photo makes three images, and adding the left-right mirrors gives six in total.
2. Cropping
The OpenCV 1.0-based tool imageclipper can be used to crop and save the regions; it generates the file names automatically and saves the crops into a newly created imageclipper folder under the same path.
3. Resizing the images
ACDSee can be used: Tools/Open in editor, then the Resize option; Tools/Rotate can also perform the left-right reflection.
I also wrote a small program to batch-resize images; the code is given below.
4. Creating the pos.lst list
At a DOS prompt, change into the folder containing the images and run dir /b > pos.lst to generate the file list.
/////////////////////////
#include "cv.h"
#include "highgui.h"
#include "cvaux.h"
int main(int argc,char * argv[])
{
IplImage* src ;
IplImage* dst = 0;
CvSize dst_size;
FILE* f = 0;
char _filename[1024];
int l;
f = fopen(argv[1], "rt");
if(!f)
{
fprintf( stderr, "ERROR: the specified file could not be loaded\n");
return -1;
}
for(;;)
{
char* filename = _filename;
if(f)
{
if(!fgets(filename, (int)sizeof(_filename)-2, f))
break;
if(filename[0] == '#')
continue;
l = strlen(filename);
while(l > 0 && isspace(filename[l-1]))
--l;
filename[l] = '\0';
src=cvLoadImage(filename,1);
}
dst_size.width = 96;
dst_size.height = 160;
dst=cvCreateImage(dst_size,src->depth,src->nChannels);
cvResize(src,dst,CV_INTER_LINEAR);//////////////////
char* filename2 = _filename;char* filename3 = _filename; filename3="_96x160.jpg";
strncat(filename2, filename,l-4);
strcat(filename2, filename3);
cvSaveImage(filename2, dst);
}
if(f)
fclose(f);
cvWaitKey(-1);
cvReleaseImage( &src );
cvReleaseImage( &dst );
return 0;
}