Tesseract
3.02
cjkpitch.h
// File:        cjkpitch.h
// Description: Code to determine fixed pitchness and the pitch if fixed,
//              for CJK text.
// Copyright 2011 Google Inc. All Rights Reserved.
// Author: takenaka@google.com (Hiroshi Takenaka)
// Created:     Mon Jun 27 12:48:35 JST 2011
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
#ifndef CJKPITCH_H_
#define CJKPITCH_H_

#include "blobbox.h"
#include "notdll.h"

// Function to test the "fixed-pitchness" of the input text and estimate
// character pitch parameters for it, based on a CJK fixed-pitch layout
// model.
//
// This function assumes that fixed-pitch CJK text has the following
// characteristics:
//
// - Most glyphs are designed to fit within the same sized square
//   (imaginary body). Also they are aligned to the center of their
//   imaginary bodies.
// - The imaginary body is always a regular rectangle.
// - There may be some extra space between character bodies
//   (tracking).
// - There may be some extra space after punctuation.
// - The text is *not* space-delimited. Thus spaces are rare.
// - A character may consist of multiple unconnected blobs.
//
// The function works in two passes. On pass 1, it looks for "good"
// blobs that have the same pitch on both sides and look like complete
// CJK characters. It then estimates the character pitch for every row,
// based on those good blobs. If there are not enough good blobs in a
// row, the pitch is estimated from other rows with similar character
// height instead.
//
// Pass 2 is an iterative process to fit the blobs into fixed-pitch
// character cells. Once we have estimated the character pitch, blobs
// that are almost as large as the pitch can be considered to be
// complete characters. And once we know that some blobs are complete
// characters, we can estimate the regions occupied by their
// neighbors. And so on.
//
// We repeat the process until all ambiguities are resolved. Then we make
// the final decision about the fixed-pitchness of each row and compute
// pitch and spacing parameters.
//
// (If a row is considered to be proportional, pitch_decision for the
// row is set to PITCH_CORR_PROP and the later phase
// (i.e. Textord::to_spacing()) should determine its spacing
// parameters.)
//
// This function doesn't provide all the information required by
// fixed_pitch_words(), so the rows need to be processed with
// make_prop_words() even if they are fixed pitch.
void compute_fixed_pitch_cjk(ICOORD page_tr,               // top right
                             TO_BLOCK_LIST *port_blocks);  // input list

#endif  // CJKPITCH_H_
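
The pass 1 idea described in the header comment (find "good" blobs with the same pitch on both sides, then estimate the row pitch from them) can be illustrated with a small standalone sketch. This is not Tesseract's implementation: the `Box` struct, the `EstimateRowPitch` function, the 0.8 to 1.2 aspect-ratio window, and the `tolerance` and `min_good` parameters are all hypothetical choices made for illustration, and the median is an assumed way of combining the per-blob pitches.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a blob bounding box (not Tesseract's BLOBNBOX).
struct Box {
  int left;
  int right;
  int height;
  int width() const { return right - left; }
  double center() const { return (left + right) / 2.0; }
};

// Estimates the character pitch of one row from its "good" blobs.
// `row` must be sorted left to right. Returns 0.0 when fewer than
// `min_good` good blobs are found, signalling that the caller would
// have to fall back to another source (e.g. rows of similar height).
double EstimateRowPitch(const std::vector<Box>& row, double tolerance = 0.1,
                        std::size_t min_good = 2) {
  assert(tolerance >= 0.0);
  std::vector<double> pitches;
  for (std::size_t i = 1; i + 1 < row.size(); ++i) {
    const Box& b = row[i];
    if (b.height <= 0) continue;
    // Assumed test for "looks like a complete CJK character":
    // the box is roughly square.
    const double aspect = static_cast<double>(b.width()) / b.height;
    if (aspect < 0.8 || aspect > 1.2) continue;
    // Center-to-center distance to the left and right neighbors.
    const double left_pitch = b.center() - row[i - 1].center();
    const double right_pitch = row[i + 1].center() - b.center();
    const double mean = (left_pitch + right_pitch) / 2.0;
    if (mean <= 0.0) continue;
    // A "good" blob has (nearly) the same pitch on both sides.
    if (std::fabs(left_pitch - right_pitch) <= tolerance * mean) {
      pitches.push_back(mean);
    }
  }
  if (pitches.size() < min_good) return 0.0;
  // Median (upper middle for even counts) is robust to a few outliers.
  const std::size_t mid = pitches.size() / 2;
  std::nth_element(pitches.begin(), pitches.begin() + mid, pitches.end());
  return pitches[mid];
}
```

Only interior blobs are considered, since a blob needs a neighbor on each side before its left and right pitch can be compared. A return value of 0.0 corresponds to the "not enough good blobs" case, where the header comment says the pitch is taken from other rows with similar character height.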
Generated on Thu Nov 1 2012 20:19:49 for Tesseract by Doxygen 1.8.1