Split text lines in a scanned document

I am trying to find a way to split the lines of text in a scanned document that has been thresholded. Right now, I store the pixel values of the document as unsigned integers from 0 to 255. I take the mean of the pixels in each row, split the rows into ranges based on whether the mean pixel value is greater than 250, and then take the median of each range of rows for which this holds. However, this method sometimes fails, because there can be black specks on the image.

Is there a more noise-resistant way to do this?

EDIT: here is some code. "warped" is the name of the original image, and "cuts" is where I want to split the image.

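import numpy as np
# threshold_adaptive exists in older scikit-image releases; newer ones replace it
# with threshold_local, which returns a threshold map instead of a binary image
from skimage.filters import threshold_adaptive

warped = threshold_adaptive(warped, 250, offset=10)
warped = warped.astype("uint8") * 255

# get areas where we can split image on whitespace to make OCR more accurate
color_level = np.array([np.sum(line) / len(line) for line in warped])
cuts = []
i = 0
while i < len(color_level):
    if color_level[i] > 250:
        begin = i
        # bounds check: a whitespace run can reach the last row of the image
        while i < len(color_level) and color_level[i] > 250:
            i += 1
        cuts.append((i + begin) // 2)  # middle of the whitespace region (integer row index)
    else:
        i += 1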
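One way to make the row scan more tolerant of isolated black specks is to call a row blank when only a small fraction of its pixels is dark, instead of requiring a near-perfect mean, and to ignore blank runs that are too short to be a real gap between lines. The sketch below assumes `binary` is a 2-D uint8 array with a white (255) background and dark (0) text; `find_cuts`, `max_ink_fraction` and `min_gap` are hypothetical names, and the default values are guesses that would need tuning on real scans.

```python
import numpy as np


def find_cuts(binary, max_ink_fraction=0.02, min_gap=3):
    """Return the middle row of each sufficiently tall blank band.

    Hypothetical helper; the defaults are assumptions to tune.
    binary: 2-D uint8 array, background 255 (white), ink 0 (black).
    max_ink_fraction: a row counts as blank if at most this fraction of its
        pixels is dark, so a few specks do not break a gap.
    min_gap: blank runs shorter than this many rows are ignored.
    """
    ink_fraction = (binary < 128).mean(axis=1)  # fraction of dark pixels per row
    blank = ink_fraction <= max_ink_fraction

    cuts = []
    i, n = 0, len(blank)
    while i < n:
        if blank[i]:
            begin = i
            while i < n and blank[i]:
                i += 1
            if i - begin >= min_gap:
                cuts.append((begin + i - 1) // 2)  # middle of rows begin..i-1
        else:
            i += 1
    return cuts
```

On a synthetic page with two text lines and a single dark pixel inside the gap between them, the speck no longer splits that gap in two:

```python
page = np.full((30, 100), 255, dtype=np.uint8)
page[5:10, 10:90] = 0    # first text line
page[20:25, 10:90] = 0   # second text line
page[15, 50] = 0         # a speck in the gap
print(find_cuts(page))   # [2, 14, 27]
```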

EDIT 2: added a sample image.


2016-01-24 Alex

Can you show some code? – RvdK

OK, I've added the code I'm using. – Alex

A sample image would help. – sturkmen

Starting from your input image, you need to make the text white and the background black.
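The C++ code in this answer uses a fixed cut-off (`bin < 200`). If the scans vary in brightness, an alternative is Otsu's method, which the later Python answer requests through `cv2.THRESH_OTSU`. Below is a plain NumPy sketch of Otsu's threshold, not OpenCV's own implementation (ties may be broken differently), assuming an 8-bit grayscale array; `otsu_threshold` and `binarize_text_white` are hypothetical names.

```python
import numpy as np


def otsu_threshold(gray):
    """Return t in 0..255 maximizing the between-class variance (hypothetical helper).

    Pixels <= t form one class and pixels > t the other. gray: 2-D uint8 array.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # weight of the class "<= t"
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                # one class empty: not a valid split
    return int(np.argmax(sigma_b))


def binarize_text_white(gray):
    """Dark pixels (text) become 255, bright pixels (background) become 0."""
    t = otsu_threshold(gray)
    return np.where(gray <= t, 255, 0).astype(np.uint8)
```

Inverting at this step (dark becomes 255) matches what the rest of the answer expects: the white pixels are the text.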

You then need to compute the rotation angle of your bill. A simple approach is to find the minAreaRect of all the white points (findNonZero), which gives you the rotated rectangle enclosing the text.
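Because minAreaRect encloses every white point, a speck far from the text can skew the rectangle. A different, commonly used deskewing technique is a projection-profile search: rotate the coordinates of the white pixels by each candidate angle and keep the angle whose row histogram is most sharply peaked. This is a swapped-in alternative, not what this answer does; `estimate_skew`, the ±10° range and the 0.5° step are assumptions.

```python
import numpy as np


def estimate_skew(binary, max_angle=10.0, step=0.5):
    """Estimate text skew in degrees by projection-profile search (hypothetical helper).

    binary: 2-D array, text pixels nonzero, background 0.
    Returns the angle theta for which lines of the form y = y0 + x*tan(theta)
    collapse into the sharpest horizontal profile. Range and step are assumptions.
    """
    ys, xs = np.nonzero(binary)
    if ys.size == 0:
        return 0.0
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step / 2, step):
        theta = np.deg2rad(angle)
        # row coordinate of each text pixel after undoing a tilt of theta
        y_rot = ys * np.cos(theta) - xs * np.sin(theta)
        profile = np.bincount(np.round(y_rot - y_rot.min()).astype(int))
        score = float(np.sum(profile.astype(np.float64) ** 2))  # peaked profiles score higher
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle
```

The sign convention expected by whichever rotation function is used afterwards should be checked on a test image before relying on the returned angle.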

You can then rotate the bill so that the text is horizontal.
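The rotation in the code is done with getRotationMatrix2D followed by warpAffine. To see what the matrix does, here is a NumPy sketch of the 2×3 matrix as given by the formula in the OpenCV documentation for getRotationMatrix2D, applied to a few points. `rotation_matrix_2d` and `apply_affine` are hypothetical helper names, and this only maps coordinates; it does not resample pixels the way warpAffine does.

```python
import numpy as np


def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """2x3 affine matrix following the documented getRotationMatrix2D formula."""
    cx, cy = center
    a = scale * np.cos(np.deg2rad(angle_deg))
    b = scale * np.sin(np.deg2rad(angle_deg))
    return np.array([[a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])


def apply_affine(M, points):
    """Map an (N, 2) array of (x, y) points through a 2x3 affine matrix."""
    pts = np.asarray(points, dtype=np.float64)
    return pts @ M[:, :2].T + M[:, 2]


M = rotation_matrix_2d((50, 50), 90)
print(apply_affine(M, [[50, 50], [60, 50]]).round(6))  # center stays, (60, 50) -> (50, 40)
```

With y pointing down, a positive angle turns the image counter-clockwise on screen: the point 10 px to the right of the center moves 10 px up, while the center itself stays fixed.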

Now you can compute the horizontal projection (reduce), taking the mean value of each row. Apply a threshold th to this histogram to account for some noise in the image (here I used 0, i.e. no noise). After thresholding (hist = horProj <= th), rows containing only background become white (nonzero) bins and text rows become black (0) bins. Then take the mean bin coordinate of each continuous run of white bins in the histogram. That will be the y coordinate of the separating lines.
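The projection step described above can be sketched in NumPy as follows, assuming the image is already binarized (text white, background black) and rotated so the text is horizontal. `line_separators` is a hypothetical name; it mirrors the run-averaging loop of the C++ code, including a background run that reaches the last row.

```python
import numpy as np


def line_separators(binary, th=0.0):
    """y coordinates between text lines from the horizontal projection (hypothetical helper).

    binary: 2-D array, text white (>0), background black (0), text horizontal.
    th: rows whose mean intensity is <= th count as background (noise tolerance).
    """
    hor_proj = binary.astype(np.float64).mean(axis=1)  # horizontal projection
    is_space = hor_proj <= th

    ycoords = []
    start = None
    for i, space in enumerate(is_space):
        if space and start is None:
            start = i                                  # a background run begins
        elif not space and start is not None:
            ycoords.append((start + i - 1) // 2)       # mean of rows start..i-1
            start = None
    if start is not None:                              # run reaching the last row
        ycoords.append((start + len(is_space) - 1) // 2)
    return ycoords
```

Raising `th` is what absorbs specks: a single stray pixel in a 50-pixel-wide row raises that row's mean to 5.1, so with `th=0` it splits the gap, while with `th=6` it is treated as background.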

Here is the code. It is in C++, but since most of the work is done with OpenCV functions, it should be easy to convert to Python. At the very least, you can use it as a reference:

#include <opencv2/opencv.hpp>
#include <utility>
#include <vector>
using namespace cv;
using namespace std;

int main()
{
    // Read image
    Mat3b img = imread("path_to_image");
    if (img.empty())
    {
        return 1;
    }

    // Binarize image. Text is white, background is black
    Mat1b bin;
    cvtColor(img, bin, COLOR_BGR2GRAY);
    bin = bin < 200;

    // Find all white pixels
    vector<Point> pts;
    findNonZero(bin, pts);

    // Get rotated rect of white pixels
    RotatedRect box = minAreaRect(pts);
    if (box.size.width > box.size.height)
    {
        swap(box.size.width, box.size.height);
        box.angle += 90.f;
    }

    // Draw the rotated rect on the input image (for inspection only)
    Point2f vertices[4];
    box.points(vertices);
    for (int i = 0; i < 4; ++i)
    {
        line(img, vertices[i], vertices[(i + 1) % 4], Scalar(0, 255, 0));
    }

    // Rotate the image according to the found angle
    Mat1b rotated;
    Mat M = getRotationMatrix2D(box.center, box.angle, 1.0);
    warpAffine(bin, rotated, M, bin.size());

    // Compute horizontal projections (mean of each row, as float)
    Mat1f horProj;
    reduce(rotated, horProj, 1, REDUCE_AVG, CV_32F);

    // Remove noise in histogram. White bins identify space lines, black bins identify text lines
    float th = 0;
    Mat1b hist = horProj <= th;

    // Get mean coordinate of each group of white (space) bins
    vector<int> ycoords;
    int y = 0;
    int count = 0;
    bool isSpace = false;
    for (int i = 0; i < rotated.rows; ++i)
    {
        if (!isSpace)
        {
            if (hist(i))
            {
                isSpace = true;
                count = 1;
                y = i;
            }
        }
        else
        {
            if (!hist(i))
            {
                isSpace = false;
                ycoords.push_back(y / count);
            }
            else
            {
                y += i;
                count++;
            }
        }
    }
    // Also keep a space run that reaches the last row
    if (isSpace)
    {
        ycoords.push_back(y / count);
    }

    // Draw line as final result
    Mat3b result;
    cvtColor(rotated, result, COLOR_GRAY2BGR);
    for (size_t i = 0; i < ycoords.size(); ++i)
    {
        line(result, Point(0, ycoords[i]), Point(result.cols, ycoords[i]), Scalar(0, 255, 0));
    }
    imwrite("result.png", result);

    return 0;
}


2016-01-26 12:38:42 Miki

Awesome, thanks a lot!! – Alex

Hi Miki, I had actually seen your answer and upvoted it before. I forgot about it and used the same approach when answering a question on [answers.opencv.org/question/85884/](http://answers.opencv.org/question/85884/). I used 'Canny' to find the dividing lines. Maybe you'd like to see it. – sturkmen

@sturkmen interesting ;D – Miki

The essential steps are the same as @Miki's:

(1) read the source image
(2) threshold it
(3) find the minAreaRect
(4) warp with the rotation matrix
(5) find and draw the upper and lower bounds of each line

And the Python code:

#!/usr/bin/python3 
# 2018.01.16 01:11:49 CST 
# 2018.01.16 01:55:01 CST 
import cv2 
import numpy as np 

## (1) read 
img = cv2.imread("img02.jpg") 

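## (2) threshold (convert to gray first; with THRESH_OTSU the 127 is ignored
## and the threshold is chosen automatically, INV makes the text white)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
th, threshed = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV|cv2.THRESH_OTSU)

## (3) minAreaRect on the nonzeros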
pts = cv2.findNonZero(threshed) 
ret = cv2.minAreaRect(pts) 

(cx,cy), (w,h), ang = ret 
if w>h: 
    w,h = h,w 
    ang += 90 

## (4) Find rotated matrix, do rotation 
M = cv2.getRotationMatrix2D((cx,cy), ang, 1.0) 
rotated = cv2.warpAffine(threshed, M, (img.shape[1], img.shape[0])) 

## (5) find and draw the upper and lower boundary of each lines 
hist = cv2.reduce(rotated,1, cv2.REDUCE_AVG).reshape(-1) 

th = 2 
H,W = img.shape[:2] 
uppers = [y for y in range(H-1) if hist[y]<=th and hist[y+1]>th] 
lowers = [y for y in range(H-1) if hist[y]>th and hist[y+1]<=th] 

rotated = cv2.cvtColor(rotated, cv2.COLOR_GRAY2BGR) 
for y in uppers: 
    cv2.line(rotated, (0,y), (W, y), (255,0,0), 1) 

for y in lowers: 
    cv2.line(rotated, (0,y), (W, y), (0,255,0), 1) 

cv2.imwrite("result.png", rotated)

Final result:


2018-01-15 17:56:48 Silencer
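The two list comprehensions in step (5) of the Python answer can also be written with NumPy boolean masks, and the resulting bounds can be paired to cut each text line out of the rotated image, which is what the OCR step in the question needs. This is a sketch: `line_bounds` and `crop_lines` are hypothetical names, and the pairing assumes the image starts and ends with background rows so that upper and lower bounds alternate.

```python
import numpy as np


def line_bounds(hist, th=2):
    """Vectorized form of the uppers/lowers list comprehensions (hypothetical helper).

    hist: 1-D horizontal projection (row means) of an image with white text.
    uppers: rows y where y is background and y+1 is text.
    lowers: rows y where y is text and y+1 is background.
    """
    hist = np.asarray(hist).reshape(-1)
    above = hist > th
    uppers = np.flatnonzero(~above[:-1] & above[1:])
    lowers = np.flatnonzero(above[:-1] & ~above[1:])
    return uppers.tolist(), lowers.tolist()


def crop_lines(image, uppers, lowers):
    """One strip per text line: rows upper+1 .. lower (hypothetical helper).

    Assumes bounds alternate upper, lower, upper, lower, ...
    """
    return [image[u + 1:l + 1] for u, l in zip(uppers, lowers)]
```

If text touches the top or bottom edge, the first lower bound precedes the first upper bound and the simple `zip` pairing no longer lines up, so that case would need extra handling.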
