Volume 4, Issue 1 (9-2007)
JSRI 2007, 4(1): 91-108
Probabilistic Linkage of Persian Record with Missing Data
Afshin Fallah, Mohsen Mohammadzadeh 1
1- mohsen_m@modares.ac.ir
Abstract:

Extended Abstract. When comprehensive information about a topic is scattered across two or more data sets, using only one of them loses the information contained in the others. The scattered information therefore needs to be integrated into a single comprehensive data set. In other cases, the goal is to detect duplicate records within one data set. Identifying duplicates within a data set, or records of the same entity across different data sets, is called record linkage. Linking data sets recorded in Persian poses special difficulties because of characteristics of Persian script: letters are joined within words, some letters have several written variants, and the shape of a letter depends on its position in the word.
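The abstract names these script features but does not list the exact standardization rules. The following minimal Python sketch shows one way to fold such variants into a single canonical form before comparing fields. The character map, the choice to drop ZWNJ and diacritics, and the names `CHAR_MAP`, `DROP` and `normalize_persian` are illustrative assumptions, not the authors' procedure.

```python
import unicodedata

# Assumed variant -> canonical mapping (hypothetical choice): Arabic code
# points that are often typed in place of their Persian counterparts.
CHAR_MAP = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH (Persian kaf)
}

# Dropped outright (assumption): tatweel (elongation) and zero-width
# non-joiner, whose use varies between typists.
DROP = {"\u0640", "\u200C"}


def normalize_persian(text: str) -> str:
    """Map variant spellings of a Persian string to one canonical form."""
    # NFKC folds positional presentation forms (isolated/initial/medial/final
    # glyphs) back to their base letters.
    text = unicodedata.normalize("NFKC", text)
    out = []
    for ch in text:
        if ch in DROP or unicodedata.combining(ch):
            continue  # skip tatweel, ZWNJ and short-vowel diacritics
        digit = unicodedata.decimal(ch, None)
        if digit is not None:
            out.append(str(digit))  # Persian/Arabic-Indic digits -> ASCII
            continue
        out.append(CHAR_MAP.get(ch, ch))
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(out).split())
```

Whether ZWNJ should be deleted, turned into a space, or kept is a design choice. This sketch only deletes it, so a word written with a space still differs from the same word written with ZWNJ.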

This paper studies the common difficulties in linking data sets recorded in Persian and presents some solutions. We introduce compatible methods for preparing and preprocessing files through standardization, blocking and selection of identifier variables. We propose a new method for handling missing data, a major problem in real-world applications of record linkage theory; the proposed method takes into account the probability that data are missing. We also propose an algorithm that increases the number of comparable fields by partitioning composite fields such as addresses. Finally, the proposed methods are applied to link records from establishment censuses in a geographical region of Iran. The results show that accounting for the probability of missing data increases the efficiency of the record linkage process. In addition, using different codes and notations to register data at different times leads to information loss. In particular, a general pattern for writing addresses in Iran needs to be designed, taking geographical and environmental conditions into account.
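The abstract does not give the formula for the proposed missing-data method. The following Python sketch therefore substitutes a standard Fellegi-Sunter-style log-likelihood-ratio weight in which "missing" is a third comparison outcome with its own probability among matches (m) and non-matches (u). It is one plausible way to account for the probability of missing data and should not be read as the authors' method. Blocking on a single key and a two-threshold decision rule are also included. All names (`compare`, `match_weight`, `block`, `classify`) are hypothetical, and so are the probabilities in the usage note. The keywords mention the EM algorithm, which is commonly used to estimate m and u; here those values are simply passed in.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Record = Dict[str, Optional[str]]
# Per-field outcome probabilities, e.g. m["name"]["agree"] = P(agree | match).
Probs = Dict[str, Dict[str, float]]

AGREE, DISAGREE, MISSING = "agree", "disagree", "missing"


def compare(a: Optional[str], b: Optional[str]) -> str:
    """Three-valued outcome: a missing value is recorded, not discarded."""
    if not a or not b:
        return MISSING
    return AGREE if a == b else DISAGREE


def match_weight(ra: Record, rb: Record, fields: List[str],
                 m: Probs, u: Probs) -> float:
    """log2 of P(comparison vector | M) / P(comparison vector | U),
    assuming fields are conditionally independent given match status."""
    total = 0.0
    for f in fields:
        outcome = compare(ra.get(f), rb.get(f))
        # If missingness is equally likely among matches and non-matches,
        # this term is 0 and the field is effectively ignored.
        total += math.log2(m[f][outcome] / u[f][outcome])
    return total


def block(records_a: List[Record], records_b: List[Record],
          key: Callable[[Record], Optional[str]]) -> Iterator[Tuple[Record, Record]]:
    """Yield only candidate pairs that share a blocking-key value."""
    index: Dict[str, List[Record]] = defaultdict(list)
    for rb in records_b:
        k = key(rb)
        if k is not None:  # records with a missing key are never paired here
            index[k].append(rb)
    for ra in records_a:
        k = key(ra)
        if k is not None:
            for rb in index.get(k, []):
                yield ra, rb


def classify(weight: float, lower: float, upper: float) -> str:
    """Fellegi-Sunter style decision with a clerical-review zone."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "review"
```

For example, if an address field has P(missing | M) = 0.30 and P(missing | U) = 0.20, a missing address adds log2(1.5) ≈ 0.58 to the weight. Ignoring the field would add nothing, and scoring it as a disagreement would push the weight down.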

Keywords: record, field, matching, record linkage, likelihood ratio, EM algorithm.
Full-Text [PDF 1881 kb]
Type of Study: Research | Subject: General
Received: 2016/02/21 | Accepted: 2016/02/21 | Published: 2016/02/21
Fallah A, Mohammadzadeh M. Probabilistic Linkage of Persian Record with Missing Data. JSRI 2007; 4(1): 91-108
URL: http://jsri.srtc.ac.ir/article-1-179-en.html


Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Journal of Statistical Research of Iran (JSRI)