Research on Implementation

The second section is by far the largest in CS and focuses on presenting technical system implementations.  Implementations vary widely; to assist the reader, they have been loosely classified into papers that propose privacy architecture in products and systems, applied techniques for online privacy, location-based privacy, and (as mentioned earlier) a set of papers on artificial intelligence techniques in health care (including the health information management, user-adaptive expert systems and decision support systems literature).

Privacy Architecture in Products and Systems

Guarda and Zannone are among the few researchers who suggest an implementable model for engineering privacy requirements (Guarda & Zannone, 2009).  Their paper introduces the term “privacy engineering” to describe current technical efforts to systematically embed privacy-relevant legal primitives into technical design.  Like the work on privacy ontologies (Hassan & Logrippo, 2009), Guarda and Zannone note that aligning enterprise goals with privacy policies, data protection policies and user preferences is key to aligning privacy artifacts.  Picking up on privacy requirements engineering, the authors highlight the criticality of this phase by proposing the features necessary to develop privacy-aware systems.

The authors also provide an interesting comparison of EU requirements with US regulations, noting that this is a fundamental consideration in borderless information flows.

The Venter et al paper on Privacy Intrusion Detection Systems (PIDS) is a unique contribution to the field (Venter, Olivier, & Eloff, 2004).  The authors propose a system for detecting privacy intrusions at a high level by detecting anomalous behaviour and reacting by throttling data access and/or issuing alerts using privacy enhancing technologies (PETs), including the Layered Privacy Architecture work that encompasses the personal control layer, the organizational safeguards layer, the private/confidential communication layer and the identity management layer.  The PIDS (like traditional IDS models) is applied to an unauthorized-query case study based on the assumption that information is stored in a central networked repository, where results can be monitored and throttled depending on the anomaly profile.  Venter et al note that successful implementation of the PIDS depends on an anomaly profile for each subject derived from the subject’s role, which may be difficult to construct.
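To make the throttling idea concrete, the following minimal sketch compares each subject’s query behaviour against a per-role profile and reacts by allowing, throttling or alerting.  The profile structure, thresholds and role names are illustrative assumptions, not Venter et al’s actual design.

```python
# Minimal sketch of anomaly-based throttling in a PIDS, assuming a per-role
# profile of expected query volume and permissible fields (both illustrative).
from collections import defaultdict

ROLE_PROFILES = {
    "clerk":  {"max_queries": 30, "fields": {"name", "address"}},
    "doctor": {"max_queries": 60, "fields": {"name", "diagnosis", "medication"}},
}

query_counts = defaultdict(int)  # queries per subject in the current window

def check_query(subject, role, fields_requested):
    profile = ROLE_PROFILES[role]
    query_counts[subject] += 1
    if not fields_requested <= profile["fields"]:
        return "alert"     # request falls outside the role's normal scope
    if query_counts[subject] > profile["max_queries"]:
        return "throttle"  # anomalous volume: slow down further access
    return "allow"

print(check_query("u42", "clerk", {"name", "diagnosis"}))  # -> alert
```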

While Guarda and Zannone and Venter et al focus on infrastructure, the majority of scholars in this classification of research focus on specific implementations.  Two of the more interesting examples are represented here in Clarkson et al, who present a technique for authenticating physical documents based on random, naturally occurring imperfections in paper texture (Clarkson et al., 2009), and Jha et al, who use genomic computation as a case study for developing a privacy-preserving implementation for computational biology (Jha, Kruger, & Shmatikov, 2008).  Where Clarkson et al focus on how to authenticate the paper itself, not the content printed on a page, Jha et al address DNA data, whose collection is an inherent threat to privacy.

The two research teams take opposing approaches to embedding privacy.

Clarkson et al seek to create a process which allows for registration and validation of a sheet of paper without a central registration authority, thereby minimizing privacy risk.  On the other hand, Jha et al state that protecting the privacy of an individual’s DNA when the corresponding genomic sequence is available is not realistic, so they choose to outline a practical tool to support collaborative analysis of genomic data without requiring release of the underlying DNA and protein sequences.  The Jha et al privacy-protecting tool is a cryptographically secure protocol for collaborative two-party computation on data using dynamic programming algorithms (edit distance, Smith-Waterman), built on oblivious transfer and oblivious circuit evaluation.  They test three privacy-preserving edit distance protocols and a privacy-preserving Smith-Waterman protocol before generalizing to privacy-preserving dynamic programming experiments, and conclude by noting that the performance of the algorithms is tractable even for instances of substantial size, a first step towards a practical method for privacy in genomic computation.
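For readers unfamiliar with the underlying dynamic program, the sketch below shows classic edit distance in the clear.  In Jha et al’s setting each party holds one string and the table cells are evaluated under encryption via oblivious transfer and circuit evaluation rather than directly; the plain version here is only meant to show what their protocols compute.

```python
# Classic edit distance dynamic program, shown in the clear. Jha et al's
# protocols compute the same table obliviously so that neither input leaks.

def edit_distance(s, t):
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                       # deletions
    for j in range(n + 1):
        d[0][j] = j                       # insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete
                          d[i][j - 1] + 1,        # insert
                          d[i - 1][j - 1] + sub)  # substitute/match
    return d[m][n]

assert edit_distance("GATTACA", "GACTATA") == 2
```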

Clarkson et al discuss the privacy implications of the model using undesirable attacks, such as an optical-scan voting system compromised by a corrupt election official.  More generally, they point out that the ability to re-identify ordinary sheets of paper casts doubt on any supposedly private information-gathering process that relies on paper forms.  In other words, anonymous is not necessarily anonymous because of the physical characteristics of the paper.

An additional area of research within the technical implementations centers on information retrieval.  

Goldberg sets out with the goal of fetching items from database servers without the server learning which item the end user has requested (Goldberg, 2007).  This is a particularly appealing challenge for privacy advocates as it ensures not only end-user privacy, but also subject matter privacy.  This type of information retrieval is also discussed in later papers on AI techniques that can be utilized in decision support, and Goldberg notes the importance of specifying the additional requirements that exist within the health care domain.
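The flavour of private information retrieval can be illustrated with a minimal two-server scheme, shown below.  This is a textbook XOR construction assuming two non-colluding replicas, far simpler than Goldberg’s robust multi-server protocol, but it captures the core idea: each server sees only a random-looking query vector.

```python
# Two-server XOR-based private information retrieval: each server sees only a
# uniformly random selection vector, yet the XOR of the two answers is the
# requested record. Assumes equal-length records and non-colluding servers.
import secrets

db = [b"rec0", b"rec1", b"rec2", b"rec3"]   # replicated on both servers

def client_queries(n, index):
    q1 = [secrets.randbits(1) for _ in range(n)]
    q2 = list(q1)
    q2[index] ^= 1                           # differ only at the target index
    return q1, q2

def server_answer(db, query):
    ans = bytes(len(db[0]))
    for record, bit in zip(db, query):
        if bit:
            ans = bytes(x ^ y for x, y in zip(ans, record))
    return ans

q1, q2 = client_queries(len(db), 2)
a1, a2 = server_answer(db, q1), server_answer(db, q2)
record = bytes(x ^ y for x, y in zip(a1, a2))
assert record == db[2]                       # everything else cancels out
```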

Applied Techniques for Online Privacy

While Guarda and Zannone touch briefly on online privacy policies and user preferences, including the adoption of P3P and the P3P Preference Language (APPEL) as well as privacy-aware access control languages such as E-P3P, EPAL and XACML, other researchers focus in depth on the use of these techniques for online privacy.  Cranor et al study the deployment of the W3C Platform for Privacy Preferences (P3P) standard to assess its usefulness to end users and researchers (Cranor, Egelman, Sheng, McDonald, & Chowdhury, 2008).  The methodology for the study required the analysis of both machine-readable P3P policies and human-readable privacy policies; in order to assess both, Cranor et al utilized the Privacy Finder P3P evaluator and the W3C P3P Validator.
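Because P3P policies are published as XML, the kind of machine analysis Cranor et al perform can be approximated in a few lines.  The sketch below extracts the declared purposes from a policy file; the file name is hypothetical, and the element names follow the published P3P 1.0 namespace.

```python
# Sketch of automated P3P policy analysis: parse a policy's XML and collect
# the purposes declared in each STATEMENT. "policy.xml" is a hypothetical
# local copy of a site's P3P policy.
import xml.etree.ElementTree as ET

P3P_NS = "{http://www.w3.org/2002/01/P3Pv1}"

def declared_purposes(path):
    purposes = set()
    for stmt in ET.parse(path).iter(P3P_NS + "STATEMENT"):
        purpose = stmt.find(P3P_NS + "PURPOSE")
        if purpose is None:
            continue  # a syntactic error of the kind Cranor et al catalogue
        for child in purpose:
            purposes.add(child.tag.removeprefix(P3P_NS))
    return purposes

print(declared_purposes("policy.xml"))  # e.g. {'current', 'admin', 'telemarketing'}
```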

The policy study also examined, as many researchers in this area do, the content of policies (including settings, marketing and sharing), industry trends (type of data collected, uses for data collected, data recipients) and popular sites.  

There is a growing body of work on policy errors, semantic and syntactic, which Cranor et al contribute to in this work.  The authors provide a thorough analysis of the three critical areas of P3P implementation and a high-level overview of each aspect of the P3P protocol, with an overall positive conclusion about the future of P3P in the context of a forthcoming legislative impetus.

Other researchers in the online privacy policy environment study the efficacy of P3P as a viable technology for privacy protection.  Reay et al uniquely apply signal theory, and assess performance using traditional methods of signal theory analysis (Reay, Dick, & Miller, 2009).  While the predictions presented are not particularly surprising (P3P adoption will remain stagnant, little or no corrective maintenance on invalid P3P documents will be undertaken, and little or no perfective maintenance will be undertaken on P3P policies because sellers are unmotivated), they have utilized a unique method for arriving at their conclusions that may provide other insights when applied to other privacy/CS questions.

Kelley et al propose a new format, called a Privacy Nutrition Label, for displaying the P3P policies of commercial websites to users (Kelley, Bresee, Cranor, & Reeder, 2009).

The paper describes two sets of tests: the first series was used to develop the design of the final label; the second was used to assess its use.  The authors conclude that the final Privacy Nutrition Label is a more accurate reflection of a given privacy policy, faster to use and more pleasurable for the user (Kelley et al., 2009).

The last two papers are among many that propose applied techniques in social networking to address online privacy.  Narayanan and Shmatikov present a methodology demonstrating that the anonymization techniques used by social network providers (Twitter, Flickr and LiveJournal) are easily undone, with an error rate of only 12% (Narayanan & Shmatikov, 2009).
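Their attack seeds a mapping between the anonymized graph and a public auxiliary graph, then propagates it outward by structural similarity.  The sketch below is a heavily simplified toy version of that seed-and-extend idea; the scoring rule and the evidence threshold are illustrative assumptions, not the authors’ actual algorithm.

```python
# Toy seed-and-extend graph de-anonymization: grow a known seed mapping by
# matching nodes whose already-mapped neighbours line up. The scoring rule and
# the evidence threshold (>= 2) are illustrative simplifications.

def extend_mapping(anon, aux, seeds):
    """anon, aux: adjacency dicts {node: set(neighbours)}; seeds: {anon: aux}."""
    mapping = dict(seeds)
    changed = True
    while changed:
        changed = False
        for a in anon:
            if a in mapping:
                continue
            taken = set(mapping.values())
            scores = {
                b: sum(1 for nb in anon[a] if mapping.get(nb) in aux[b])
                for b in aux if b not in taken
            }
            if scores:
                best = max(scores, key=scores.get)
                if scores[best] >= 2:       # require overlapping evidence
                    mapping[a] = best
                    changed = True
    return mapping

anon = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
aux  = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(extend_mapping(anon, aux, {1: "a", 2: "b"}))  # also maps 3 -> 'c'
```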

Xiao and Varenhorst explicitly examine Twitter and the inadvertent disclosure of personal information by end users stemming from general unawareness about the functionality of the service (Xiao & Varenhorst, 2009).  Narayanan and Shmatikov’s work contributes to the growing body of work on the importance of a robust de-identification protocol for personal information, while Xiao and Varenhorst propose enhanced privacy controls and a new alert function to be built into Twitter.

Location-Based Privacy

Implementation research on location-based privacy varies.

Applewhite provides a fascinating overview of the evolution of wireless technologies and standards as a precursor to a discussion of wireless networks and interoperability issues (Applewhite, 2002).  He demonstrates that the privacy landscape has changed now that technology is cheaper (GPS chips) and more reliable (wireless infrastructure), commenting that location technology can now be embedded into wristwatches and pagers, or even implanted under the skin.  Privacy issues are often disregarded by manufacturers because the technology and associated services are optional.  The author points out that as these services become endemic, commercialization, including commercialization of the personal information they generate, is a natural and predictable result, raising some interesting ethical considerations.

Work on wireless sensor networks by Li et al further highlights the special considerations for privacy in this area: uncontrollable environments, sensor-node resource constraints and topological constraints (Li, Zhang, Das, & Thuraisingham, 2009).  While privacy has been studied in the generic networking domain, these considerations make it difficult to extrapolate that work to wireless sensor networks.  Traditional data-oriented protections during data aggregation include cluster-based, slice-mix and generic privacy solutions, while context-oriented protections include location privacy for the data source (flooding methods [baseline, probabilistic], random walk and fake sources) and for the base station (local adversaries, global adversaries and temporal privacy).
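Among the data-oriented protections, the slice-mix idea is simple enough to sketch: each sensor splits its reading into random additive shares, keeps one and distributes the rest, so no single message reveals any node’s reading while the aggregate survives.  The toy all-to-all version below is an illustrative assumption, not the neighbour-limited scheme from the literature.

```python
# Toy slice-mix aggregation: each sensor splits its reading into additive
# shares modulo a public modulus, keeps one share and sends the others on, so
# no single message reveals a reading but the base station recovers the sum.
import random

MOD = 10**6

def make_slices(value, n_slices):
    shares = [random.randrange(MOD) for _ in range(n_slices - 1)]
    return shares + [(value - sum(shares)) % MOD]

def slice_mix_aggregate(readings):
    n = len(readings)
    inbox = [[] for _ in range(n)]
    for reading in readings:
        for j, s in enumerate(make_slices(reading, n)):
            inbox[j].append(s)                 # slice j is delivered to node j
    mixed = [sum(box) % MOD for box in inbox]  # each node mixes what it holds
    return sum(mixed) % MOD                    # base station sums the mixes

readings = [7, 3, 12, 5]
assert slice_mix_aggregate(readings) == sum(readings)
```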

Other researchers tackle specific protocols for location-based services to ensure end user privacy.  

Zhong et al present an overview of a variety of cell phone services that allow end users to ‘find’ each other (Zhong, Goldberg, & Hengartner, 2007).

The typical privacy control has been location cloaking, where the device or a third party cloaks the location before giving it to a service provider.  The solution proposed by the authors is instead based on homomorphic encryption, using the techniques of public-key cryptography; they provide an overview of the Paillier cryptosystem and the CGS97 scheme.  In the first of the three protocols presented, Louis, the authors describe two phases: two people can inform each other of their locations in the optional second phase only if the condition of the first phase (actual location proximity) has been met.  This requires the participation of a third party to undertake the location matching.
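The property these protocols rely on is Paillier’s additive homomorphism: multiplying two ciphertexts yields an encryption of the sum of their plaintexts, so an untrusted party can combine encrypted coordinates without seeing them.  The toy sketch below uses deliberately tiny demonstration primes, chosen for readability rather than security, and is not the Louis protocol itself.

```python
# Toy Paillier cryptosystem with tiny demonstration primes (readable, NOT
# secure). The point is the additive homomorphism: E(a) * E(b) mod n^2
# decrypts to a + b, letting a third party combine encrypted values.
import math, secrets

p, q = 293, 433                        # demo primes only
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                   # simple form valid when g = n + 1

def enc(m):
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:         # r must be invertible mod n
        r = secrets.randbelow(n - 1) + 1
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

a, b = 17, 25
assert dec(enc(a) * enc(b) % n2) == a + b
```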

In the second protocol, Lester, information disclosure is only one-way: the distance between two people can be learned, but only depending on context, and there is no safeguard if either person inputs incorrect information.  The third protocol, Pierre, builds on Lester but gives the second person more confidence in the privacy controls.  If proximity is achieved, information is given, but fewer details about the exact location are presented, based on the distance input by the end user when they sign up for the service.