Jeffrey Sorensen

Computer Scientist, Electrical Engineer

Currently

Engineering lead for Jigsaw's Perspective API, a transformer-based system for classifying user-generated content, currently used by dozens of social media companies in moderation workflows.

Research interests

Education

1983 - 1993 Rensselaer Polytechnic Institute - Troy, New York

Selected Publications

A more complete list is available at Google Scholar.

Conference Proceedings

JUAGE at SemEval-2023 Task 10: Parameter Efficient Classification, Sorensen, Korre, Pavlopoulos, Tomanek, Thain, Dixon, Laugier, SemEval 2023.

Harmful Language Datasets: An Assessment of Robustness, Korre, Pavlopoulos, Sorensen, Laugier, Androutsopoulos, Dixon, Barrón-Cedeño, ACL WOAH 2023.

A new generation of Perspective API: Efficient multilingual character-level transformers, Lees, Tran, Tay, Sorensen, Gupta & Metzler, ACM SIGKDD 2022.

SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification, Fersini, Gasparini, Rizzi, Saibene, Chulvi, Rosso, Lees & Sorensen, SemEval 2022.

Lost in Distillation: A Case Study in Toxicity Modeling, Chvasta, Lees, Sorensen, Vasserman, Goyal, ACM WOAH 2022.

SemEval-2021 Task 5: Toxic Spans Detection, Pavlopoulos, Sorensen, Laugier & Androutsopoulos, SemEval 2021.

Toxicity Detection: Does Context Really Matter?, Pavlopoulos, Sorensen, Dixon, Thain and Androutsopoulos, ACL 2020.

Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification, Borkan, Dixon, Sorensen, Thain, and Vasserman, ACM WWW 2019.

Measuring and Mitigating Unintended Bias in Text Classification, Dixon, Li, Sorensen, Thain, and Vasserman, AAAI AIES 2018.

Accurate and compact large vocabulary speech recognition on mobile devices, Lei, Senior, Gruenstein & Sorensen, Interspeech 2013.

The OpenGrm open-source finite-state grammar software libraries, Roark, Sproat, Allauzen, Riley, Sorensen & Tai, ACL 2012.

Text Search Protocols with Simulation Based Security, Gennaro, Hazay & Sorensen, PKC 2010.

Syntax Based Reordering with Automatically Derived Rules for Improved Statistical Machine Translation, Visweswariah, Navratil, Sorensen, Chenthamarakshan, & Kambhatla, COLING 2010.

Maximum Entropy Based Restoration of Arabic Diacritics, Zitouni, Sorensen & Sarikaya, COLING-ACL 2006.

Merkle tree authentication of HTTP responses, Bayardo & Sorensen, WWW 2005.

The Impact of Morphological Stemming on Arabic Mention Detection and Coreference Resolution, Zitouni, Sorensen, Luo, and Florian, ACL Semitic workshop 2005.

Dependency Tree Kernels for Relation Extraction, Culotta & Sorensen, ACL 2004.

A hand-held speech-to-speech translation system, Zhou, Gao, Sorensen, Dechelotte & Picheny, IEEE ASRU 2003.

Spread spectrum signaling for speech watermarking, Cheng & Sorensen, IEEE ICASSP 2001.

A distance measure between collections of distributions and its application to speaker recognition, Beigi, Maes & Sorensen, IEEE ICASSP 1998.

Journals

Toxicity detection sensitive to conversational context, Xenos, Pavlopoulos, Androutsopoulos, Dixon, Sorensen, & Laugier, First Monday, 27(5), 2022.

Automata Evaluation and Text Search Protocols with Simulation-Based Security, Gennaro, Hazay & Sorensen, Journal of Cryptology, 29, pp. 243–282, 2016.

Patents

Securely classifying data, Bikel & Sorensen, 2008.

Conversational data mining, Kanevsky, Maes & Sorensen, 2003.

Multimodal speech-to-speech language translation and display, Gao, Gu, Liu & Sorensen, 2002.

Method and apparatus for presenting images representative of an utterance with corresponding decoded speech, Basson, Kanevsky & Sorensen, 2001.

Head-mounted display content transformer, Kanevsky & Sorensen, 2001.

Multi-channel telephone data collection, collaboration and conferencing system and method of using the same, Sorensen, Dharanipragada & Tydlitat, 2000.

Method and apparatus for processing information signals based on content, Maes, Padmanabhan & Sorensen, 2000.

Biometric authentication system with encrypted models, Gennaro, Halevi, Maes, Rabin & Sorensen, 1999.

Phrase splicing and variable substitution using a trainable speech synthesizer, Donovan, Franz, Roukos & Sorensen, 1998.

Apparatus and methods for identifying homophones among words in a speech recognition system, Ittycheria, Maes, Monkowski & Sorensen, 1998.

Professional Societies

IEEE - Member since 1988

ACM - Member since 1996

Association for Computational Linguistics

New York Academy of Sciences

Occupation

2010 - Present Google, Inc, New York, NY

1997 - 2010 IBM Thomas J Watson Research Center, Yorktown Heights, NY

1993 - 1996 Dictaphone Corporation, Stratford, CT