Jeffrey Sorensen

Computer Scientist, Electrical Engineer

Currently

Engineering lead for Jigsaw's Perspective API, a transformer-based classification system for user-generated content used by dozens of social media companies in their moderation workflows.

Research interests

Machine Learning
Language Modeling
Natural Language Processing
Speech Recognition
Machine Translation
Cryptography

Education

1983 - 1993 Rensselaer Polytechnic Institute - Troy, New York

Ph. D. Electrical Engineering, May 1993
- Thesis “Hierarchical Pattern Classification for High Performance Text-independent Speaker Verification Systems”
M. Eng. Computer and Systems Engineering, May 1988
- Thesis “Extraction of a single voice from a channel containing multiple simultaneous speakers”
B. S. Computer and Systems Engineering with Minor in Management, May 1987

Selected Publications

A more complete list is available at Google Scholar.

Conference Proceedings

JUAGE at SemEval-2023 Task 10: Parameter Efficient Classification, Sorensen, Korre, Pavlopoulos, Tomanek, Thain, Dixon, Laugier, SemEval 2023.

Harmful Language Datasets: An Assessment of Robustness, Korre, Pavlopoulos, Sorensen, Laugier, Androutsopoulos, Dixon, Barrón-Cedeño, ACL WOAH 2023.

A new generation of Perspective API: Efficient multilingual character-level transformers, Lees, Tran, Tay, Sorensen, Gupta & Metzler, ACM SIGKDD 2022.

SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification, Fersini, Gasparini, Rizzi, Saibene, Chulvi, Rosso, Lees & Sorensen, SemEval 2022.

Lost in Distillation: A Case Study in Toxicity Modeling, Chvasta, Lees, Sorensen, Vasserman, Goyal, ACL WOAH 2022.

SemEval-2021 Task 5: Toxic Spans Detection, Pavlopoulos, Sorensen, Laugier & Androutsopoulos, SemEval 2021.

Toxicity Detection: Does Context Really Matter?, Pavlopoulos, Sorensen, Dixon, Thain and Androutsopoulos, ACL 2020.

Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification, Borkan, Dixon, Sorensen, Thain, and Vasserman, ACM WWW 2019.

Measuring and Mitigating Unintended Bias in Text Classification, Dixon, Li, Sorensen, Thain, and Vasserman, AAAI AIES 2018.

Accurate and compact large vocabulary speech recognition on mobile devices, Lei, Senior, Gruenstein & Sorensen, Interspeech 2013.

The OpenGrm open-source finite-state grammar software libraries, Roark, Sproat, Allauzen, Riley, Sorensen & Tai, ACL 2012.

Text Search Protocols with Simulation Based Security, Gennaro, Hazay & Sorensen, PKC 2010.

Syntax Based Reordering with Automatically Derived Rules for Improved Statistical Machine Translation, Visweswariah, Navratil, Sorensen, Chenthamarakshan, & Kambhatla, COLING 2010.

Maximum Entropy Based Restoration of Arabic Diacritics, Zitouni, Sorensen & Sarikaya, COLING-ACL 2006.

Merkle tree authentication of HTTP responses, Bayardo & Sorensen, WWW 2005.

The Impact of Morphological Stemming on Arabic Mention Detection and Coreference Resolution, Zitouni, Sorensen, Luo, and Florian, ACL Semitic workshop 2005.

Dependency Tree Kernels for Relation Extraction, Culotta & Sorensen, ACL 2004.

A hand-held speech-to-speech translation system, Zhou, Gao, Sorensen, Dechelotte & Picheny, IEEE ASRU 2003.

Spread spectrum signaling for speech watermarking, Cheng & Sorensen, IEEE ICASSP 2001.

A distance measure between collections of distributions and its application to speaker recognition, Beigi, Maes & Sorensen, IEEE ICASSP 1998.

Journals

Toxicity detection sensitive to conversational context, Xenos, Pavlopoulos, Androutsopoulos, Dixon, Sorensen, & Laugier, First Monday, 27(5), 2022.

Automata Evaluation and Text Search Protocols with Simulation-Based Security, Gennaro, Hazay & Sorensen, Journal of Cryptology, 29, pp. 243–282, 2016.

Patents

Securely classifying data, Bikel & Sorensen, 2008.

Conversational data mining, Kanevsky, Maes & Sorensen, 2003.

Multimodal speech-to-speech language translation and display, Gao, Gu, Liu & Sorensen, 2002.

Method and apparatus for presenting images representative of an utterance with corresponding decoded speech, Basson, Kanevsky & Sorensen, 2001.

Head-mounted display content transformer, Kanevsky & Sorensen, 2001.

Multi-channel telephone data collection, collaboration and conferencing system and method of using the same, Sorensen, Dharanipragada & Tydlitat, 2000.

Method and apparatus for processing information signals based on content, Maes, Padmanabhan & Sorensen, 2000.

Biometric authentication system with encrypted models, Gennaro, Halevi, Maes, Rabin & Sorensen, 1999.

Phrase splicing and variable substitution using a trainable speech synthesizer, Donovan, Franz, Roukos & Sorensen, 1998.

Apparatus and methods for identifying homophones among words in a speech recognition system, Ittycheria, Maes, Monkowski & Sorensen, 1998.

Professional Societies

IEEE - Member since 1988

ACM - Member since 1996

Association for Computational Linguistics

ACL Area Chair
SemEval Reviewer
Workshop on Online Abuse and Harms Reviewer

New York Academy of Sciences

Occupation

2010 - Present Google Inc., New York, NY

Senior Software Engineer 2010 - 2014
Staff Software Engineer 2014 - Present

1997 - 2010 IBM Thomas J. Watson Research Center, Yorktown Heights, NY

Senior Software Engineer 1999 - 2010
Advisory Software Engineer 1997 - 1999

1993 - 1996 Dictaphone Corporation, Stratford, CT

Member Technical Staff 1996
Engineering Specialist 1993 - 1996