Jeffrey Sorensen
Computer Scientist, Electrical Engineer
Currently
Engineering lead for Jigsaw's Perspective API, a transformer-based classification system for user-generated content that dozens of social media companies use in their moderation workflows.
Research interests
- Machine Learning
- Language Modeling
- Natural Language Processing
- Speech Recognition
- Machine Translation
- Cryptography
Education
1983 - 1993
Rensselaer Polytechnic Institute - Troy, New York
- Ph. D. Electrical Engineering, May 1993
- M. Eng. Computer and Systems Engineering, May 1988
- B. S. Computer and Systems Engineering with Minor in Management, May 1987
Selected Publications
A more complete list is available at Google Scholar.
Conference Proceedings
JUAGE at SemEval-2023 Task 10: Parameter Efficient Classification, Sorensen, Korre, Pavlopoulos, Tomanek, Thain, Dixon, Laugier, SemEval 2023.
Harmful Language Datasets: An Assessment of Robustness, Korre, Pavlopoulos, Sorensen, Laugier, Androutsopoulos, Dixon, Barrón-Cedeño, ACL WOAH 2023.
A New Generation of Perspective API: Efficient Multilingual Character-level Transformers, Lees, Tran, Tay, Sorensen, Gupta & Metzler, ACM SIGKDD 2022.
SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification, Fersini, Gasparini, Rizzi, Saibene, Chulvi, Rosso, Lees & Sorensen, SemEval 2022.
Lost in Distillation: A Case Study in Toxicity Modeling, Chvasta, Lees, Sorensen, Vasserman, Goyal, ACL WOAH 2022.
SemEval-2021 Task 5: Toxic Spans Detection, Pavlopoulos, Sorensen, Laugier & Androutsopoulos, SemEval 2021.
Toxicity Detection: Does Context Really Matter?, Pavlopoulos, Sorensen, Dixon, Thain and Androutsopoulos, ACL 2020.
Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification, Borkan, Dixon, Sorensen, Thain, and Vasserman, ACM WWW 2019.
Measuring and Mitigating Unintended Bias in Text Classification, Dixon, Li, Sorensen, Thain, and Vasserman, AAAI AIES 2018.
Accurate and compact large vocabulary speech recognition on mobile devices, Lei, Senior, Gruenstein & Sorensen, Interspeech 2013.
The OpenGrm open-source finite-state grammar software libraries, Roark, Sproat, Allauzen, Riley, Sorensen & Tai, ACL 2012.
Text Search Protocols with Simulation Based Security, Gennaro, Hazay & Sorensen, PKC 2010.
Syntax Based Reordering with Automatically Derived Rules for Improved Statistical Machine Translation, Visweswariah, Navratil, Sorensen, Chenthamarakshan, & Kambhatla, COLING 2010.
Maximum Entropy Based Restoration of Arabic Diacritics, Zitouni, Sorensen & Sarikaya, COLING-ACL 2006.
Merkle tree authentication of HTTP responses, Bayardo & Sorensen, WWW 2005.
The Impact of Morphological Stemming on Arabic Mention Detection and Coreference Resolution, Zitouni, Sorensen, Luo, and Florian, ACL Semitic workshop 2005.
Dependency Tree Kernels for Relation Extraction, Culotta & Sorensen, ACL 2004.
A hand-held speech-to-speech translation system, Zhou, Gao, Sorensen, Dechelotte & Picheny, IEEE ASRU 2003.
Spread spectrum signaling for speech watermarking, Cheng & Sorensen, IEEE ICASSP 2001.
A distance measure between collections of distributions and its application to speaker recognition, Beigi, Maes & Sorensen, IEEE ICASSP 1998.
Journals
Toxicity detection sensitive to conversational context, Xenos, Pavlopoulos, Androutsopoulos, Dixon, Sorensen, & Laugier, First Monday, 27(5), 2022.
Automata Evaluation and Text Search Protocols with Simulation-Based Security, Gennaro, Hazay & Sorensen, Journal of Cryptology, 29, pp. 243–282, 2016.
Patents
Securely classifying data, Bikel & Sorensen, 2008.
Conversational data mining, Kanevsky, Maes & Sorensen, 2003.
Multimodal speech-to-speech language translation and display, Gao, Gu, Liu & Sorensen, 2002.
Method and apparatus for presenting images representative of an utterance with corresponding decoded speech, Basson, Kanevsky & Sorensen, 2001.
Head-mounted display content transformer, Kanevsky & Sorensen, 2001.
Multi-channel telephone data collection, collaboration and conferencing system and method of using the same, Sorensen, Dharanipragada & Tydlitat, 2000.
Method and apparatus for processing information signals based on content, Maes, Padmanabhan & Sorensen, 2000.
Biometric authentication system with encrypted models, Gennaro, Halevi, Maes, Rabin & Sorensen, 1999.
Phrase splicing and variable substitution using a trainable speech synthesizer, Donovan, Franz, Roukos & Sorensen, 1998.
Apparatus and methods for identifying homophones among words in a speech recognition system, Ittycheria, Maes, Monkowski & Sorensen, 1998.
Professional Societies
IEEE - Member since 1988
ACM - Member since 1996
Association for Computational Linguistics
- ACL Area Chair
- SemEval Reviewer
- Workshop on Online Abuse and Harms Reviewer
New York Academy of Sciences
Occupation
2010 - Present
Google, Inc, New York, NY
- Senior Software Engineer 2010 - 2014
- Staff Software Engineer 2014 - Present
1997 - 2010
IBM Thomas J. Watson Research Center, Yorktown Heights, NY
- Senior Software Engineer 1999 - 2010
- Advisory Software Engineer 1997 - 1999
1993 - 1996
Dictaphone Corporation, Stratford, CT
- Member Technical Staff 1996
- Engineering Specialist 1993 - 1996