publications
* denotes equal contribution and joint lead authorship.
CAFO: Feature-Centric Explanation on Time Series Classification
ACM KDD 2024
Acceptance Rate 20%, Best Poster Award, @UNIST AI Tech Workshop 2024
In multivariate time series (MTS) classification, finding the important features (e.g., sensors) for model performance is crucial yet challenging due to the complex, high-dimensional nature of MTS data, intricate temporal dynamics, and the necessity for domain-specific interpretations. Current explanation methods for MTS mostly focus on time-centric explanations, apt for pinpointing important time periods but less effective in identifying key features. This limitation underscores the pressing need for a feature-centric approach, a vital yet often overlooked perspective that complements time-centric analysis. To bridge this gap, our study introduces a novel feature-centric explanation and evaluation framework for MTS, named CAFO (Channel Attention and Feature Orthogonalization). CAFO employs a convolution-based approach with channel attention mechanisms, incorporating a depth-wise separable channel attention module (DepCA) and a QR decomposition-based loss for promoting feature-wise orthogonality. We demonstrate that this orthogonalization enhances the separability of attention distributions, thereby refining and stabilizing the ranking of feature importance. This improvement in feature-wise ranking enhances our understanding of feature explainability in MTS. Furthermore, we develop metrics to evaluate global and class-specific feature importance. Our framework's efficacy is validated through extensive empirical analyses on two major public benchmarks and real-world datasets, both synthetic and self-collected, specifically designed to highlight class-wise discriminative features. The results confirm CAFO's robustness and informative capacity in assessing feature importance in MTS classification tasks. This study not only advances the understanding of feature-centric explanations in MTS but also sets a foundation for future explorations in feature-centric explanations.
Heterogeneous Trading Behaviors of Individual Investors
Finance Research Letters 2023
Acceptance Rate 28%
Identifying household finance heterogeneity via deep clustering
Annals of Operations Research 2023
Acceptance Rate 33.3%
Households are becoming increasingly heterogeneous. While previous studies have revealed many important insights (e.g., wealth effect, income effect), they could only incorporate two or three variables at a time. However, in order to have a more detailed understanding of complex household heterogeneity, more variables should be considered simultaneously. In this study, we argue that advanced clustering techniques can be useful for investigating high-dimensional household heterogeneity. A deep learning-based clustering method is used to effectively handle the high-dimensional balance sheet data of approximately 50,000 households. The employment of appropriate dimension-reduction techniques is the key to incorporating the full joint distribution of high-dimensional data in the clustering step. Our study suggests that various variables should be used together to explain household heterogeneity. Asset variables are found to be crucial for understanding heterogeneity within wealthy households, while debt variables are more important for those households that are not wealthy. In addition, relationships with sociodemographic variables (e.g., age, education, and family size) were further analyzed. Although clusters are found only based on financial variables, they are shown to be closely related to most sociodemographic variables.
Household Financial Health: A Machine Learning Approach for Data-Driven Diagnosis and Prescription
Quantitative Finance 2023
Acceptance Rate 23.3%
Household finances are being threatened by unprecedented social and economic upheavals, including an aging society and slow economic growth. Numerous researchers and practitioners have provided guidelines for improving the financial status of households; however, the challenge of handling heterogeneous households remains nontrivial. In this study, we propose a new data-driven framework for the financial health of households to address the needs for diagnosing and improving financial health. This research extends the concept of healthcare to household finance. We develop a novel deep learning-based diagnostic model for estimating household financial health risk scores from real-world household balance sheet data. The proposed model can successfully manage the heterogeneity of households by extracting useful latent representations of household balance sheet data while incorporating the risk information of each variable. That is, we guide the model to generate higher latent values for households with risky balance sheets. We also show that the gradient of the model can be utilized for prescribing recommendations for improving household financial health. The robustness and validity of the new framework are demonstrated using empirical analyses.
SimStock: Representation Model for Stock Similarities
ICAIF 2023
Acceptance Rate 21% (Oral-Accept)
In this study, we introduce SimStock, a novel framework leveraging self-supervised learning and temporal domain generalization techniques to represent similarities of stock data. Our model is designed to address two critical challenges: 1) temporal distribution shift (caused by the non-stationarity of financial markets), and 2) ambiguity in conventional regional and sector classifications (due to rapid globalization and digitalization). SimStock exhibits outstanding performance in identifying similar stocks across four real-world benchmarks, encompassing thousands of stocks. The quantitative and qualitative evaluation of the proposed model compared to various baseline models indicates its potential for practical applications in stock market analysis and investment decision-making.
Stop-loss adjusted labels for machine learning-based trading of risky assets
Finance Research Letters 2023
Acceptance Rate 28%
Since the rise of ML/AI, many researchers and practitioners have been trying to predict future stock price movements. In actual implementations, however, a stop-loss rule, which sells an asset if its price falls below a predetermined level, is widely adopted to manage risk. Hence, some buy signals from prediction models could be wasted if the stop-loss is triggered. In this study, we propose a stop-loss adjusted labeling scheme to reduce the discrepancy between prediction and decision making. It can be easily incorporated into any ML/AI prediction model. Experimental results on U.S. futures and cryptocurrencies show that this simple tweak significantly reduces risk.
A Study on the Estimation of Apartment Price Index: Focused on the Machine Learning Algorithm
working paper
Geodesic Flow Kernels for Semi-Supervised Learning on Mixed-Variable Tabular Dataset
Top AI conferences (coming soon) 2025
Acceptance Rate 20%
Temporal Representation Learning for Stock Similarities and Its Applications in Investment Management
Finance Journal (coming soon) 2025
Acceptance Rate 20%, Best Paper Award @ the Korean Academic Society of Business Administration 2024