Organ aging signatures in the plasma proteome track health and disease
Nature volume 624, pages 164–172 (2023)
Abstract
Animal studies show aging varies between individuals as well as between organs within an individual1,2,3,4, but whether this is true in humans and its effect on age-related diseases is unknown. We utilized levels of human blood plasma proteins originating from specific organs to measure organ-specific aging differences in living individuals. Using machine learning models, we analysed aging in 11 major organs and estimated organ age reproducibly in five independent cohorts encompassing 5,676 adults across the human lifespan. We discovered nearly 20% of the population show strongly accelerated age in one organ and 1.7% are multi-organ agers. Accelerated organ aging confers 20–50% higher mortality risk, and organ-specific diseases relate to faster aging of those organs. We find individuals with accelerated heart aging have a 250% increased heart failure risk and accelerated brain and vascular aging predict Alzheimer’s disease (AD) progression independently from and as strongly as plasma pTau-181 (ref. 5), the current best blood-based biomarker for AD. Our models link vascular calcification, extracellular matrix alterations and synaptic protein shedding to early cognitive decline. We introduce a simple and interpretable method to study organ aging using plasma proteomics data, predicting diseases and aging effects.
Main
Aging results in organism-wide deterioration of tissue structure and function that drastically increases the risk of most chronic diseases. Comprehensive studies of the molecular changes that occur with aging across multiple organs in mice have identified unique molecular aging trajectories and timings1,2,3,4, and susceptibility and resilience to diseases of aging in specific organs such as the brain, heart and kidney varies substantially across the population6. However, little is known about how human organs change molecularly with age. A molecular understanding of human organ aging is of critical importance to address the massive global disease burden of aging and could revolutionize patient care, preventative medicine and drug development7. In particular, preclinical studies have demonstrated that rejuvenating interventions affect organs differently3,8. To translate these studies into transformative medicines, we must be able to accurately measure aging across the body and understand the diversity of human aging not only across but also within individuals.
While many methods to measure molecular aging in humans have been developed9,10,11, most of them provide just a single measure of aging for the whole body. This is difficult to interpret given the complexity of human aging trajectories. Some recent methods have used clinical chemistry markers which include some markers of organ function12,13,14,15. However, many of these markers have low organ specificity, making them difficult to interpret for organ-specific aging. Methods to measure brain aging have used MRI-based brain volume and functional connectivity measurements, which are costly and do not provide molecular insights16, or have required tissue samples, which prevents their application in living persons17. Building off the wealth of literature and clinical practice that uses certain organ-specific plasma proteins to noninvasively assess aspects of organ health, such as alanine transaminase for liver damage, we hypothesized that comprehensive quantification of organ-specific proteins in plasma could enable minimally invasive assessment and tracking of human aging for any organ.
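Omics analyses Moon Chi-yeon is looking for
--> Which marker proteins indicate hepatocyte recovery? Proteomics
--> Which key transcription factors regulate those marker proteins? Transcriptomics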
Plasma proteins can model organ aging
To test this, we measured 4,979 proteins in a total of 5,676 subjects across five independent cohorts (Supplementary Table 1) and mapped the putative organ-specific plasma proteome, which we used to train models of organ aging (Fig. 1a). We mapped the organ-specific plasma proteome using human organ bulk RNA sequencing (RNA-seq) data from the Genotype-Tissue Expression (GTEx) project18. We classified genes as ‘organ enriched’ if they were expressed at least four times higher in one organ compared to any other organ, according to the definition proposed in the Human Protein Atlas19 (Extended Data Fig. 1, Supplementary Tables 2 and 3, and Methods). We annotated the 4,979 human proteins measured by the SomaScan assay with this information and found 893 (18%) proteins met this definition, with the highest number from the brain. We performed additional quality control to remove proteins with a high coefficient of variation or a low correlation between the two different versions of the SomaScan assay present across our cohorts, leaving us with 4,778 proteins (856 organ enriched, 17.9%) which were used for downstream analysis (Supplementary Fig. 1 and Supplementary Tables 4 and 5).
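The fold-enrichment rule described above can be sketched in a few lines of Python. This is an illustration only, not the authors' pipeline: the gene names and expression values are hypothetical, and the helper `organ_enriched` is an assumed name. It applies the single test stated in the text, that expression in one organ is at least four times higher than in any other organ.

```python
def organ_enriched(expression, fold=4.0):
    """Return the organ a gene is enriched in, or None.

    expression: dict mapping organ name -> expression level (for example TPM).
    A gene counts as enriched when its top organ is at least `fold` times
    higher than every other organ, i.e. higher than the runner-up organ.
    """
    ranked = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)
    (top_organ, top_value), (_, runner_up) = ranked[0], ranked[1]
    if top_value > 0 and top_value >= fold * runner_up:
        return top_organ
    return None


# Hypothetical expression values, for illustration only.
genes = {
    "GENE_A": {"kidney": 120.0, "liver": 10.0, "brain": 5.0},
    "GENE_B": {"kidney": 30.0, "liver": 20.0, "brain": 25.0},
}
labels = {gene: organ_enriched(expr) for gene, expr in genes.items()}

# Share of organ-enriched proteins reported in the text after quality control.
enriched_share = 856 / 4778
```

Here `GENE_A` is labelled kidney enriched (120 is more than four times 10), while `GENE_B` is left unassigned because no organ clears the four-fold margin.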
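Blood protein analysis → predict the 'biological age' of each organ → determine whether an organ is older than the person's chronological age (age gap) → predict disease risk (Alzheimer's disease, heart failure and others) in advance: the figure summarizes this newly developed method.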
Fig. 1: Plasma proteins can model organ aging.
a, Study design to estimate organ-specific biological age. A gene was called organ-specific if its expression was four-fold higher in one organ compared to any other organ in GTEx bulk organ RNA-seq. This annotation was then mapped to the plasma proteome. Mutually exclusive organ-specific protein sets were used to train bagged LASSO chronological age predictors with data from 1,398 healthy individuals in the Knight-ADRC cohort. An ‘organismal’ model, which used the nonorgan-specific (organ shared) proteins, and a ‘conventional’ model, which used all proteins regardless of specificity, were also trained. Models were tested in four independent cohorts: Covance (n = 1,029), LonGenity (n = 962), SAMS (n = 192) and Stanford-ADRC (n = 420); models were also tested in the AD patients in the Knight-ADRC cohort (n = 1,677). To test the validity of organ aging models, the age gap was associated with multiple measures of health and disease. An example age prediction (predicted versus chronological age) and an example age gap versus phenotype association (age gap versus phenotype, standard boxplot) are shown.
b, Individuals (ID) with the same conventional age gap can have different organ age gap profiles. Three example participants are shown. Bar represents mean age gap across n = 13 age gaps.
c, Pairwise correlation of organ age gaps from n = 3,774 healthy participants across all cohorts. Distribution of all pairwise correlations is shown in inset histogram, with dotted line median correlation. The control age gap was highly correlated with the organismal age gap (r = 0.98), the sole outlier in the inset distribution plot.
d, Identification of extreme agers, defined by a two standard deviation increase or decrease in at least one age gap. A representative kidney ager, heart ager and multi-organ ager are shown.
e, All extreme agers were identified (23% of all n = 5,676 individuals) and clustered after setting age gaps below an absolute z-score of 2 to 0. The mean age gaps for all organs in the kidney agers, heart agers and multi-organ agers clusters are shown.
We and others have previously shown that plasma proteins can be used to train machine learning models to estimate chronological age in independent cohorts20,21. For each individual, an aging model produces an ‘age gap’, a measure of that individual’s biological age relative to other same-aged peers based on their molecular profile9 (Fig. 1a). Several studies have shown associations between age gaps and mortality risk or other age-related phenotypes9, supporting the hypothesis that the age gap contains information about relative biological aging.
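As a rough illustration of the age gap concept, the sketch below treats the age gap as the residual of model-predicted age after regressing on chronological age, expressed as a z-score. The straight-line fit, the function name `age_gaps` and the toy ages are assumptions made for this example; the section itself does not specify the regression used.

```python
from statistics import mean, pstdev


def age_gaps(chronological, predicted):
    """Z-scored residuals of predicted age after a least-squares fit on chronological age."""
    mx, my = mean(chronological), mean(predicted)
    sxx = sum((x - mx) ** 2 for x in chronological)
    sxy = sum((x - mx) * (y - my) for x, y in zip(chronological, predicted))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Positive residual: the model sees this person as older than same-aged peers.
    residuals = [y - (intercept + slope * x) for x, y in zip(chronological, predicted)]
    sd = pstdev(residuals)
    return [r / sd for r in residuals]


# Toy data, for illustration only.
chronological = [40, 50, 60, 70, 80]
predicted = [45, 52, 58, 75, 78]
gaps = age_gaps(chronological, predicted)
```

In this toy example the 70-year-old has the largest positive age gap, meaning their molecular profile looks older than expected for their age, while the 60-year-old has the most negative one.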
Based on this concept, we trained a bagged ensemble of least absolute shrinkage and selection operator (LASSO) aging models for 11 major organs using the mutually exclusive organ-enriched proteins we identified as inputs (Fig. 1a, Extended Data Fig. 2a,b, Supplementary Fig. 3 and Supplementary Tables 6–8). We chose to restrict our analyses to adipose tissue, artery, brain, heart, immune tissue, intestine, kidney, liver, lung, muscle and pancreas because of their relatively well-understood contributions to diseases of aging and the availability of relevant age-related phenotype data in the tested cohorts. We also trained an ‘organismal’ aging model using the 3,907 organ-nonspecific plasma proteins as inputs to compare the contribution of specific organs to an organ-shared aging signature, and a ‘conventional’ proteomic aging model using all 4,778 proteins to compare the organ aging models to a global plasma proteomic aging signature as previously reported20,21. We trained our models in 1,398 healthy participants from the Knight Alzheimer’s Disease Research Center (Knight-ADRC) cohort (mean age = 75, age range = 27–104) and then tested these models in four fully independent cohorts and in held-out test participants with dementia in the Knight-ADRC (Fig. 1a, Extended Data Figs. 2 and 3, and Supplementary Fig. 2). All 11 organ aging models and the organismal model significantly estimated age in all five cohorts after multiple test correction (Supplementary Fig. 3b). Organ-specific proteins selected by our approach were highly enriched for organ-specific functions (Supplementary Information).
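To make the idea of a bagged LASSO ensemble concrete, here is a small dependency-free sketch. It is not the authors' implementation: the penalty `alpha`, the number of models, the bootstrap scheme and the hand-written coordinate-descent solver are all assumptions, and the data are synthetic (one informative "protein" and two noise features). In practice an established library solver would normally be used instead.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value towards zero by penalty; values inside the band become exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def fit_lasso(X, y, alpha, n_iter=100):
    """Coordinate descent for (1/2n)*||y - b0 - Xw||^2 + alpha*||w||_1."""
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    intercept = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = norm = 0.0
            for i in range(n):
                # Prediction from the intercept and every feature except j.
                partial = intercept + sum(weights[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                norm += X[i][j] ** 2
            weights[j] = soft_threshold(rho / n, alpha) / (norm / n) if norm > 0 else 0.0
        intercept = sum(y[i] - sum(w * x for w, x in zip(weights, X[i])) for i in range(n)) / n
    return intercept, weights


def fit_bagged_lasso(X, y, alpha, n_models=10, seed=0):
    """Fit one LASSO model per bootstrap resample of the training individuals."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_lasso([X[i] for i in idx], [y[i] for i in idx], alpha))
    return models


def predict(models, row):
    """Average the predictions of all models in the ensemble."""
    preds = [b0 + sum(w * x for w, x in zip(ws, row)) for b0, ws in models]
    return sum(preds) / len(preds)


# Synthetic data: age depends on feature 0 only; features 1 and 2 are noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(80)]
y = [60 + 10 * row[0] + rng.gauss(0, 1) for row in X]

models = fit_bagged_lasso(X, y, alpha=0.5)
# Fraction of ensemble members that keep each feature (non-zero weight).
selection_freq = [sum(1 for _, ws in models if ws[j] != 0.0) / len(models) for j in range(3)]
```

The selection frequency mirrors the quantity plotted on the x axis of the model coefficient panels (the percentage of model instances in the bagged ensemble that include a protein): features that survive the L1 penalty in every bootstrap are the most stable contributors.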
We observed across all cohorts that individuals with the same conventional age gap had diverse organ aging profiles (Fig. 1b). At the population level, this resulted in a low-to-moderate correlation between the age gaps of different organs (mean pairwise Pearson r = 0.29, Fig. 1c). While organ aging is correlated, the majority of variance in one organ age gap is not explained by others, with the exception of the organismal and conventional age gaps which were highly correlated. Further, we observed that some individuals had extreme aging in one or more organs relative to the general population (Fig. 1d). We scored individuals across all cohorts as outliers for a given organ age gap using a two standard deviation cutoff and clustered individuals into extreme aging types (e-ageotypes) (Fig. 1e and Extended Data Fig. 4a–c). Although it might be expected that extreme aging in one organ would co-occur with extreme aging in other organs, we instead observed segregation into distinct organ e-ageotypes. We found that approximately 18.4% of individuals had a highly organ-specific e-ageotype that was dominated by the aging of only one organ. Only approximately 1.7% of individuals showed extreme aging in multiple organs; the only multi-organ e-ageotype discovered through unbiased clustering was defined by extreme adipose, brain, conventional, heart, immune, liver and organismal age gaps. These observations suggest that organ age gaps may capture unique aging information, which may have implications for organ-specific biological aging and diseases of aging.
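The outlier scoring step can be sketched as follows. This is a simplified stand-in for the clustering described above: it applies the two standard deviation cutoff, sets sub-threshold age gaps to zero, and then labels each person by how many organs remain extreme. The participant IDs, z-scores and the function names are hypothetical.

```python
def extreme_profile(z_gaps, cutoff=2.0):
    """Keep only age gaps whose absolute z-score reaches the cutoff; set the rest to 0."""
    return {organ: (z if abs(z) >= cutoff else 0.0) for organ, z in z_gaps.items()}


def classify(z_gaps, cutoff=2.0):
    """Label a participant by the number of organs with an extreme age gap."""
    extreme = [organ for organ, z in extreme_profile(z_gaps, cutoff).items() if z != 0.0]
    if not extreme:
        return "non-extreme"
    if len(extreme) == 1:
        return f"{extreme[0]} ager"
    return "multi-organ ager"


# Hypothetical z-scored organ age gaps, for illustration only.
participants = {
    "P1": {"kidney": 2.4, "heart": 0.3, "brain": -0.5},
    "P2": {"kidney": 0.1, "heart": 2.1, "brain": 2.6},
    "P3": {"kidney": 0.9, "heart": -1.2, "brain": 0.4},
}
labels = {pid: classify(gaps) for pid, gaps in participants.items()}
```

The study derives its e-ageotypes by clustering these thresholded profiles rather than by simple counting, so the labels here only approximate that grouping.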
Organ age predicts health and disease
To assess the relationship between organ age and biological aging, we tested whether organ e-ageotypes were associated with nine age-related disease states for which we had sufficient data in at least two independent cohorts: AD, atrial fibrillation, cerebrovascular disease, diabetes, heart attack, hypercholesterolaemia, hypertension, obesity and gait impairment. Organ e-ageotypes were associated with specific disease states with known high impact on their respective organs (23 of 117, 20%, associations significant in a meta-analysis after multiple testing correction, Extended Data Fig. 4d and Supplementary Table 9). The kidney ageotype was the most significantly associated with metabolic diseases (diabetes, obesity, hypercholesterolaemia and hypertension), the heart ageotype was the most significantly associated with heart diseases (atrial fibrillation and heart attack), the muscle ageotype was the most significantly associated with gait impairment, the brain ageotype was the most significantly associated with cerebrovascular disease and the organismal ageotype was the most significantly associated with AD. At the whole population level, the relationships between organ age gaps and disease showed the same trends as ageotypes, but more diseases were significantly associated with age gaps due to higher statistical power (65 of 117, 56%, statistically significant after multiple test correction, Extended Data Fig. 4e and Supplementary Table 10).
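The count of 117 tests is consistent with 13 age gaps (11 organs plus the organismal and conventional models) tested against nine diseases. The section does not name the correction procedure, so the sketch below assumes the Benjamini–Hochberg false discovery rate method; the p values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values (q values) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


n_tests = 13 * 9  # age gaps x diseases

# Hypothetical p values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(p_values)
significant = [q < 0.05 for q in q_values]
```

Two of the five toy tests remain significant at q < 0.05, although four had raw p values below 0.05, which shows why the number of reported associations depends on the correction.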
At the population level, the two most significant associations between disease and age gap were between the kidney age gap and metabolic disease traits. Individuals with hypertension had kidneys that were approximately one year older than their same-aged peers, while individuals with diabetes had kidneys approximately 1.3 years older (Fig. 2a,b and Supplementary Tables 8 and 10). The third and fourth top associations were between the heart age gap and the heart aging traits atrial fibrillation (2.8 years older) and heart attack (2.6 years older) (Fig. 2c,d). Overall, we found that certain diseases, such as heart attack and AD, were associated with accelerated aging in virtually all organs, while others had impacts on a particular organ or subset of organs (Extended Data Fig. 4e and Supplementary Table 10).
Fig. 2: Organ age predicts health and disease.
a, A cross-cohort meta-analysis of the association (linear regression) between the kidney age gap and hypertension (with hypertension n = 1,566, without n = 1,561). False discovery rate (FDR) P value (meta) = 4.05 × 10^−40, effect size (meta) = 0.486 (Supplementary Table 10).
b, As in a, kidney age gap versus diabetes (with diabetes n = 335, without n = 2,839). FDR P value (meta) = 1.15 × 10^−24, effect size (meta) = 0.604.
c, As in a, heart age gap versus atrial fibrillation or pacemaker (with atrial fibrillation n = 239, without n = 2,936). FDR P value (meta) = 5.32 × 10^−21, effect size (meta) = 0.657.
d, As in a, but for heart age gap versus heart attack (with heart attack history n = 280, without n = 2,904). FDR P value (meta) = 1.77 × 10^−20, effect size (meta) = 0.615.
e, All kidney aging model coefficients. The x axis shows the percentage of model instances in the bagged ensemble that include the protein. Bubble size is scaled by the absolute value of the mean model weight across model instances (absolute value of y axis) (Supplementary Table 7).
f, Single-cell RNA expression of kidney51 aging model proteins. Mean normalized expression values shown.
g, As in e, but for the heart aging model.
h, Human heart single-cell RNA expression of heart52 aging model proteins. Mean normalized expression values shown.
i, Cox proportional hazard regression analysis of the relationship between organ age gap and future congestive heart failure risk over 15 years of follow-up in the LonGenity cohort for those without heart failure history at baseline (n = 26 events in 812 individuals). FDR P value (heart) = 7.07 × 10^−7, hazard ratio (heart) = 2.37 (Supplementary Table 11).
j, Cox proportional hazard regression analysis of the relationship between organ age gap and future mortality risk, over 15 years of follow-up in the LonGenity cohort (n = 173 events in 864 individuals). FDR P value (conventional) = 2.27 × 10^−10, hazard ratio (conventional) = 1.54 (Supplementary Table 12).
All error bars represent 95% confidence intervals.
Kidney aging proteins were highly expressed by kidney cell types (Fig. 2e,f) and had known roles in kidney biology and disease. Using feature importance plots, the model identified renin (REN), a kidney enzyme known to regulate blood pressure via the renin-angiotensin pathway22, as an important protein in kidney aging. It also identified the putative longevity factor klotho (KL)23, as well as multiple proteins with unknown functions including uromodulin (UMOD) and kidney associated antigen 1 (KAAG1), as important kidney aging proteins. UMOD has been genetically linked to chronic kidney disease, where it is observed to have age-dependent effects24, and rare mutations are the major cause of autosomal dominant tubulointerstitial kidney disease25.
Heart aging proteins were expressed primarily by cardiomyocytes (Fig. 2g,h) and had known roles in heart biology and disease. Pro-brain natriuretic peptide (NPPB), a negative regulator of blood pressure that increases in response to heart damage, and troponin T (TNNT2), a heart muscle protein involved in contraction, had the strongest weights in the heart aging model (Fig. 2g). They are both established clinical markers of acute heart failure26, and NPPB has been previously associated with heart attack risk27. This suggests the possibility of a link between subclinical heart disease and the ‘normal’ heart aging process, which should be investigated further with more detailed heart imaging and electrophysiology. Less well-characterized heart proteins include cardiac myosin light chain (MYL7), peroxidasin like (PXDNL) and bone morphogenetic protein 10 (BMP10). MYL7 is expressed by atrial cardiomyocytes and has recently become a promising target for hypertrophic cardiomyopathy28, suggesting that this could be a repurposing target for heart aging more generally.
Given the strong associations between heart aging traits and the heart age gap, we used longitudinal follow-up among healthy participants in the LonGenity cohort to test if organ age was significantly associated with future heart failure risk (Fig. 2i and Supplementary Table 11). We found that among people with no active disease or clinically abnormal biomarkers at baseline, every 4.1 years of additional heart age (one standard deviation) conferred an almost 2.5-fold increased risk of heart failure over a 15-year follow-up (23% increased risk per year of heart aging, Fig. 2i). Age gaps from multiple other tissues, but not the conventional aging model, also trended towards significance.
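The per-year figure follows from the per-standard-deviation hazard ratio if the log hazard is assumed to be linear in the age gap, as in a standard Cox model. The short calculation below uses the heart hazard ratio of 2.37 reported for Fig. 2i and the 4.1-year standard deviation quoted above; the variable names are illustrative.

```python
import math

hr_per_sd = 2.37  # heart age gap hazard ratio per standard deviation (Fig. 2i)
sd_years = 4.1    # one standard deviation of the heart age gap, in years

# Under a log-linear hazard, HR per year = HR per SD raised to the power 1 / SD.
hr_per_year = math.exp(math.log(hr_per_sd) / sd_years)
percent_increase_per_year = (hr_per_year - 1) * 100
```

This gives a hazard ratio of about 1.23 per year of additional heart age, matching the 23% increase per year stated in the text.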
We next tested the associations between organ age gaps and all-cause mortality. We found that the age gaps from 10 out of 11 organs, the organismal model and the conventional model were significantly associated with future risk of all-cause mortality after multiple test correction in the LonGenity cohort over 15 years of follow-up (Fig. 2j and Supplementary Table 12). A standard deviation increase (approximately four years of extra organ aging, Supplementary Table 8) in heart, adipose, liver, pancreas, brain, lung, immune or muscle age gap each conferred between 15–50% increased all-cause mortality risk. These hazard ratios are a similar size to methylation-based mortality predictors in independent aging cohorts over similar follow-up times, despite the fact that organ aging models are trained to predict chronological age instead of mortality directly (DNAm GrimAge hazard ratio = 1.3, 14 year mortality follow-up29). Further, we found that for some organs, there was a nonlinear relationship between the age gap and mortality risk (Supplementary Information, Supplementary Fig. 4 and Supplementary Table 13).
Finally, to better understand the relationship between organ age and additional markers of health and disease, we tested the associations between organ age gaps and 43 clinical biochemistry and cell count markers in the test cohort Covance (Extended Data Fig. 5 and Supplementary Fig. 5, see Supplementary Information for additional discussion). We also used these markers to calculate Phenotypic age14 (PhenoAge), a clinical biochemistry-based aging clock which predicts mortality and morbidity risk, for all participants in Covance (Extended Data Fig. 5a). We found that the PhenoAge age gap was significantly correlated with multiple organ age gaps, but only a small portion of the variance in any model was explained by another (Extended Data Fig. 5b).
We found 226 out of 559 (40%) associations between organ age gaps and clinical biochemistry markers were significant after multiple testing correction (Extended Data Fig. 5c and Supplementary Table 14). The strongest associations included associations between liver age gap and blood AST:ALT ratio, a clinical marker of liver health and function that is known to change with age (adjusted Pearson r = 0.25, q = 6.13 × 10^−17), and between kidney age gap and serum creatinine, the standard clinical marker of kidney function (adjusted Pearson r = 0.23, q = 1.65 × 10^−16). While these results are highly significant, they only partially explain the relationship between organ age gaps and disease phenotypes. Even after correcting for estimated glomerular filtration rate (eGFR), the kidney age gap is still significantly associated with hypertension and diabetes (Supplementary Fig. 6).
Collectively, organ age gap associations with disease and blood biochemistry demonstrate that aging models derived from organ-specific plasma proteins capture disease-relevant heterogeneity of aging within and across individuals, which is not captured by other aging clocks or clinical markers.
Brain aging in cognitive decline and AD
Although the largest risk factor for neurodegenerative diseases is age, little is known about the contribution of molecular brain aging to disease. The brain age gap correlated significantly with AD in held-out participants in the Knight-ADRC, but did not replicate in the Stanford Alzheimer’s Disease Research Center (Stanford-ADRC) (Supplementary Table 10). Therefore, to better understand how underlying proteins contributed to the brain aging model’s predictive abilities for brain aging phenotypes, we developed the feature importance for biological aging (FIBA) algorithm, which uses feature permutation to generate a per-protein importance score for both chronological and biological age, as defined by a particular age-related trait (Extended Data Fig. 6a and Methods). We applied FIBA to the brain age model using the trait global clinical dementia rating (CDRGLOB) in the Knight-ADRC cohort to understand how brain proteins contributed to the association between the age gap and cognitive decline. We observed that some proteins, such as complexins, increased both the model age prediction accuracy and the age gap association with dementia severity (FIBA+), while others decreased the age gap association with dementia severity (FIBA−) (Fig. 3a and Supplementary Table 15).
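The permutation idea behind FIBA can be illustrated with a toy example. The sketch below is an assumption-laden simplification, not the published algorithm: it uses a fixed two-protein linear age model, defines the age gap as predicted minus chronological age, and measures the trait association as a regression slope. All names (`fiba`, `rows`, `trait`) and data are hypothetical. For each protein it shuffles that protein's values across individuals and records how much age prediction accuracy and the age gap to trait association drop.

```python
import random
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def fiba(rows, ages, trait, weights, intercept, n_perm=20, seed=0):
    """Per-feature (age importance, trait importance) from feature permutation."""
    rng = random.Random(seed)

    def scores(data):
        pred = [intercept + sum(w * v for w, v in zip(weights, row)) for row in data]
        gap = [p - a for p, a in zip(pred, ages)]
        return pearson(pred, ages), slope(trait, gap)

    base_age, base_trait = scores(rows)
    result = []
    for j in range(len(weights)):
        age_drop = trait_drop = 0.0
        for _ in range(n_perm):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and each individual
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            perm_age, perm_trait = scores(shuffled)
            age_drop += (base_age - perm_age) / n_perm
            trait_drop += (base_trait - perm_trait) / n_perm
        result.append((age_drop, trait_drop))
    return result


# Synthetic cohort: protein 0 rises with the trait, protein 1 falls with it.
rng = random.Random(42)
n = 400
age_z = [rng.gauss(0, 1) for _ in range(n)]
trait = [rng.gauss(0, 1) for _ in range(n)]
ages = [70 + 10 * a for a in age_z]
rows = [[a + 1.0 * t + 0.3 * rng.gauss(0, 1), a - 0.4 * t + 0.3 * rng.gauss(0, 1)]
        for a, t in zip(age_z, trait)]

importance = fiba(rows, ages, trait, weights=[5.0, 5.0], intercept=70.0)
labels = ["FIBA+" if trait_imp > 0 else "FIBA-" for _, trait_imp in importance]
```

In this toy setup both proteins help predict age, but only the first strengthens the link between age gap and the trait; permuting the second actually strengthens that link, so it is labelled FIBA−.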
Fig. 3: Brain aging in cognitive decline and AD.
a, FIBA was used to test the contributions of brain aging proteins to associations between brain age gap and global clinical dementia rating (CDRGLOB) (y axis) or chronological age prediction accuracy (x axis). Permutation of some proteins reduced the brain age gap association with CDRGLOB (FIBA+), while permutation of others strengthened it (FIBA−). FIBA+ brain aging proteins were used to train a cognition-optimized brain aging model (CognitionBrain) from cognitively unimpaired individuals in Knight-ADRC (Supplementary Table 15). FI, feature importance.
b, CognitionBrain aging model. Age estimation in all cohorts (i) and bootstrap aging model coefficients (ii). Size of bubbles is scaled by the absolute value of the mean model weight (Supplementary Table 15).
c, A cross-cohort meta-analysis of the association (linear regression) between the CognitionBrain age gap and AD diagnosis (with AD n = 1,441, without n = 2,052). P value (meta) = 9.23 × 10^−36, effect size (meta) = 0.448 (Supplementary Table 15).
d, A multivariate Cox proportional hazard model of future dementia progression risk over five years in Stanford-ADRC (n = 48 events in 325 individuals). P value (CognitionBrain) = 8.95 × 10^−3, hazard ratio (CognitionBrain) = 1.57.
e, Kaplan–Meier curve for the CPH model in d. Risk of dementia progression for different levels of CognitionBrain AgeGap and PlasmaPTau181 while all other covariates are held constant. Displayed hazard ratio is a first-order estimate of the combined hazard ratio.
f, Human brain single-cell RNA expression53 of CognitionBrain aging proteins. Mean normalized expression values shown. Top model proteins and proteins in the GO:CC synapse pathway are highlighted.
g, Changes with age and AD of top CognitionBrain proteins across tissues (plasma and brain) and molecular layers (protein, bulk RNA and single-cell RNA). Changes in plasma were assessed using linear models from the Stanford-ADRC and Knight-ADRC cohorts (n = 3,226 individuals). Statistics for brain tissue were pulled from refs. 39,53. Proteins with significant changes across tissues shown.
Asterisks represent FDR-adjusted P value thresholds: *q < 0.05; **q < 0.01; ***q < 0.001. All error bars represent 95% confidence intervals. NS, not significant.
We used this information to train a second-generation brain aging model, which we term the CognitionBrain aging model, by only using CDRGLOB FIBA+ brain-specific proteins (Fig. 3b and Supplementary Tables 16–19). This method is similar to second-generation methylation aging clocks which are trained jointly on chronological age and aging phenotypes14. We found that the CognitionBrain age gap had a stronger association with AD than the first-generation brain age gap and the conventional age gap in the Knight-ADRC cohort (Extended Data Fig. 6b). This result replicated in the independent test cohort Stanford-ADRC. In a meta-analysis, individuals with AD had approximately two years of additional CognitionBrain aging (P valuemeta = 9.23 × 10−36) compared to individuals without AD (Fig. 3c and Supplementary Table 20). The CognitionBrain age gap was also significantly associated with risk of future dementia progression in both ADRC cohorts. A standard deviation increase in the CognitionBrain age gap conferred a 34% increased risk (P valuemeta = 1.03 × 10−15) of a clinically relevant two-point increase in the Clinical Dementia Rating Sum-of-Boxes score (CDR-SB) within five years (Supplementary Table 21). We also tested associations between CognitionBrain age gap and changes in brain volume using matched volumetric MRI in the Stanford-ADRC and Stanford Aging and Memory Study (SAMS) cohorts (Extended Data Fig. 6c, Supplementary Table 22, Supplementary Fig. 7 and Supplementary Information), and found CognitionBrain age gap significantly predicted brain volume in multiple AD-sensitive regions.
Given its associations with AD status, cognitive decline risk and brain volume, we asked whether the CognitionBrain aging model could be used in combination with other biomarkers of AD and predictors of cognitive decline, including plasma pTau-181 (ref. 5) and an AD polygenic risk score30, to better stratify AD patients for future clinical outcomes. We tested a multivariate dementia progression Cox proportional hazard model with baseline CDRGLOB, age, CognitionBrain age gap, plasma pTau-181 and an AD polygenic risk score (Fig. 3d) in the Stanford-ADRC. We found that the CognitionBrain age gap had the highest adjusted hazard ratio (hazard ratio = 1.57; P = 8.95 × 10−3) of the AD biomarkers, and that both plasma pTau-181 and CognitionBrain age gap were additive for risk prediction (estimated combined hazard ratio = 2.08, Fig. 3e). Individuals with fluid biomarker levels two standard deviations above average had a 75% probability of dementia progression, while individuals with levels two standard deviations below average had under a 10% probability of dementia progression within five years. Pairwise correlation between all biomarkers also showed that the CognitionBrain age gap was largely independent from other biomarkers (Extended Data Fig. 6d). Taken together, these data suggest that the CognitionBrain age gap provides molecular information about brain aging not captured by other approaches.
Given the significant associations between the CognitionBrain age model and several brain aging metrics, we sought to uncover new insights into brain aging mechanisms by examining the proteins that make up the model. A total of 47 of the 49 model proteins were detectable in human brain single-cell RNA sequencing (scRNA-seq) data and most could be mapped to neurons and glia with high specificity (Fig. 3f). Proteins with the largest positive weights in the model (Fig. 3c) included the synaptic proteins complexin 1 (CPLX1), complexin 2 (CPLX2) and neurexin 3 (NRXN3), which all have genetic links to cognition and AD31,32,33, as well as stathmin 2 (STMN2) and olfactomedin 1 (OLFM1), which are involved in neurite outgrowth and axon growth cone collapse34,35. Proteins with large negative weights in the model included Aldolase Fructose-Bisphosphate C (ALDOC), neuronal pentraxin receptor (NPTXR), carnosine dipeptidase 1 (CNDP1) and Lanc Like Glutathione S-Transferase 1 (LANCL1). ALDOC, NPTXR and CNDP1 are expressed in astrocytes, neurons and oligodendrocytes, respectively (Fig. 3f) and have been proposed as CSF biomarkers for AD36,37. LANCL1, which is primarily expressed in oligodendrocytes (Fig. 3f), has been shown to be crucial for neuronal health in mouse models38. The model also implicated alterations in the glycosylated extracellular matrix through the proteins tenascin R (TNR), neurocan (NCAN) and heparan sulfate-glucosamine 3-sulfotransferase 4 (HS3ST4), underlining the role of the extracellular matrix in brain aging.
We assessed the highest weighted CognitionBrain proteins for their changes with age and AD in the Knight-ADRC and Stanford-ADRC cohorts, as well as their changes with AD in brain tissue at the protein39, bulk RNA39 and single-cell RNA levels from publicly available datasets (Fig. 3g). We observed a consistent pattern of decreases in AD brain tissue and increases in the blood with age and AD. This suggests that the increase of synapse and neurite growth related protein levels in the blood could reflect a loss or alteration in protein processing and subsequent shedding of these crucial factors in the brain. A similar inverse relationship between fluid and brain protein levels is seen with amyloid beta, whereby lower CSF AB42 is correlated with increased AB plaques in the brain40.
Organ aging in cognitive decline and AD
We next sought to apply the FIBA optimization framework to other organ aging models to understand how the aging of other organs contributes to brain aging phenotypes (Fig. 4a). As with the brain aging model, we applied CDRGLOB FIBA to all aging models using the Knight-ADRC (Extended Data Figs. 7 and 8). The CognitionArtery, CognitionBrain, CognitionOrganismal and CognitionPancreas age gap associations with AD replicated in both ADRCs (Fig. 4b and Extended Data Fig. 8c,d), so we focused on these four aging models to understand peripheral versus central contributions to cognitive decline.
Fig. 4: Organ aging in cognitive decline and AD.
a, CDRGLOB FIBA was applied to all organ aging models using the Knight-ADRC (K-ADRC) to understand body-wide contributions to brain aging phenotypes (Supplementary Table 15). b, Associations (linear regression) between AD and the CognitionArtery (P valuemeta = 6.02 × 10−16), CognitionBrain (P valuemeta = 9.23 × 10−36), CognitionOrganismal (P valuemeta = 2.03 × 10−28) and CognitionPancreas (P valuemeta = 1.11 × 10−21) age gaps replicated in the Stanford-ADRC (S-ADRC) (Supplementary Table 20). c, Associations (linear regression) between organ age gaps and a composite score of overall cognition in the LonGenity cohort (n = 888). P valueCognitionOrganismal = 9.58 × 10−8, P valueCognitionBrain = 4.24 × 10−7, P valueCognitionArtery = 2.46 × 10−3 and P valueCognitionPancreas = 4.8 × 10−3 (Supplementary Table 23). d, Cox proportional hazard regression analysis of organ age gap and risk of conversion from cognitively normal to cognitive impairment (CDR-Global 0 → ≥ 0.5) over 15 years of follow-up in the Knight-ADRC (n = 226 events in 940 individuals). P valueCognitionOrganismal = 0.02, P valueCognitionArtery = 0.04, P valueCognitionBrain = 0.14 and P valueCognitionPancreas = 0.26 (Supplementary Table 24). e, Aging trajectories of top ten weighted model proteins in healthy individuals (n = 3,774) across the four study cohorts. Top CognitionOrganismal proteins change with age earliest and at the highest rate. f, Changes with age of top cognition-optimized aging model proteins in healthy individuals (n = 3,774) across the four study cohorts. Age effect and negative log10 FDR-corrected P values from a linear model are shown. Size of bubbles is scaled by the absolute value of the average model weight (Supplementary Table 25). g, Left, human brain vasculature single-cell RNA expression42 of top five CognitionOrganismal aging proteins. Mean normalized expression values and fraction of cells expressing the genes are shown.
Right, pericytes, smooth muscle cells (SMC) and fibroblasts are lost in AD. Asterisks represent P value thresholds from a two-tailed t-test: *P < 0.05; **P < 0.01. h, Model of age-related cellular degradation of the human brain vasculature reflected in the plasma proteome. i, StringDB protein–protein interaction network of CognitionArtery and interacting proteins (score ≥ 0.4), and related pathway enrichments (percent overlap between proteins and pathway gene sets). j, Model of age-related vascular calcification and extracellular matrix alterations reflected in the plasma proteome. All error bars represent 95% confidence intervals.
To understand the full temporal sequence of cognitive decline, we tested whether age gaps were associated with cognition in cognitively normal individuals using a composite score of overall cognition in the LonGenity cohort. Decreased cognitive function was significantly associated with all four age gaps (Fig. 4c, Extended Data Fig. 9a and Supplementary Table 23). We replicated these associations in the healthy SAMS cohort, where we observed that individuals with worse memory recall had higher CognitionOrganismal and CognitionBrain age gaps (Extended Data Fig. 9b and Supplementary Table 23).
We next tested associations between age gaps and risk of transition from cognitively normal to mild cognitive impairment (MCI) (CDR-Global Score 0 to greater than or equal to 0.5) using 15 years of clinical cognitive assessment in the Knight-ADRC (Fig. 4d and Supplementary Table 24). We found that the CognitionOrganismal (hazard ratio = 1.17, P = 0.02) and CognitionArtery (hazard ratio = 1.15, P = 0.04) age gaps significantly predicted conversion to MCI, while the CognitionBrain (hazard ratio = 1.11, P = 0.14) trended towards significance (Fig. 4d). The prediction of future conversion to MCI over 15 years is unlikely to be explained by undiagnosed cognitive impairment, placing changes detected by these aging models early in the causal chain of cognitive decline and neurodegenerative disease.
To understand the biological processes and proteins involved in early cognitive decline, we plotted the aging trajectory of all model proteins and found that highly weighted CognitionOrganismal and CognitionArtery proteins changed with age earlier and at a faster rate than CognitionBrain and CognitionPancreas proteins (Fig. 4e). The earliest changes occurred in a highly correlated cluster of CognitionOrganismal proteins: pleiotrophin (PTN), transgelin (TAGLN), WNT1 Inducible Signalling Pathway Protein 2 (WISP2), CUB Domain Containing Protein 1 (CDCP1) and chordin like 1 (CHRDL1; Fig. 4f). Though not organ-specific, these genes were all highly expressed in the arteries and brain (Extended Data Fig. 10a). Single-cell expression of these genes in human vasculature41,42 indicated that they are expressed primarily by smooth muscle cells, pericytes and fibroblasts (Fig. 4g and Extended Data Fig. 10b). Loss of brain pericytes, smooth muscle cells and perivascular fibroblasts is associated with age and AD42,43 (Fig. 4g), and pericyte-specific deletion of PTN renders neurons prone to ischaemic and excitotoxic injury44. This early changing signature in the CognitionOrganismal model may thus represent degenerative changes to the cellular integrity of the brain vasculature and the loss of its neuroprotective functions with aging (Fig. 4h).
The five proteins composing the CognitionArtery model, TNF receptor superfamily member 11b (TNFRSF11B), sclerostin (SOST), melanocortin 2 receptor accessory protein (MRAP2), frizzled related protein (FRZB) and matrix gla protein (MGP), were also primarily expressed in vascular smooth muscle cells, pericytes and fibroblasts41 (Extended Data Fig. 10c) and are all strongly implicated in vascular calcification. TNFRSF11B/APOE double knockout mice show increased calcium deposition by vascular smooth muscle cells45; MGP deficiency-causing mutations in humans lead to Keutel syndrome, a disease characterized by soft tissue calcification46; and SOST and FRZB are negative regulators of WNT signalling that drive calcification and are increased in the plasma of people with vascular calcification47,48. We found that CognitionArtery proteins and the vascular signature in the CognitionOrganismal proteins form an interaction network using StringDB (Fig. 4i). Additional model proteins in this interaction network included integrin binding sialoprotein (IBSP), osteoglycin (OGN), collagen type III alpha 1 chain (COL3A1), proline rich and gla domain 1 (PRRG1) and growth arrest specific 6 (GAS6). In total, this protein network is involved in extracellular matrix, cartilage development and osteoblast signalling pathways, and implicates vascular calcification and extracellular matrix alterations as a major component of aging that underlies the early phases of cognitive decline and neurodegenerative disease (Fig. 4i,j).
Discussion
Our study introduces a framework for modelling organ health and biological aging using plasma proteomics. The resulting organ aging models can predict mortality, organ-specific functional decline, disease risk and progression and aging heterogeneity between tissues. This approach is minimally invasive, requiring only a small blood sample, and could be easily applied to understand the effects of health interventions, such as lifestyle modifications and drug therapies, at the organ level. We provide a large and comprehensive resource of organ aging information in nearly 6,000 individuals spanning the adult lifespan and multiple age-related disease states, and we have developed an easy-to-use python package called organage to calculate the organ ages of any plasma proteomics sample from the SomaScan assay.
There are many future directions for this work. While we have shown that plasma proteomic organ aging models are distinct from previous proteomics models, clinical chemistry-based models and imaging-based models, future studies should assess how proteomic organ aging relates to other molecular measures of aging and disease such as methylation aging clocks and disease-specific prediction models. Although we were unable to perform direct comparisons, our models predict mortality with comparable effect sizes to models trained specifically to predict mortality and heart disease in independent cohorts49,50. We demonstrated that our approach added increased value to established biomarkers of AD, and we expect that multimodal aging and disease prediction models may have similar impacts in other diseases.
We present one of the largest studies of plasma proteome aging to date, but as larger plasma proteomics resources emerge, the power of this approach will further increase. Our current models rely on approximately 5,000 proteins measured with the SomaScan assay, but the approach is platform agnostic, and we expect that even more biological information could be gained with additional proteomic coverage, including cell and organ-specific splice isoforms and posttranslational modifications. The rapidly growing number of human gene expression maps at single-cell resolution41 will help further refine organ and cell-type specific aging models and allow for a comprehensive understanding of organismal physiology based on the plasma proteome.
Another question for future studies is which organ-specific aging proteins are causal drivers of aging, given that multiple plasma proteins have been shown to directly modulate aging phenotypes8. Of note, many of the proteins with large weights in the models, such as KLOTHO, UMOD, MYL7, CPLX1, CPLX2 and NRXN3, have genetic associations with diseases of their respective organs or are validated therapeutic targets, suggesting a potential causal role of these proteins in organ aging. Future genomic studies should further investigate the genetic architecture of organ aging clocks and their relationships to disease using GWAS and post-GWAS methods such as colocalization and Mendelian randomization.
This study has multiple limitations. First, we have limited the study to a subset of organs to avoid over-interpretation of models for which we lacked convincing organ-relevant aging phenotypes. It remains unclear if this approach will generalize to all organs in the body, such as reproductive organs, and future studies should address this question. Second, we observe many instances of nonlinear dynamics in the plasma proteome and in aging phenotypes. While our current models serve as a proof of principle for this approach, since they are trained and evaluated largely on older adults, caution should be used when applying them to young people. More sophisticated nonlinear machine learning methods such as neural networks or random forests may further improve the accuracy and generalizability of this approach in the future. Lastly, the models were trained and tested on American and Caucasian-skewed cohorts, and future studies should assess the generalizability of the findings in more ethnically and geographically diverse populations.
Altogether, we show that large-scale plasma proteomics and machine learning can be leveraged to noninvasively measure organ health and aging in living people. We show that biologically motivated modelling, in which we use sets of organ-specific proteins and the FIBA algorithm to further subset to physiological age-related proteins, enables deconvolution of the different rates of aging within an individual and measurement of aging at organ-level resolution.
Methods
Human cohorts
Covance
Details of the Covance study have been previously published54. Briefly, Covance is a multi-site cross-sectional study of health across the lifespan collected at five hospital sites in the United States in 2008. A total of 1,028 subjects were included in analyses for this study. Cohort demographic characteristics are summarized in Supplementary Table 1. Exclusion criteria for the study included uncontrolled hypertension, self-reported treatment for a malignancy other than squamous cell or basal cell carcinoma of the skin in the last two years, self-reported pregnancy, self-reported chronic infection, autoimmune condition or other inflammatory condition, self-reported chronic kidney or liver disease, chronic heart failure or diagnosed with myocardial infarction in the last three months, self-reported diabetes (HbA1c > 8% if known), self-reported acute bacterial or viral infection in the past 24 h or a temperature greater than 38 °C within 24 h of enrolment, self-reported participation in any therapeutic study within 14 days before blood sampling and taking more than 20 mg of prednisone or related drugs.
Clinical blood chemistry was performed on the same samples, including a complete blood count and comprehensive metabolic panel, lipid panel and liver function tests. Basic physical workup (blood pressure, pulse and respirations) was also collected. Lifestyle information was also collected from all participants using a survey which asked about smoking, alcohol, exercise, habits and frequency of consumption of different meats and vegetables.
LonGenity
Details of the LonGenity cohort have been previously published55,56. Briefly, LonGenity is an ongoing longitudinal study initiated in 2008 and designed to identify biological factors that contribute to healthy aging. The LonGenity study enrols older adults of Ashkenazi Jewish descent with an age range of 65–94 years at baseline. Approximately half of the cohort consists of offspring of parents with exceptional longevity, defined as having at least one parent who survived to 95 years of age. The other half of the cohort includes offspring of parents with usual survival, defined as not having a parental history of exceptional longevity. A total of 962 subjects were included in analyses for this study. The cohort characteristics are summarized in Supplementary Table 1. LonGenity participants are thoroughly characterized demographically and phenotypically at annual visits that include collection of medical history and physical and detailed neurocognitive assessments (described in detail below). The LonGenity study was approved by the institutional review board (IRB) at the Albert Einstein College of Medicine.
Subjects in the LonGenity cohort underwent extensive cognitive examination. The Overall Cognition Composite score was determined by the relative performance of the subject in the Free and Cued Selective Reminding Test, WMS-R Logical Memory I, RBANS Figure Copy, RBANS Figure Recall, WAIS-III Digit Span, WAIS-III Digit Symbol Coding, Phonemic Fluency (FAS), Categorical Fluency, Trail Making Test A and Trail Making Test B. For each task a standardized score (z) was calculated based on the population. The z-score for each task was then combined to create the overall cognition composite.
Stanford Alzheimer’s Disease Research Center
Samples were acquired through the National Institute on Aging (NIA)-funded Stanford Alzheimer’s Disease Research Center (Stanford-ADRC). The Stanford-ADRC cohort is a longitudinal observational study of clinical dementia subjects and age-sex-matched nondemented subjects. The collection of plasma was approved by the Institutional Review Board of Stanford University and written consent was obtained from all subjects. Blood collection and processing were done according to a rigorous standardized protocol to minimize variation associated with blood draw and blood processing. Briefly, about 10 cc of whole blood was collected in a vacutainer ethylenediaminetetraacetic acid (EDTA) tube (Becton Dickinson vacutainer EDTA tube) and spun at 3,000 RPM for 10 mins to separate out plasma, leaving 1 cm of plasma above the buffy coat and taking care not to disturb the buffy coat to circumvent cell contamination. Plasma processing times averaged approximately one hour from the time of the blood draw to the time of freezing and storage. All blood draws were done in the morning to minimize the impact of circadian rhythm on protein concentrations. Plasma pTau-181 levels were measured using the fully automated Lumipulse G 1200 platform (Fujirebio US, Inc, Malvern, PA) by experimenters blind to diagnostic information, as previously described57.
All healthy control participants were deemed cognitively unimpaired during a clinical consensus conference that included board-certified neurologists and neuropsychologists. Cognitively impaired subjects underwent Clinical Dementia Rating and standardized neurological and neuropsychological assessments to determine cognitive and diagnostic status, including procedures of the National Alzheimer’s Coordinating Center (https://naccdata.org/). Cognitive status was determined in a clinical consensus conference that included neurologists and neuropsychologists. All participants were free from acute infectious diseases and in good physical condition. A total of 409 subjects were included in analyses for this study. Cohort demographics and clinical diagnostic categories are summarized in Supplementary Table 1.
Stanford Aging and Memory Study
SAMS is an ongoing longitudinal study of healthy aging. Blood collection and processing were done by the same team and using the same protocol as in Stanford-ADRC. Neurological and neuropsychological assessments were performed by the same team and using the same protocol as in Stanford-ADRC. All SAMS participants had CDR = 0 and a neuropsychological test score within the normal range; all SAMS participants were deemed cognitively unimpaired during a clinical consensus conference that included neurologists and neuropsychologists. A total of 192 cognitively unimpaired SAMS participants were included in the present study. The collection of plasma was approved by the Institutional Review Board of Stanford University and written consent was obtained from all subjects. Cohort demographics and clinical diagnostic categories are summarized in Supplementary Table 1.
Knight Alzheimer’s Disease Research Center
The Knight-ADRC cohort is an NIA-funded longitudinal observational study of clinical dementia subjects and age-matched controls. Research participants at the Knight-ADRC undergo longitudinal cognitive, neuropsychologic, imaging and biomarker assessments including Clinical Dementia Rating (CDR). Among individuals with CSF and plasma data, AD cases corresponded to those with a diagnosis of dementia of the Alzheimer’s type (DAT) using criteria equivalent to the National Institute of Neurological and Communication Disorders and Stroke-Alzheimer’s Disease and Related Disorders Association for probable AD58, and AD severity was determined using the CDR59 at the time of lumbar puncture (for CSF samples) or blood draw (for plasma samples). Controls received the same assessment as the cases but were nondemented (CDR = 0). Blood samples were collected in EDTA tubes (Becton Dickinson vacutainer purple top) at the visit time, immediately centrifuged at 1,500g for 10 min, aliquoted into two-dimensional barcoded Micronic tubes (200 µl per aliquot) and stored at −80 °C. The plasma was stored in a monitored −80 °C freezer until it was pulled and sent to Somalogic for data generation. The Institutional Review Board of Washington University School of Medicine in St. Louis approved the study and research was performed in accordance with the approved protocols. A total of 3,075 participants were included in the present study. Cohort demographics and clinical diagnostic categories are summarized in Supplementary Table 1.
Proteomics data acquisition and quality control
SomaScan assay
We used the SomaLogic SomaScan assay, which uses slow off-rate modified DNA aptamers (SOMAmers) to bind target proteins with high specificity, to quantify the relative concentration of thousands of human proteins in plasma. The assay has been used in hundreds of studies and described in detail previously54,60. Two versions of the SomaScan assay were used in this study. The v.4 assay (4,979 protein targets) was applied to the Covance and LonGenity cohorts, and the v.4.1 assay (7,288 protein targets) was applied to the SAMS, Stanford-ADRC and Knight-ADRC cohorts. All v.4 targets are included in the v.4.1 assay based on SeqId, and only the v.4 targets were analysed for this study.
Somalogic normalization and quality control
Standard Somalogic normalization, calibration and quality control were performed on all samples54,61,62,63. Briefly, pooled reference standards and buffer standards are included on each plate to control for batch effects during assay quantification. Samples are normalized within and across plates using median signal intensities in reference standards to control for both within-plate and across-plate technical variation. Samples are further normalized to a pooled reference using an adaptive maximum likelihood procedure. Samples are additionally flagged by SomaLogic if signal intensities deviated significantly from the expected range and these samples were excluded from analysis. The resulting expression values are the provided data from Somalogic and are considered ‘raw’ data.
The v.4 → v.4.1 multiplication scaling factors provided by Somalogic were applied to the raw v.4 assay expression values to allow for direct comparisons across two v.4 and three v.4.1 cohorts. We discarded proteins for which the correlation was low between assay versions v.4 and v.4.1 and low estimated replicate coefficient of variation64 (Supplementary Fig. 1). This resulted in 4,778 proteins for downstream analysis. The raw data were log10 transformed before analysis, as the assay has an expected log-normal distribution.
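As a minimal illustration of the harmonization step described above, the Python sketch below multiplies raw v.4 values by per-protein scaling factors and then log10-transforms them. The SeqIds, intensities and scaling factors are invented placeholders rather than values supplied by Somalogic, and the function name is hypothetical.

```python
import math

def harmonize_v4(raw_v4, scale_factors):
    """Scale raw v.4 values onto the v.4.1 scale, then log10-transform.

    raw_v4: {seq_id: raw expression value} for one sample.
    scale_factors: {seq_id: multiplicative v.4 -> v.4.1 factor}.
    """
    return {
        seq_id: math.log10(value * scale_factors[seq_id])
        for seq_id, value in raw_v4.items()
    }

# Placeholder identifiers and numbers, for illustration only.
raw = {"seq-A": 1000.0, "seq-B": 250.0}
factors = {"seq-A": 1.0, "seq-B": 4.0}
harmonized = harmonize_v4(raw, factors)
```

Both placeholder proteins end up at the same log10 value here because 250 × 4 equals 1,000, which shows how the scaling factor puts the two assay versions on a common scale before the log transform.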
Somalogic probe validation
Somalogic has analysed close to 1 million samples with their technology at the time of this publication, resulting in some 700 publications (https://somalogic.com/publications/). There is minimal replicate sample variability64,65 (coefficient of variation, CV). The majority of SomaScan protein measurements are stable and a subset of proteins have been validated as laboratory-developed tests (LDTs), and have been delivered out of Somalogic’s CLIA-certified laboratory to physicians and patients in the context of medical management66.
Identification of organ-enriched plasma proteins
We used the Genotype-Tissue Expression (GTEx) human tissue bulk RNA-seq database18 to identify organ-enriched genes and plasma proteins (Extended Data Fig. 1). Tissue gene expression data were normalized using the DESeq2 (ref. 67) R package. We define organ-enriched genes in accordance with the definition proposed by the Human Protein Atlas19: a gene is enriched if it is expressed at least four times higher in a single organ compared to any other organ. Within GTEx, we grouped tissues of the same organ together, such that a gene’s expression level for a given organ was the maximum gene expression value among its subtissues. For example, all GTEx brain regions were considered subtissues of the brain organ. We define the immune organ, which is not a GTEx tissue, as expression in the blood and the spleen tissues. Organ-enriched genes were mapped to the 4,979 plasma proteins quantified in the v.4 SomaScan assay.
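The enrichment rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the gene names, tissue names and expression values are invented, the function name is hypothetical, and the input is assumed to be already normalized.

```python
def organ_enriched_genes(expression, tissue_to_organ, fold=4.0):
    """Return {gene: organ} for genes enriched in exactly one organ.

    expression: {gene: {tissue: normalized expression}}.
    tissue_to_organ: {tissue: organ}; subtissues share one organ label.
    A gene is enriched if its top organ is at least `fold` times higher
    than every other organ.
    """
    enriched = {}
    for gene, by_tissue in expression.items():
        # Organ-level expression is the maximum over that organ's subtissues.
        by_organ = {}
        for tissue, value in by_tissue.items():
            organ = tissue_to_organ[tissue]
            by_organ[organ] = max(by_organ.get(organ, 0.0), value)
        ranked = sorted(by_organ.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) < 2:
            continue  # nothing to compare against
        (top_organ, top_value), (_, runner_up) = ranked[0], ranked[1]
        if top_value > 0 and top_value >= fold * runner_up:
            enriched[gene] = top_organ
    return enriched

# Toy data: brain subtissues are grouped; blood and spleen form "immune".
tissue_to_organ = {
    "brain_cortex": "brain", "brain_cerebellum": "brain",
    "whole_blood": "immune", "spleen": "immune",
    "liver": "liver", "heart": "heart",
}
expression = {
    "GENE_A": {"brain_cortex": 80, "brain_cerebellum": 120, "liver": 10, "heart": 5},
    "GENE_B": {"brain_cortex": 30, "brain_cerebellum": 20, "liver": 25, "heart": 5},
    "GENE_C": {"whole_blood": 50, "spleen": 200, "liver": 20, "heart": 10},
}
result = organ_enriched_genes(expression, tissue_to_organ)
```

Comparing the top organ only against the runner-up is sufficient, because any organ ranked lower than the runner-up has an even smaller value.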
Bootstrap aggregated LASSO aging models
To estimate biological age using the plasma proteome, we built LASSO regression-based chronological age predictors (Extended Data Figs. 2–3 and Supplementary Fig. 3) using the scikit-learn68 python package. We employed bootstrap aggregation for model training. Briefly, we resampled with replacement to generate 500 bootstrap samples of our training data (Knight-ADRC: 1,398 healthy individuals). Each bootstrap sample was the same size as the training data, 1,398. For each bootstrap sample, we trained a model on z-scored log10 normalized protein expression values with sex (F = 1, M = 0) as a covariate to predict chronological age. For model training, we performed hyperparameter tuning of the L1 regularization parameter, λ, with five-fold cross validation using the GridSearchCV function from scikit-learn. To reduce model complexity and avoid overfitting, we selected the highest λ value that retained 95% performance relative to the best model. The mean predicted age from all 500 bootstrap models was used.
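The resampling, λ selection and aggregation logic can be outlined with the Python standard library alone. The sketch below leaves out the LASSO fitting itself (done in the study with scikit-learn) and makes two labelled assumptions: cross-validated performance is a positive score where higher is better, and "retained 95% performance" means a score of at least 0.95 times the best score. All function names and the example scores are hypothetical.

```python
import random
from statistics import mean

def bootstrap_indices(n, n_boot, seed=0):
    """Draw n_boot resamples of size n, with replacement, as index lists."""
    rng = random.Random(seed)
    return [[rng.randrange(n) for _ in range(n)] for _ in range(n_boot)]

def select_lambda(cv_scores, retain=0.95):
    """Pick the largest lambda whose score is >= retain * best score.

    cv_scores: {lambda: mean cross-validated score}, higher is better
    (assumed positive, for example R^2).
    """
    best = max(cv_scores.values())
    eligible = [lam for lam, score in cv_scores.items() if score >= retain * best]
    return max(eligible)

def bagged_prediction(per_model_predictions):
    """Aggregate one sample's predictions across bootstrap models by the mean."""
    return mean(per_model_predictions)

# 500 resamples, each the size of the training set (1,398).
samples = bootstrap_indices(n=1398, n_boot=500)

# Invented cross-validation scores for one bootstrap sample.
cv_scores = {0.001: 0.80, 0.01: 0.82, 0.1: 0.79, 1.0: 0.60}
chosen = select_lambda(cv_scores)

# Invented predicted ages for one person from three bootstrap models.
predicted_age = bagged_prediction([70.0, 72.0, 74.0])
```

In this toy grid the best score belongs to λ = 0.01, but λ = 0.1 still clears the 95% threshold, so the rule prefers it and yields a sparser model.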
We trained our models in 1,398 cognitively unimpaired participants from the Knight-ADRC cohort. We evaluated their performance in the Covance (n = 1,029), LonGenity (n = 962), SAMS (n = 192), Stanford-ADRC (n = 409) cohorts and Knight-ADRC cognitively impaired subjects (n = 1,677). Models that included sex as a covariate and models trained separately on males and females showed similar age prediction performance on both sexes, so we controlled for sex to extend the generality of the findings and reduce analytic complexity (Supplementary Fig. 3a–c). There was a correlation between age estimation accuracy and the number of proteins used as input to each model (Supplementary Fig. 3c,d). However, several models with few protein inputs, such as the adipose (five proteins) and heart models (ten proteins), predicted chronological age better than models with more protein inputs (Extended Data Fig. 3).
Age gap calculation and independent validation
To calculate the age gap of each individual sample, we performed the following steps for each aging model. We fit a local regression between predicted and chronological age using the lowess function from the statsmodels69 python package with fraction parameter set to 2/3 to estimate the true population mean (Supplementary Fig. 3e). A local regression is used in place of a simple linear regression because of extensive evidence that the plasma proteome changes nonlinearly with age1, which we see replicated in all five cohorts (Supplementary Fig. 8). Individual sample age gaps were then calculated as the difference between predicted age and the lowess regression estimate of the population mean. Age gaps were calculated separately per cohort to account for cohort differences (Supplementary Fig. 3e). Age gaps were z-scored per aging model to account for the differences in model variability (Supplementary Fig. 3f). This allowed for direct comparison between organ age gaps in downstream analyses.
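The age gap steps can be illustrated with a standard-library Python sketch. It swaps in a simplified local linear regression (tricube weights, no robustness iterations) for the statsmodels lowess function used in the study, so its numbers will not match statsmodels exactly. The function names and ages are invented, and which samples are pooled for z-scoring is left to the caller.

```python
import math
from statistics import mean, pstdev

def local_linear_fit(x, y, frac=2/3):
    """Simplified lowess: tricube-weighted linear fit evaluated at each x."""
    n = len(x)
    k = max(2, math.ceil(frac * n))  # number of neighbours per local fit
    fitted = []
    for x0 in x:
        # Bandwidth is the distance to the k-th nearest point.
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def raw_age_gaps(chron_age, pred_age, frac=2/3):
    """Predicted age minus the local-regression estimate of the population mean."""
    expected = local_linear_fit(chron_age, pred_age, frac)
    return [p - e for p, e in zip(pred_age, expected)]

def zscore(values):
    """Standardize age gaps so that different models are comparable."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

# Invented chronological and predicted ages for one cohort and one model.
chron = [50, 55, 60, 65, 70, 75, 80, 85]
pred = [52, 54, 63, 64, 73, 74, 83, 84]
gaps = zscore(raw_age_gaps(chron, pred))
```

A positive z-scored gap marks a sample whose predicted age is above that of same-aged peers in the cohort, and a negative gap marks one below.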
Phenotypic age calculation
We used the published coefficients14 to calculate the phenotypic age of participants in the Covance cohort using albumin, creatinine, glucose, C-reactive protein, % lymphocyte, mean cell volume, red cell distribution width, alkaline phosphatase, white blood cell count and age.
Statistical methods to associate organ age gaps with age-related phenotypes
Study design
A flowchart of the study design is provided in Supplementary Fig. 2. Each box in the flowchart was treated as a separate analysis for the purpose of multiple testing correction. Multiple testing correction was done using the Benjamini–Hochberg method and the significance threshold was a 5% false discovery rate (FDR). To summarize the flowchart, the age gaps from all 11 organ aging models, the organismal model and the conventional model were used in the following analyses: prediction of future mortality in the LonGenity cohort with a Cox proportional hazards (CPH) model (12 of 13 tests significant after FDR), prediction of future heart disease in the LonGenity cohort with a CPH model (12 of 13 tests significant after FDR), association with nine diseases of aging in a cross-cohort meta-analysis (66 of 117 tests significant after FDR) and association with 42 clinical biochemistry markers in the Covance cohort (237 of 588 tests significant after FDR; the PhenoAge gap was also tested, for 14 × 42 tests).
The 12 cognition-optimized models (11 organs + organismal model) were tested on additional brain aging phenotypes. Only the CognitionBrain age gap was tested for association with 65 MRI brain volumes and an MRI-based brain age gap (40 of 66 tests significant after FDR). Only the CognitionBrain age gap was included in a multivariate CPH model of dementia progression in AD (1 of 1 tests significant, no FDR). The 12 cognition-optimized model age gaps were tested for association with AD status in the Knight-ADRC (12 of 12 tests significant after FDR), then a replication analysis was performed in Stanford-ADRC (4 of 12 tests significant at P < 0.05, no FDR). The four models that replicated (CognitionBrain, CognitionOrganismal, CognitionArtery and CognitionPancreas) were then tested for associations with overall cognition in healthy elderly people (LonGenity, 4 of 4 tests significant, no FDR), memory function in the Stanford-ADRC (2 of 4 tests significant, no FDR) and 15-year prediction of conversion from normal cognition to mild cognitive impairment in the Knight-ADRC with a CPH model (2 of 4 tests significant, no FDR).
Linear modelling
Estimation of chronological age is not sufficient to determine whether an organ aging model measures the age-related physiological dysfunction of an organ. To determine whether estimated organ age contains physiologically relevant information, we associated organ age gaps with various age-related phenotypes across the Covance, LonGenity, SAMS, Stanford-ADRC and Knight-ADRC cohorts. Most organ age gap versus trait associations in this study (Figs. 2a–d and 3c and Extended Data Figs. 4d,e, 5c, 6b,c, 7, 8c,d and 9) were assessed using linear models controlled for age and sex as follows: age gap ≈ trait + age + sex, and adjusted for multiple testing burden using the Benjamini–Hochberg method when appropriate. To describe disease associations in relation to years of additional aging in the main text, we took the coefficient for the trait variable (which provides an estimate of the mean difference in z-scored age gaps between disease and control) and converted that to an estimate of mean difference in raw age gaps, using the standard deviation of raw age gaps provided in Supplementary Table 8.
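As an illustration of the linear model and the conversion to years, the sketch below fits age gap ≈ trait + age + sex for two hypothetical organs on synthetic data, applies Benjamini–Hochberg correction and rescales the trait coefficient by the raw age gap standard deviation. The organ names, effect sizes and standard deviations are invented for the example. The use of `statsmodels` here is an assumption, since the paper does not name the package used for these models.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "age": rng.uniform(40, 90, n),
    "sex": rng.integers(0, 2, n),      # F = 1, M = 0
    "trait": rng.integers(0, 2, n),    # 1 = disease, 0 = control
})
# Hypothetical z-scored age gaps: "heart" depends on the trait, "kidney" does not.
heart = 0.8 * df["trait"] + rng.normal(0, 1, n)
kidney = rng.normal(0, 1, n)
df["heart"] = (heart - heart.mean()) / heart.std()
df["kidney"] = (kidney - kidney.mean()) / kidney.std()

# Hypothetical standard deviations of the raw age gaps, in years.
raw_gap_sd = {"heart": 4.0, "kidney": 3.0}

rows = []
for organ in ["heart", "kidney"]:
    fit = smf.ols(f"{organ} ~ trait + age + sex", data=df).fit()
    beta_z = fit.params["trait"]       # difference in z-scored age gap
    rows.append({
        "organ": organ,
        "beta_z": beta_z,
        "beta_years": beta_z * raw_gap_sd[organ],  # difference in raw age gap
        "p": fit.pvalues["trait"],
    })
results = pd.DataFrame(rows)

# Benjamini-Hochberg correction at a 5% false discovery rate.
reject, q_values, _, _ = multipletests(results["p"].to_numpy(), alpha=0.05,
                                       method="fdr_bh")
results["q"] = q_values
results["significant"] = reject
```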
Meta-analyses
Meta-analyses to compare and aggregate effect sizes and confidence intervals from multiple cohorts were performed in R using the metafor70 package with an inverse variance weighted fixed effects model.
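For reference, the inverse variance weighted fixed effects estimate weights each cohort's effect size by the reciprocal of its squared standard error. The analyses above were run with metafor in R. The NumPy sketch below only illustrates the estimator, with made-up effect sizes and a hypothetical function name.

```python
import numpy as np


def fixed_effect_meta(betas, standard_errors):
    """Inverse variance weighted fixed effects pooling of cohort estimates."""
    betas = np.asarray(betas, dtype=float)
    standard_errors = np.asarray(standard_errors, dtype=float)
    weights = 1.0 / standard_errors ** 2
    pooled = np.sum(weights * betas) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))
    # Normal-approximation 95% confidence interval.
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Two hypothetical cohorts; the more precise one dominates the pooled estimate.
pooled, pooled_se, ci = fixed_effect_meta([0.30, 0.50], [0.10, 0.20])
```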
Cox proportional hazard modelling
Cox proportional hazards models were used to assess the association between organ age gaps and future risk of mortality, congestive heart failure and increase in clinical dementia rating using the following model: event risk ≈ organ age gap + age + sex. Models were tested using the lifelines71 Python package. Kaplan–Meier curves were generated at population-average covariate values in the relevant subject populations.
Extreme agers
Extreme agers were defined as individuals who had an age gap value two standard deviations above or below the mean (z-scored age gap greater than 2 or z-scored age gap less than −2) for at least one aging model. A total of 23% of the population across all cohorts were extreme agers. All extreme agers showed accelerated aging; no individuals displayed extreme youth signatures without an extreme aging signature in a different organ (Extended Data Fig. 4a). To identify different groups of extreme agers with similar aging profiles, we performed k-means clustering (n = 13) of the extreme agers. Z-scored age gap values above 2 or below −2 were set to zero before clustering. The clusters showed distinct organ agers (Fig. 1e and Extended Data Fig. 4b). A multi-organ ager cluster was also identified. Individuals who were extreme agers in at least five different organs were manually set to multi-organ agers. Extreme ageotypes (clusters) were associated with major age-related diseases using logistic regression (trait ≈ e-ageotype) in a cross-cohort meta-analysis (Extended Data Fig. 4d and Supplementary Table 9).
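The thresholding step can be illustrated with a small matrix of z-scored age gaps (rows are individuals, columns are aging models). The values are invented, and the sketch covers only the two-standard-deviation definition and the five-organ rule stated above, not the clustering.

```python
import numpy as np

# Hypothetical z-scored age gaps for four individuals across six aging models.
z_gaps = np.array([
    [2.5, 0.1, -0.3, 0.0, 0.4, 1.0],   # extreme in one model
    [0.2, 0.4, 1.9, -1.0, 0.3, 0.0],   # not extreme in any model
    [2.1, 2.3, 2.6, 2.2, 3.0, 0.5],    # extreme in five models
    [-2.2, 0.0, 0.5, 0.1, 0.2, 0.3],   # extreme on the negative side
])

# More than two standard deviations from the mean in either direction.
extreme_mask = np.abs(z_gaps) > 2
n_extreme_models = extreme_mask.sum(axis=1)

is_extreme_ager = n_extreme_models >= 1
is_multi_organ_ager = n_extreme_models >= 5
fraction_extreme = is_extreme_ager.mean()
```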
Feature importance for biological aging
FIBA is an adaptation of permutation feature importance (PFI)72 (Extended Data Fig. 6a). PFI is traditionally used in machine learning to assess how much a model depends on a given feature for prediction accuracy of the target variable. The PFI score is defined as the decrease in a model’s performance when values from a single feature are randomized. In our case, for chronological age predictors, the PFI score would be calculated as the difference between the model’s original prediction accuracy (Pearson correlation between predicted and chronological age) and the model’s prediction accuracy after randomization of a single feature. The final PFI score is the mean PFI score from five randomizations.
FIBA builds on the concept of PFI and applies it to the field of aging to assess the importance of a feature in measuring biological age, instead of the target variable, chronological age. We assume that information about biological age lies in the model age gap and its association with an age-related trait. Thus, randomization of an important feature would reduce the association between the model age gap and the trait (in the expected direction). The FIBA score for a protein is calculated based on this logic and is defined as the difference between the model age gap’s original association with a trait and the association with that trait after randomization of a single feature.
We applied FIBA to understand aging model protein contributions to associations with cognition using the CDR-Global score. The mean FIBA score after five permutations was calculated for all 500 bootstraps for all organ aging models (Supplementary Table 15). A protein was defined as significant (FIBA+) if less than 5% (empirical single-tailed P < 0.05) of its FIBA scores across bootstraps were negative. Only proteins with nonzero coefficients in at least 100/500 bootstraps were considered. FIBA+ organ-specific proteins were used to train new cognition-optimized aging models from cognitively unimpaired individuals in the Knight-ADRC cohort.
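The FIBA logic for a single model can be sketched as below. This is a simplified illustration and not the implementation in the organage repository. It assumes a linear fit in place of the lowess age gap, uses a Pearson correlation as a stand-in for the age- and sex-adjusted association, and runs on synthetic data with a hypothetical three-feature model.

```python
import numpy as np


def age_gap(predicted_age, age):
    """Residual of predicted age on chronological age (linear fit, an assumption)."""
    slope, intercept = np.polyfit(age, predicted_age, 1)
    return predicted_age - (slope * age + intercept)


def association(gap, trait):
    """Stand-in association measure: Pearson correlation of age gap and trait."""
    return np.corrcoef(gap, trait)[0, 1]


def fiba_scores(predict, X, age, trait, n_permutations=5, seed=0):
    """Mean drop in the gap-trait association after permuting each feature."""
    rng = np.random.default_rng(seed)
    baseline = association(age_gap(predict(X), age), trait)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_permutations):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            permuted = association(age_gap(predict(X_perm), age), trait)
            drops.append(baseline - permuted)
        scores[j] = np.mean(drops)
    return scores


# Synthetic data: feature 0 carries trait information, feature 1 only age,
# and feature 2 has a zero coefficient in the model.
rng = np.random.default_rng(0)
n = 1000
age = rng.uniform(50, 90, n)
age_z = (age - age.mean()) / age.std()
trait = rng.normal(size=n)
X = np.column_stack([
    age_z + trait + rng.normal(0, 0.5, n),
    age_z + rng.normal(0, 0.5, n),
    rng.normal(size=n),
])
coef = np.array([5.0, 5.0, 0.0])


def predict(features):
    """Hypothetical linear age model."""
    return 60.0 + features @ coef


scores = fiba_scores(predict, X, age, trait)
```

In this toy setup, permuting the trait-bearing feature removes most of the association, while the zero-coefficient feature scores zero because permuting it leaves the predictions unchanged.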
Biological pathway enrichment and protein–protein interaction analysis
Biological pathway enrichment analyses were performed using g:Profiler73 with the all human genes set as the background distribution. Protein–protein interaction networks were generated using the STRING database74.
Single-cell RNA sequencing analysis
Preprocessed human heart52 and kidney51 scRNA-seq data were accessed from studies in the Human Cell Atlas. Preprocessed brain scRNA-seq data were accessed from ref. 53. Preprocessed human brain vasculature scRNA-seq data were accessed from ref. 42. Preprocessed human vasculature scRNA-seq data were accessed from Tabula Sapiens41. Gene expression counts data were log(CPM + 1) transformed and z-scored for visualization.
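The visualization transform can be sketched as follows on a toy genes × cells count matrix. The log base is not stated above, so the natural logarithm is an assumption here, as is the choice to z-score each gene across cells.

```python
import numpy as np

# Toy count matrix: rows are genes, columns are cells.
counts = np.array([
    [10.0, 0.0, 5.0],
    [90.0, 50.0, 5.0],
    [0.0, 50.0, 90.0],
])

# Counts per million: scale each cell (column) to a total of one million.
cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6

# log(CPM + 1), natural log assumed.
log_cpm = np.log1p(cpm)

# Z-score each gene across cells; constant genes are left at zero.
mean = log_cpm.mean(axis=1, keepdims=True)
sd = log_cpm.std(axis=1, keepdims=True)
z_expr = (log_cpm - mean) / np.where(sd > 0, sd, 1.0)
```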
Brain tissue bulk proteomics and RNA sequencing
Differential expression statistics of proteins and RNA from AD versus control brains were accessed from ref. 39.
Brain MRI data from Stanford-ADRC and SAMS cohorts
MRI acquisition
Whole-brain MRI scans were collected from all subjects in the Stanford-ADRC and SAMS cohorts. All MRI data were collected at the Stanford Richard M. Lucas Center for Imaging. A total of 271 subjects underwent MRI scanning on a 3 T MRI scanner (GE Discovery MR750). T1-weighted SPGR scans were collected (TR/TE/TI = 8.2/3.2/900 ms, flip angle = 9, 1 × 1 × 1 mm) and used to define grey matter volumes. A total of 134 subjects underwent MRI scanning on a hybrid PET/MRI scanner (Signa 3 tesla, GE Healthcare). T1-weighted SPGR scans were collected (TR/TE/TI = 7.7/3.1/400 ms, flip angle = 11, 1.2 × 1.1 × 1.1 mm) and used to define grey matter volumes.
Structural MRI processing
Region of interest (ROI) labelling was implemented using the FreeSurfer75 software package v.7 (http://surfer.nmr.mgh.harvard.edu). In brief, structural images were bias field corrected, intensity normalized and skull stripped using a watershed algorithm. These images underwent a white matter-based segmentation, grey/white matter and pial surfaces were defined, and topology correction was applied to these reconstructed surfaces. Subcortical and cortical ROIs spanning the entire brain were defined in each subject’s native space, using the aparc+aseg atlas in FreeSurfer.
MRI brainageR algorithm
Using matched brain MRI and plasma proteomic data from n = 541 samples in SAMS and Stanford-ADRC, we compared our plasma proteomic organ clocks with established brain MRI-based clocks, brainageR16 and BARACUS Brain-Age76.
We used a pretrained machine learning algorithm (https://github.com/james-cole/brainageR) and raw T1-weighted MRI scans to estimate brain age. This software uses SPM12 (https://www.fil.ion.ucl.ac.uk/spm/software/spm12/) to perform tissue segmentation and normalization of individual scans to Montreal Neurological Institute (MNI) template space. The software relies on a model that used Gaussian process regression to predict brain age on 3,777 participants from seven publicly available datasets (mean age = 40.1, range = 18–90 years). It applies the results of this training to predict brain age in any new T1-w data, utilizing the RNifti (v.1.4.5) and kernlab (v.0.9-32) packages within R v.4.2.
We also used another pretrained algorithm, BARACUS (https://github.com/bids-apps/baracus, ref. 76) to estimate brain age from FreeSurfer v.5.3 processed T1-w scans. The vertex-wise cortical thickness and surface area values (transformed from subject space to fsaverage4 standard space), along with the subcortical volumetric statistics, were used as input to BARACUS’s linear support vector machine model. This model was trained on 1,166 participants with no objective cognitive impairment (566 female, mean age = 59.1, range = 20–80 years). It returns a ‘stacked-anatomy’ prediction among its results, which we used as the estimate of brain age for this method.
MRI regions of interest analysis
The volume of the AD signature region was calculated as the sum of the volumes of the parahippocampal gyrus, entorhinal cortex, inferior parietal lobules, hippocampus and precuneus. Following best practice, ROIs were linearly adjusted for estimated total intracranial volume to account for the differences in human size that is unrelated to cognitive function and neurodegeneration. Associations between organ age gaps and adjusted brain ROIs were tested using a linear model controlled for age and sex. Associations were performed for all ROIs in the aparc+aseg atlas.
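One common way to apply a linear adjustment for intracranial volume is to remove the fitted ICV slope from each ROI volume. The sketch below does this for a synthetic AD signature volume. The residual-based approach, the function name and all numbers are assumptions, since the exact adjustment is not specified above.

```python
import numpy as np


def adjust_for_icv(roi_volume, icv):
    """Remove the linear ICV effect from an ROI volume, keeping the sample mean."""
    roi_volume = np.asarray(roi_volume, dtype=float)
    icv = np.asarray(icv, dtype=float)
    slope, _ = np.polyfit(icv, roi_volume, 1)
    return roi_volume - slope * (icv - icv.mean())


# Synthetic volumes (mm^3) for the five regions of the AD signature.
rng = np.random.default_rng(0)
n = 200
icv = rng.normal(1.5e6, 1.2e5, n)
regions = ["parahippocampal", "entorhinal", "inferior_parietal",
           "hippocampus", "precuneus"]
volumes = {name: 0.003 * icv + rng.normal(0, 300, n) for name in regions}

# AD signature volume is the sum of the regional volumes.
ad_signature = sum(volumes[name] for name in regions)
ad_signature_adjusted = adjust_for_icv(ad_signature, icv)
```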
Alzheimer’s disease polygenic risk score in the Stanford-ADRC cohort
AD polygenic risk scores (PRS) were calculated in the Stanford-ADRC cohort to compare to the CognitionBrain age gap. PRSs were determined from whole-genome sequencing. The Genome Analysis Toolkit workflow Germline short variant discovery was used to map genome sequencing data to the reference genome (GRCh38) and to produce high-confidence variant calls using joint-calling77. Six individuals were excluded from further whole-genome sequencing analysis due to discordance between their reported sex and genetic sex. APOE genotype (ε2/ε3/ε4) was determined using allelic combinations of single nucleotide variants rs7412 and rs429358. The independent loci identified in the largest AD GWAS to date were used to compute AD PRS. Namely, the 84 variants and their effect sizes available from Tables 1 and 2 in ref. 30 were used, in addition to rs7412 (odds ratio = 0.6) and rs429358 (odds ratio = 3.7). Plink1.9 (ref. 78) with the ‘--score’ flag was used to formally compute the PRS, while providing the individual genotypes and the list of variants with their effect sizes as input. Three individuals with pathogenic mutations in PSEN1 or GBA were removed from this analysis.
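Conceptually, a PRS is a weighted sum of effect allele counts. The sketch below illustrates this for the two APOE variants only, using the odds ratios quoted above. Taking the natural log of the odds ratio as the weight is an assumption, and the genotypes are invented. Note also that PLINK 1.9 reports per-allele averages by default unless its `sum` modifier is given, so absolute values from this sketch would not match default PLINK output.

```python
import numpy as np

# Odds ratios from the text: rs7412 (0.6) and rs429358 (3.7).
odds_ratios = np.array([0.6, 3.7])
weights = np.log(odds_ratios)  # log odds ratio as effect size (assumption)

# Hypothetical effect allele dosages (0, 1 or 2) for four individuals.
# Columns follow the order of odds_ratios.
dosages = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [0, 2],
])

# Score per individual: sum over variants of dosage times weight.
prs = dosages @ weights
```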
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Stanford-ADRC data are available upon reasonable request to the Stanford-ADRC data release committee, https://web.stanford.edu/group/adrc/cgi-bin/web-proj/datareq.php. All Stanford-ADRC data will be made publicly available after an embargo period at https://twc-stanford.shinyapps.io/adrc/. SAMS data are available to qualified investigators upon request to principal investigators Beth Mormino (bmormino@stanford.edu) or Anthony Wagner (awagner@stanford.edu). Knight-ADRC data were generated by the laboratory of principal investigator Carlos Cruchaga (cruchagac@wustl.edu) and are available upon reasonable request to the National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site (NIAGADS) (Study ID: ng00130), https://www.niagads.org/knight-adrc-collection. Data from the Covance and LonGenity cohorts can be accessed according to the policies described in the initial study publications54,55,56. Preprocessed human heart52 and kidney51 scRNA-seq data were accessed from studies in the Human Cell Atlas. Preprocessed brain scRNA-seq data were accessed from ref. 53. Preprocessed human brain vasculature scRNA-seq data were accessed from Yang et al. 2022 (ref. 42). Preprocessed human vasculature scRNA-seq data were accessed from Tabula Sapiens41. Differential expression statistics of proteins and RNA from Alzheimer’s disease versus control brains were accessed from ref. 39. Information on change with age for approximately 5,000 SomaScan v.4 plasma proteins across all five cohorts (Supplementary Fig. 8 and Supplementary Table 25) is available in a public Shiny app (https://twc-stanford.shinyapps.io/aging_plasma_proteome_v2/).
Code availability
All analyses were carried out using freely available software packages in Python and R. All aging models are available through the organage Python package and the associated GitHub repository (https://github.com/hamiltonoh/organage). The package requires v.4 or higher SomaScan data, age and sex as inputs, and outputs estimated organ ages and age gaps. The aging models are available to download from the package, and the model coefficients are available in Supplementary Tables 6 and 17. Code for the FIBA algorithm is in the package’s GitHub repository.
References