
Content provided by Winfried Adalbert Etzel - DAMA Norway. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Winfried Adalbert Etzel - DAMA Norway or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://ko.player.fm/legal.

2#18 - Scientific Data Management (Eng)

37:10
 

Manage episode 366481102 series 2940030

«How can we consolidate data and describe it in a standardized way?»

Scientific data management has some unique challenges, but it also offers lessons for other sectors. We focused on Data Storage and Operations as a knowledge area in the DMBOK - a topic that is often viewed as basic and often not in focus, but that is a fundamental part of data operations.

I talked to Nicolai Jørgensen at NMBU - the Norwegian University of Life Sciences. Nicolai has a very diverse background; his journey in data started in 1983! In his free time, Nicolai enjoys photography and AI-based text-to-image generation.

Here are my key takeaways:

Scientific Data Management

  • To describe data in a unified way, we need standards, like Dublin Core or Darwin Core for scientific data.
  • Data is an embedded part of Science and Research - you can’t have those without data.
  • You need to make sure you collect the right data, the right amount of data, valid data, and so on.
  • You need to optimize your amount of time, energy and expenses when collecting and validating data.
  • You need to standardize the way you collect data, to ensure that it can be verified.
  • There needs to be an audit trail (lineage) between the data you have collected and the result presented in a publication.
  • Data needs to be freely available for research and hypothesis testing.
  • Data needs to be findable, accessible, and interoperable, but also reusable (the FAIR principles).
  • ML algorithms can help extract and find changes in scientific data that is internationally available.
  • Describing data is key to tap into knowledge - for that you need metadata.
  • In times of AI and ML, metadata is still the key to uncovering data.
  • The development of AI models is a race - maybe we need to pause and get a better picture of cause and effect, and most of all risk.
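The list above names Dublin Core as one way to describe data in a unified way. As a rough sketch of what that looks like in practice (the field values below are invented placeholders, not anything from the episode), a Dublin Core record can be rendered as XML with nothing beyond the Python standard library:

```python
# Minimal sketch: describing a dataset with Dublin Core elements.
# The dataset details are invented examples for illustration.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

def dublin_core_record(fields: dict) -> str:
    """Render a flat dict of Dublin Core elements as an XML string."""
    root = ET.Element("metadata")
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")

record = dublin_core_record({
    "title": "Field observations, site A",   # placeholder values
    "creator": "Example Research Group",
    "date": "2023-06-01",
    "format": "text/csv",
    "rights": "CC-BY-4.0",
})
print(record)
```

Because the elements come from a shared vocabulary, any catalog that understands Dublin Core can index the record without knowing anything about the dataset itself.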

Standardizing Infrastructure

  • How can we standardize infrastructure for research projects?
    • Minimize or get rid of volatile data storage and infrastructure
    • Standardize data storage solutions
    • Secure what needs to be secured
    • Split out sensitive or classified data and store it separately (e.g., personal data)
    • Train your end users and educate data stewards
  • Have good guidelines for researchers on how to store, use and manipulate data.
  • There is a direct correlation between disc-space use and sustainability.
  • «Storage is cheap» is a correct saying if you look at it in isolation - but in the bigger picture, the cost is just moved.
  • Just adding more storage doesn’t solve your problems - it might even increase them.
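One of the sub-points above - splitting out sensitive data and storing it separately - can be sketched as a simple routing step. This is only an illustration under assumed field names ("email", "measurement", etc. are invented, not from the episode):

```python
# Minimal sketch of the "split out sensitive data" step: route record
# fields containing personal data to a separate, access-controlled store.
# The set of sensitive field names is an invented example policy.
SENSITIVE_FIELDS = {"name", "email", "national_id"}

def split_record(record: dict) -> tuple[dict, dict]:
    """Separate a record into (open, sensitive) parts."""
    open_part = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
    sensitive_part = {k: v for k, v in record.items() if k in SENSITIVE_FIELDS}
    return open_part, sensitive_part

open_part, sensitive_part = split_record({
    "sample_id": "A-17",
    "measurement": 4.2,
    "email": "observer@example.org",
})
print(open_part)       # goes to the standard research share
print(sensitive_part)  # goes to the secured, separate store
```

In a real setup, the classification would come from a data steward's field-level tagging rather than a hard-coded set, but the separation point stays the same: decide before storage, not after.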

Long-term Preservation & Integrity

  • To preserve data for the long term, you need to:
    • Encapsulate data at a certain level
    • Standardize the way you describe the data
    • Upload the data package to a common, governed platform
    • Check whether there is a government body that can take responsibility for preserving your data for as long as necessary
    • Ensure that metadata is machine-readable
    • Formats like XML make the data readable by both machines and humans
  • Research integrity: conducting research in a way that allows others to have trust and confidence in the methods used and the findings that result.
  • Ensure lineage and audit trails for your scientific data.
  • Fake data and data fabrication are serious issues in research - keeping data integrity at the highest possible level is not getting easier, but it is increasingly important.
  • Changes to data (change logs, change data capture, etc.) can be studied as well; you can build models to explore scenarios around data changes.
  • You can fetch data from other sources to enrich your data and improve its quality.
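Two threads in the list above - encapsulating data into a package and giving later readers a way to verify integrity - can be combined in one small sketch: a machine-readable manifest with per-file checksums. The directory layout and file contents are invented for illustration:

```python
# Minimal sketch of packaging data for preservation: a manifest with
# per-file SHA-256 checksums lets anyone later verify, by machine,
# that the archived files are unchanged. Paths are invented examples.
import hashlib
import json
from pathlib import Path

def build_manifest(package_dir: Path) -> dict:
    """Map each file (relative path) under package_dir to its SHA-256."""
    manifest = {}
    for path in sorted(package_dir.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(package_dir))] = digest
    return manifest

def verify(package_dir: Path, manifest: dict) -> bool:
    """Re-check the package against a previously written manifest."""
    return build_manifest(package_dir) == manifest

# Example usage with a throwaway package directory:
pkg = Path("demo_package")
pkg.mkdir(exist_ok=True)
(pkg / "observations.csv").write_text("site,value\nA,4.2\n")

manifest = build_manifest(pkg)
Path("manifest.json").write_text(json.dumps(manifest, indent=2))
print(verify(pkg, manifest))  # True: the package is unchanged
```

Any later modification to a packaged file changes its checksum and makes verification fail, which is exactly the kind of audit trail the episode argues for between collected data and published results.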

