Monday 26 February 2018

XML Schema Versioning Strategy


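What is the best way to version an XML schema?
I am often required to design XML schemas for different XML-based import routines. XML schemas evolve over time, or may contain bugs that need fixing, so it is important to capture the schema's version and to have a mechanism for binding against a specific version.
Currently I have two scenarios:
A bug is found within the schema, and all schema instances must comply with the fixed version.
The schema is upgraded and the new version should be considered preferable, but the old schema must be supported as well.
In the end I settled on storing the version information in the schema's namespace.
When I fix a bug, the fix is made within the same namespace, but when I upgrade the schema I have to create a new namespace with the upgrade month added.
And if I upgrade more than once a month, I just add the day as well.
Do you know of a better approach?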
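This is a difficult subject, mostly because it is not fun, and I have spent years providing consulting support on it.
There are plenty of best practices out there, but most of them do not work in all situations. For example, many people recommend using "xsd:any" to allow extensibility. That is only a way to avert disaster if developers maintain the schema and make it dumb.
If you are starting out, here are some tips:
Do not put minor version numbers, micro version numbers, dates, or anything else of the kind in the namespace. Every time you change the namespace, you break all processing applications.
Do put a "version" attribute in the XML instance document. This lets processing applications or version adapter services determine what they are dealing with.
Adding optional elements does not break senders, and if you adopt a policy of ignoring elements you do not know about, it does not break receivers either (JAXB and XMLBeans can be configured this way).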
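The last two tips can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree. This is only an illustration, not part of the original answer: the element names order, item, and note, and the default version "1.0", are hypothetical. The receiver reads the version from an attribute on the instance and skips any child element it does not recognize instead of failing.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: this receiver only knows <item> children.
KNOWN_CHILDREN = {"item"}

def read_order(xml_text):
    root = ET.fromstring(xml_text)
    # The version lives in an attribute of the instance, not in the namespace.
    version = root.get("version", "1.0")
    items, ignored = [], []
    for child in root:
        if child.tag in KNOWN_CHILDREN:
            items.append(child.text)
        else:
            # "Ignore what you do not know" policy: skip rather than fail.
            ignored.append(child.tag)
    return version, items, ignored

V1 = '<order version="1.0"><item>bolt</item></order>'
V2 = '<order version="1.1"><item>bolt</item><note>rush</note></order>'

print(read_order(V1))
print(read_order(V2))
```

With this policy, a receiver written for version 1.0 still extracts the items from a 1.1 document that added an optional note element, and the version attribute remains available for routing to an adapter if one is needed.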

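This discussion rests on a few hidden assumptions that ought to be exposed. For example, what is "the business," as distinct from the technology that implements it? I would say that a web service is not the business but one of the widgets through which business is done. "The business" is an abstraction that is independent of the implementing technology. Like the implementing technology, "the business" has layers. Change happens at every layer. How it is managed depends on the culture of the layer and of the organization doing the business.
In the case of the USPTO's dissemination of patent and trademark information, there are at least the following layers of activity to consider:
1. In the broadest context, we have the industrial property (IP) industry (patents and trademarks, but not copyright), the information technology (IT) industry, and the U.S. federal regulations that set the tone and framework for information dissemination activities. Policy decisions are made at this level.
2. Next are the parts of the Code of Federal Regulations that deal specifically with patents and trademarks, amplified by Federal Register notices and spelled out in exhaustive detail in the manuals of patent and trademark examining procedure that guide examiners and applicants. Add the occasional court decision, and this is generally the level at which "the business rules have changed" happens for patent data.
3. But there are also lower-level business procedures that govern the creation, update, retirement, and disposition of file wrappers; examiner actions; dispute resolution; and the myriad details involved in processing roughly 180,000 patents published and roughly 400,000 applications received per year. Many federal regulations apply here (privacy, security, Section 508 compliance, and so on). These procedures can also motivate changes to the data.
(The Patent Office "owns" the business activities above; the Office of the CIO "owns" the IT activities below.)
4. Then there are the IT procedures and processes engaged in managing patent data at the lowest logical and physical levels, which is where the XML schemas live. This mainly concerns creating dissemination products whose primary use is building search systems for patent examiners and the public. Federal regulations call for "best practices" and adherence to relevant IT standards. Relevant standards include the work of the W3C, OASIS, ISO, WIPO (World Intellectual Property Organization), and some U.S. federal agencies. Internal IT best practices and procedures impose constraints as well. The U.S. patent document schema is a local implementation of a WIPO standard.
5. The dissemination activity itself. This activity manages subscriptions, provides limited customer support (for example, documenting changes between versions and maintaining FAQs), maintains a catalog, and manages the product archive. Strictly speaking, it is a "come and get it" operation.
Changes can be motivated entirely by the business, motivated entirely by IT, imposed by federal regulation, or driven by some combination of these. Some of these changes alter the meaning of the data without changing the data or the syntax. Others change the syntax without changing the meaning or the data. Still others change the data without changing the meaning or the syntax.
This is relevant because it determines whether a schema change can be forward or backward compatible. The people who set policy, pass laws, write rules, or establish rules for patent examination do not consider forward compatibility for the corresponding XML schemas.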
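However, for the authors of an XML format, the cardinal rule is generally to produce a faithful representation of the business, whatever the consequences for IT. That makes the format very sensitive to the business and to changes in the business. The ability to adapt quickly to such changes therefore has to come from the way the widgets are coupled in the implementation. (In case anyone is tempted to decouple the schema from the business in order to solve the change management problem: that would undercut one of the major selling points of XML.)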
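On the other hand, if the motivation for a change lies at the IT level rather than at the IP regulatory level, there may be more opportunity to achieve forward or backward compatibility. In practice, we consider backward compatibility and incorporate it where possible, whereas forward compatibility has never been a desirable goal. This may be because patent data is document-centric, unlike the CDC example that Roger gave, which is data-centric.
Having reviewed our attempts at change management for XML-based patent publications over the past eight years, I find no case that fits Roger's model. (There may be such situations that we simply have not encountered.) Where we have succeeded, it is largely because, as Ian Graham said, we assessed the impact of a change across the entire lifecycle of the data, in the context of all the layers described above. Just last week we halted a change that had been planned for nearly a year, because of previously unconsidered consequences for downstream USPTO systems that had no funding to adapt, and because the change to the data deviated arbitrarily from the relevant international standard.
I would like to thank Ian for his elegant phrase, "suitably qualified idiot"; it is much less derogatory.
Manager, Standards Development Division.
U.S. Patent & Trademark Office.
The contents of this message are the author's personal opinion and should not be construed as an official statement of the USPTO.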
Sent: Wednesday, December 26, 2007 11:09
Subject: Caution using XML Schema backward or forward compatibility as a versioning strategy for data exchange
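Designing XML Schemas to be backward or forward compatible is a popular approach to data versioning.
I think some cautions need to be raised about a versioning strategy that is based on XML Schema backward or forward compatibility. Below I list the cautions. Do you agree with them? Are there cautions that I have missed?
Consider deploying a web service. Assume that the web service has no knowledge of who its clients are or of how clients use the data they retrieve from it.
The web service uses an XML Schema to describe the syntax of the data it exchanges with clients.
The web service uses the following data versioning strategy: each new version of the XML Schema is designed to be forward compatible. Thus, a client with an old XML Schema can validate an XML instance document that the web service generated using a new XML Schema.
Given the scenario described, what cautions must be considered when using forward-compatible XML Schemas as a versioning strategy for data?
A versioning strategy based on backward compatibility has the same cautions. Although I do not mention backward compatibility explicitly in the rest of this message, bear in mind that the comments apply to it as well.
Caution #1: Just because a client can validate the data does not mean that it can process the data.
Consider a client application that was implemented to process version 1 data from the web service.
Suppose the web service changes its XML Schema in a forward-compatible fashion. Will the client application be able to process the new data?
Because the XML Schema is forward compatible, the application will be able to "validate" the new data.
But it does not necessarily follow that the application will be able to "process" the new data.
Example #1: Suppose that in the first version of the XML Schema the <distance> element means "distance from the center of town." Accordingly, the client's application performs calculations based on that meaning.
In the version 2 data, the syntax is changed in a forward-compatible fashion. In addition, the meaning of the <distance> element is changed to "distance from the town line."
The client application will be able to validate the version 2 data, but its calculations will be incorrect.
Example #2: Suppose the version 1 XML Schema defines <distance> in one unit of measure, and the version 2 XML Schema changes <distance> to a different unit.
The data is valid and the client's application will validate it, but the application will make incorrect calculations.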
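A small Python sketch may make Example #1 concrete. Everything in it is an assumption made for illustration: the town wrapper element, the numeric values, and the is_valid function, which merely stands in for real XML Schema validation by checking structure and datatype.

```python
import xml.etree.ElementTree as ET

def is_valid(xml_text):
    """Stand-in for schema validation: checks structure and datatype only."""
    root = ET.fromstring(xml_text)
    distance = root.find("distance")
    if root.tag != "town" or distance is None:
        return False
    try:
        float(distance.text)
    except (TypeError, ValueError):
        return False
    return True

def km_from_center(xml_text):
    """Version 1 client logic: assumes <distance> is measured from the town center."""
    return float(ET.fromstring(xml_text).find("distance").text)

# Hypothetical data: the same place described under each version's meaning.
V1_DOC = "<town><distance>5.0</distance></town>"  # measured from the center
V2_DOC = "<town><distance>2.0</distance></town>"  # measured from the town line

print(is_valid(V1_DOC), is_valid(V2_DOC))  # both documents validate
print(km_from_center(V2_DOC))              # silently read as distance from the center
```

Nothing in the validation step can detect the problem, because the syntax of the two documents is identical; only the meaning of the number has changed.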
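Lesson Learned #1: Data may change in ways that do not affect syntax validation and yet break applications.
Lesson Learned #2: Just because an application can validate the data does not mean that it can process the data.
Lesson Learned #3: Forward-compatible XML Schemas yield increased validation, but not necessarily increased application processing.
Lesson Learned #4: A distinction must be made between the ability to validate data and the ability to process it.
Lesson Learned #5: A versioning strategy must take into account:
1. Syntactic changes.
2. Relationship changes.
3. Semantic changes.
Caution #2: Forward compatibility is based on technology limitations rather than on application requirements.
Designing a new version of an XML Schema to be forward compatible with an older version requires that the changes made in the new version be "subset" changes:
- Constrain the datatype of an element or attribute.
- Reduce the number of occurrences of an element.
- Eliminate an optional element or attribute.
- Remove an element from a choice.
This is very restrictive. And to what avail? Answer: to enable validation of new XML instance documents against an old XML Schema.
But as described above, just because data can be validated does not mean that it can be processed.
Furthermore, in the scenario we are considering, the web service has no knowledge of how its data will be processed by clients. Thus, there is no evidence that the additional validation provided by forward-compatible XML Schemas will help the clients.
Lesson Learned #6: A versioning strategy based on forward-compatible XML Schemas imposes limitations on the types of changes that can be made. Those limitations may not be consistent with the actual changes that are required.
Lesson Learned #7: Version data based on data requirements.
1. Do you agree with the cautions listed above?
2. Are there other cautions?
3. Do you agree with the lessons learned?
4. Given the scenario described above, is it wise to base a versioning strategy on backward or forward compatibility?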

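[Editorial Draft] Versioning XML Languages.
Proposed TAG Finding 3 October 2003.
This finding discusses a number of issues related to the extensibility and versioning of XML languages. In particular, it examines extensibility and versioning for several kinds of languages and outlines some strategies for building languages that can be changed in backwards and forwards compatible ways. It describes a number of constraints and best practices for using XML, XML Namespaces, and W3C XML Schema for language construction and extension.
Particular attention is paid to the evolution of languages in loosely coupled or open systems. These ideas are contrasted with other versioning mechanisms such as derivation, extensibility points, and version numbers.
This finding is divided into two major parts. The first part focuses on identifying the size and scope of the extensibility and versioning problem. That part applies equally to all schema languages, such as DTDs, RELAX NG, and XML Schema. The second part focuses on the concrete steps that can be used to achieve the described constraints and practices in W3C XML Schema.
Status of this Document.
This document has been developed for discussion by the W3C Technical Architecture Group. It does not yet represent the consensus opinion of the TAG.
Publication of this finding does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced, or obsoleted by other documents at any time.
Additional TAG findings, both approved and in draft state, may also be available. The TAG expects to incorporate this and other findings into a Web Architecture document that will be published according to the process of the W3C Recommendation Track.
Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).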
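1. Introduction.
XML is designed for the creation of languages built from sets of tags, elements, and attributes. XML is minimally self-describing, in that any XML parser can recognize the namespace-qualified elements and attributes, the attribute values, and the text content of a document. It was designed for the composition of languages in instance documents.
Most XML languages should be designed so that other XML languages can be added where appropriate. One aspect of extensibility is the ability to mix languages. In all but the simplest applications, languages evolve over time. Because there will soon be two different definitions of a given vocabulary, versioning has been introduced, ready or not.
This finding describes how designers can plan for extensibility and change, so that backwards and forwards compatible changes become possible.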
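1.1 Terminology.
Extensibility is a property that enables the evolvability of software. It is perhaps the biggest contributor to loose coupling in systems, because it enables the independent and potentially compatible evolution of languages. [Definition: A language is extensible if instances of the language can include terms from other vocabularies.]
An XML Namespace is a convenient container for collecting terms that are intended to be used together in a language or languages. It provides a mechanism for creating globally unique names.
A language has a vocabulary, which may be drawn from one or more XML Namespaces (or none). [Definition: A vocabulary is a set of terms.] The syntactic structure of a language is constrained by the use of DTDs, XML Schema, other schema languages, or narrative constraints expressed in the relevant language specification.
In general, the intended meaning of a vocabulary term is scoped by the language in which the term appears. However, terms drawn from an XML Namespace have a consistent meaning across all the languages in which they are used.
For our purposes, [Definition: a language is an identifiable set of vocabulary terms with defined constraints.] Examples are the elements and attributes of XHTML 1.0, or the names of the built-in functions in XPath 2.0. A language may or may not be defined by a schema in a particular schema language. By language we mean the set of elements and attributes, or components, used by a particular application.
[Definition: An instance is a realization of a language.] A document is an instance of a language. In XML, it must have a root element.
[Definition: Content is data that is part of an instance of a language.] Content is part of a document. Content has one or more components.
[Definition: A component is a realization of a term in a language.] XML elements and attributes are components.
In this finding, we describe the interaction between applications and languages in terms of senders and receivers. [Definition: A sender is an application that creates or produces an instance for processing by another application.] [Definition: A receiver consumes an instance that it obtains from a sender.]
These terms and their relationships are illustrated below.
In the example mentioned above, the stylesheet processor is the receiver of the XML document it processes (the sender is not mentioned). In a Web services context, the roles of sender and receiver alternate as messages are passed back and forth.
Most Web service specifications provide definitions of inputs and outputs. By our definitions of compatibility, a Web service that updates its output schema is considered a new sender. A service that updates its input schema is a new receiver.
Versioning is an issue that eventually affects almost all XML applications. Whether it is a processor styling documents to produce PDF files or a Web service engaged in financial transactions, an application can receive a version of a language that it does not expect.
As languages evolve, changes can be backwards and forwards compatible. We base our definitions of backwards and forwards compatibility on the [FOLDOC] definitions. There are two aspects of compatibility: software compatibility and schema compatibility. They are often directly related, but sometimes they are not.
[Definition: A language change is backwards compatible if newer processors can process all instances of the older language.] A software example is a word processor at version 5 being able to read and process version 4 documents. A schema example is a schema at version 5 being able to validate version 4 documents. For Web services, this means that a new Web service receiver, designed for the new version, can process all instances of the old language. That is, a sender can send an old version of a message to a receiver that understands the new version and still have the message processed successfully.
[Definition: A language change is forwards compatible if older processors can process all instances of the newer language.] A software example is word processing software at version 4 being able to read and process version 5 documents. A schema example is a schema at version 4 being able to validate version 5 documents. For Web services, this means that an existing Web service receiver, designed for a previous version of the language, can process all instances of the new language. That is, a sender can send a new version of a message to an existing receiver and still have the message processed.
In broad terms, backwards compatibility means that existing senders can use services that have been updated, and forwards compatibility means that newer senders can continue to use existing services.
The cost of changes that are not backwards or forwards compatible is often very high. All the software that uses the language must be updated to the newer version. The magnitude of that cost is directly related to whether the system in question is open or closed.
[Definition: A closed system is one in which all senders and receivers are tightly connected and under the control of a single organization.] Closed systems can often provide integrity constraints across the entire system. A traditional database is a good example of a closed system: all the database schemas are known at once, all the tables are known to conform to the appropriate schema, and all the elements in each row are known to be valid for the schema to which the table conforms.
From a versioning perspective, it might be practical in a closed system to say that a new version of a particular language is introduced into the system at a specific point in time, and that all the data conforming to the previous version of the schema is migrated to the new schema at that point.
[Definition: An open system is one in which some senders and receivers are loosely connected or are not controlled by the same organization.] The Internet is a good example of an open system.
In an open system, it is simply not practical to handle language evolution with universal, simultaneous, atomic upgrades to all the software components. Existing senders and receivers outside the immediate control of the organization publishing a changed language will continue to use the previous version for some (possibly long) period of time.
Finally, remember that systems evolve over time and have different requirements at different stages of their lifecycle. During development, when the first version of a language is being actively developed, it may be valuable to pursue a much more aggressive, draconian versioning strategy. Once a system is in production and there is an expectation of stability in the language, more care may be needed. Being prepared to move forward in a backwards compatible manner is the strongest argument for worrying about versioning at the very beginning of a project.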
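1.2 Why Worry About Extensibility and Versioning?
When documents or messages are exchanged between applications, they are processed. Most applications are designed to distinguish between valid and invalid input. To have any sort of interoperability, the language must be defined or described in some normative way so that the terms "valid" and "invalid" have meaning.
There are a variety of tools that may be used for this purpose (DTDs, W3C XML Schema, RELAX NG, Schematron, and so on). These tools may be augmented with normative prose documentation or with some application-specific validation logic. In many cases, the schema language is the only validation logic available.
It is almost unheard of for a single version of a language to be deployed without later needing some kind of addition. Invariably, the original language designers did not include certain terms and constraints. In fact, a good designer should not try to define every possible term and constraint. This is sometimes called "boiling the ocean." Knowing that the language will not be all things to all people, the language designer can allow other parties to extend instances of the language or the language itself. Typically, the tools allow the language designer to specify where extensions of instances and extensions of the language are permitted. Note that we do not call extending an instance of a language a new version. In this finding, the discussion of versioning covers changes to the language, not changes to instances.
If a language is changed in such a way that all deployed services would consider instances of the new language invalid, a versioning problem has been introduced, with real costs that depend on whether ten, a hundred, or a million services have been deployed.
Once a language is used outside of its development environment, there will be some cost associated with changing it. Software, user expectations, and documentation may have to be updated to accommodate the change. Once a language is used in environments outside of a single sphere of control, any change will introduce multiple versions of the language.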
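1.3 Why Extend Languages?
The primary motivation for allowing instances of a language to be extended is to decentralize the task of designing, maintaining, and implementing extensions. A sender can change instances without going through a central authority. This means that senders or receivers can make changes without approval from the owner of the language. Consider the effort that the HTML Working Group has put into the modularization of HTML. Without a decentralized process for extension, every variant of HTML would have to be called something else, or else the HTML Working Group would have to agree to include it in the next revision of HTML.
Allow Extensibility: Language designers SHOULD create extensible languages.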
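1.4 Why Do Languages Change?
There are many reasons why a different version of a language might be needed. A few of them are:
Bugs may need to be fixed. Production use may reveal defects or oversights that need to be corrected. This may involve changes to the components of the language or changes to the semantics of existing components.
Changing requirements may motivate changes to the schema design. For example, a callback might be added to a service that performs some processing, so that the service can notify the caller when processing is complete.
Different flavors of a schema may be desirable. For example, the XHTML 1.0 Recommendation defines strict, transitional, and frameset schemas. All three of these schemas purport to define the same namespace, but they describe quite different languages.
Additional schemas may also be defined by other specifications, such as the XHTML Basic Recommendation.
Given that different versions of a language will exist over time, designing applications to deal with these changes in a predictable, useful way requires a versioning strategy.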
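1.5 How Do Languages Change?
At the most basic level, languages can change in only a few ways.
Elements: new elements can be added, existing elements can be removed, or the acceptable number of occurrences of an element can change. In addition, the content of an element can change from element-only content to mixed content, or vice versa.
For elements with simple content, the type or range of acceptable values can change.
Attributes: new attributes can be added, existing attributes can be removed, or the type or range of acceptable values can change.
Semantics: the meaning of an existing element or attribute can change.
Of course, the difference between two versions of a language can be any number of these changes.
One of the most important aspects of a change is whether it is backwards or forwards compatible. For example:
Adding a new optional element or attribute is both backwards and forwards compatible.
Increasing the number of allowed occurrences of an element is backwards compatible. Similarly, widening the allowed range of an element with simple content, or of an attribute, is backwards compatible.
Decreasing the number of allowed occurrences of an element is forwards compatible. Narrowing the allowed range of an element or attribute is forwards compatible.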
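As a rough illustration of the last two rules, the Python sketch below models an allowed occurrence count or value range as a (min, max) pair and checks containment. The function name classify and the sample ranges are hypothetical, and the check covers only schema validity; it says nothing about changes in meaning.

```python
def classify(old_range, new_range):
    """Classify a change to an allowed (min, max) occurrence or value range."""
    old_lo, old_hi = old_range
    new_lo, new_hi = new_range
    # Backwards compatible: every old instance is still allowed by the new language.
    backwards = new_lo <= old_lo and old_hi <= new_hi
    # Forwards compatible: every new instance is still allowed by the old language.
    forwards = old_lo <= new_lo and new_hi <= old_hi
    return {"backwards": backwards, "forwards": forwards}

# Widening 1..3 occurrences to 1..5: old documents still fit the new rule.
print(classify((1, 3), (1, 5)))
# Narrowing 1..5 occurrences to 1..3: new documents still fit the old rule.
print(classify((1, 5), (1, 3)))
```

A change that both raises the minimum and raises the maximum, such as (1, 3) to (2, 5), comes out as compatible in neither direction under this model.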
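1.6 Kinds of Languages.
Ultimately, there are many different kinds of languages. The versioning approaches and strategies that are appropriate for one kind of language may not be appropriate for another. Among the various kinds of vocabularies, we find:
Just Names: Some languages do not actually identify elements or attributes; they are just lists of names. Using QNames to identify words in the WordNet database, or the function and operator names in XPath 2, are examples of "just names" languages.
Standalone: Languages designed to be used more or less on their own, such as XHTML, DocBook, or the TEI.
Containers: Languages designed to be used as a wrapper or framework for other languages or payloads, such as SOAP or WSDL.
Container Extensions: Languages designed to extend or augment a particular class of container. Specifications that extend SOAP by defining SOAP header blocks to provide security, asynchrony, or reliable messaging are examples of container extension languages.
There are a few types of extension languages: element extensions and attribute extensions.
Element extensions. Languages that are elements. SOAP and the like are element extensions.
Attribute or type extensions. Languages that are types or attributes. These languages must exist in the context of an element. They are sometimes called "parasite" languages, because they need a "host" element. XLink is an example.
Mixtures: Languages designed, or often used, to encapsulate some semantics within another language. For example, MathML may be mixed into other languages.
This is by no means an exhaustive list. Nor are these categories completely clear-cut. For example, MathML can be used standalone, and a language like SVG is a combination of standalone, container, and mixture.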
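2 Versioning Strategies.
Versioning is a broad and complex issue. Different communities have different notions of what constitutes a version, what constitutes a reasonable policy, and what the appropriate behavior is when there are deviations from that policy. Historically, it has always proven more complicated in practice than in theory.
In broad terms, versioning approaches fall into a range of classes, from "none" to "big bang".
None. No distinction is made between versions of the language. Applications are either expected not to care, or are expected to cope with any version they encounter.
Compatible. Designers are expected to limit changes to those that are either backwards or forwards compatible, or both.
Backwards compatible. Applications are expected to behave properly if they receive an instance document for an "older" version of the language. Backwards compatible changes allow applications to behave properly if they receive an "older" version of the language.
Forwards compatible. Applications are expected to behave properly if they receive an instance document for a "newer" version of the language. Forwards compatible changes allow existing applications to behave properly if they receive a "newer" version of the language.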
Flavors. Applications are expected to behave properly if they receive one of a set of flavors of the document type.
Big bang. Applications are expected to abort if they see an unexpected version.
There's no single approach that's always correct. Different application domains will choose different approaches. But by the same token, the approaches that are available depend on other choices, especially with respect to namespaces. This dependency makes it imperative to plan for versioning from the start. If you don't plan for versioning from the start, when you do decide to adopt a plan for versioning, you may be constrained in the available approaches by decisions that you've already made.
A language goes through a common lifecycle of iterative development followed by deployment. A language's place in this lifecycle will affect the selection of the versioning strategy.
Just as there are a number of approaches, there are a number of strategies for implementing an approach. The Internet, including MIME, markup languages, and XML languages, has successfully used various strategies, either singly or in combination. Summaries of strategies and requirements have been produced for earlier technologies and have guided XML Namespaces and Schema, such as [HTML Document types] and [Web Architecture: Extensible Languages].
For any given approach, some strategies may be more appropriate than others. Among the strategies we find:
Must Understand. Receivers must understand all of the elements and attributes received and are expected to abort processing if they do not. SOAP processors must understand headers that are explicitly identified to be mandatory.
Must Ignore. Receivers must ignore elements or attributes that they do not understand. Sometimes the must understand and must ignore approaches can be combined for more selective use. SOAP processors must ignore headers they do not recognize unless the header explicitly identifies itself as one that must be understood.
There are 2 variations of the Must Ignore strategy:
Must Ignore All. This variation on must ignore requires the receiver to ignore an element or attribute it does not understand and, in the case of elements, all of the descendents of that element. Most data applications, such as Web services that use SOAP header blocks or WSDL extensions, adopt this approach to dealing with unexpected markup. For XML, the Must Ignore All rule was first standardized in the WebDAV specification RFC 2518 [WebDAV] section 14 and later separately published as the [FlexXMLP].
Must Ignore Container. This variation on must ignore requires the receiver to ignore an element or attribute that it does not understand, but in the case of elements, to process the children of that element. The Must Ignore Container practice was described in [HTML 2.0].
Explicit Fallback. A language can provide mechanisms for explicit fallback if the extension is not supported. [MIME] provides multipart/alternative for equivalent, and hence fallback, representations of content. [HTML 4.0] uses this approach in the NOFRAMES element. In XML, the XML Inclusions specification [XInclude] provides a fallback element to handle the case where the putatively included resource cannot be retrieved.
Explicit Testing. A language can provide a mechanism for explicit testing. The XSLT Specification provides a conditional logic element and a function to test for the existence of extension functions. This allows designers of stylesheets to deal with different receiver capabilities in an explicit fashion.
Languages can choose a mixture of approaches. For example, XSLT provides both an explicit fallback mechanism for some conditions and explicit testing for others. The SOAP specification, another example, specifies Must Ignore as the default strategy and the ability to dynamically mark components as being in the Must Understand strategy.
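The SOAP combination described above might be sketched as follows in Python. The envelope namespace is the SOAP 1.1 one; the Transaction and Hint header blocks, their urn:example namespaces, and the process_headers function are hypothetical. The receiver ignores unrecognized header blocks by default and raises an error only when such a block is marked mustUnderstand="1".

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
MUST_UNDERSTAND = "{%s}mustUnderstand" % SOAP_NS

# Hypothetical header block that this receiver understands.
UNDERSTOOD = {"{urn:example:tx}Transaction"}

def process_headers(envelope_text):
    envelope = ET.fromstring(envelope_text)
    header = envelope.find("{%s}Header" % SOAP_NS)
    handled = []
    for block in (header if header is not None else []):
        if block.tag in UNDERSTOOD:
            handled.append(block.tag)
        elif block.get(MUST_UNDERSTAND) == "1":
            # Must Understand: an unrecognized mandatory block is an error.
            raise ValueError("header not understood: " + block.tag)
        # Otherwise Must Ignore applies: the block is silently skipped.
    return handled

ENV = """<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
            xmlns:t="urn:example:tx" xmlns:x="urn:example:ext">
  <s:Header>
    <t:Transaction>5</t:Transaction>
    <x:Hint>fast</x:Hint>
  </s:Header>
  <s:Body/>
</s:Envelope>"""

print(process_headers(ENV))
```

Marking the hypothetical Hint block with s:mustUnderstand="1" would turn the same message into one that this receiver rejects, which is how a sender can opt individual components into the stricter strategy.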
3 Why Have a Strategy?
Different kinds of languages and different versioning strategies expose different problems. If you don't have a strategy at all, you are effectively choosing the "no versioning" strategy.
It's probably obvious that attempting to deploy a system that provides no versioning mechanism is fraught with peril. Putting the burden of version "discovery" on receivers is probably impractical in anything except a closed system.
At the other end of the spectrum is the "big bang" approach which is also problematic.
"Big bang" is a very coarse-grained approach to versioning. It establishes a single version identifier, either a version number or namespace name, for an entire document.
The semantics of the "big bang" are that applications decide on the basis of the document version whether or not they know how to process that document. If the version isn't recognized, the entire document is rejected. Typically, when introducing a new version using the big bang approach, all of the software that produces or consumes the instance documents is updated in a sweeping overhaul in which the entire system is brought down, the new software is deployed, and the system is restarted. This big bang approach to versioning is practical only in circumstances where there is a single controlling authority, and even in that case, it carries with it all manner of problems. The process can take a considerable amount of time, leaving the system out of commission for hours if not days. This can result in significant losses if the system is a key component of a revenue-generating business process, and coordinating the system overhaul can be quite costly as well.
Another approach, which avoids a complete overhaul, is to run parallel versions of the system, often with proxies or gateways between them. However, this too has its costs, as multiple versions of the software must be supported and maintained over time, and there is the added cost of developing the proxy or gateway between the two environments.
The "big bang" approach is appropriate when the new version is radically different from its predecessor. But in many cases, the changes are incremental and often a receiver could, in practice, cope with the new version. For example, it might be that there are many messages that don't use any features of the new version or perhaps it is appropriate to simply ignore elements that are not recognized.
For example, consider two services exchanging messages. Imagine that some future version of the language that they are using defines a new "priority" element. Because senders and receivers are distributed, it may happen that an old receiver, one unprepared for a priority element, encounters a message sent by a newer sender.
If big bang versioning is used, old systems will reject the new message. However, if the versioning strategy instead allowed the old receiver to simply ignore unrecognized content, it's quite possible that other components of the system could simply adapt to the previous behavior. In effect, the old system would ignore the priority element and its descendents so it would "see" a message that looks just like the old format it is expecting.
For the sender, the result would be that the request is fulfilled, though perhaps in a more or less timely fashion than expected. In many cases this may be better behavior than receiving an error. In particular, senders using the new format can be written to cope with the possibility that they will be speaking to old receivers.
If the new system needs to make sure that priority is respected, then it can change the purchase order's name or namespace to indicate that the new behavior is not considered backwards compatible.
What is needed is some sort of middle ground solution. An evolving system should be designed with backwards and forwards compatibility in mind.
One approach is to use version numbers, with a goal of using "major version" changes for incompatible developments and "minor version" changes for compatible ones.
Unfortunately, version numbers often wind up looking very similar to the big bang approach. Each language is given a version identifier, almost always a number, that's incremented each time the language changes. Although it's possible to design a system with version numbers that enables both backward and forward compatibility, for example XSLT, typically a version change is treated as if the new language is not backwards compatible with the old language.
Some efforts, such as HTTP, try to have the best of both worlds by allowing for extensibility (in HTTP's case, via headers) as well as version numbers that explicitly identify when a new version is backwards compatible with an old version.
One argument in favor of version numbers is that they allow one to determine what is a 'new version' and what is an 'old version'. But in practice this is not necessarily true. For example, RSS has 0.9x, 1.x, and 2.x versions, all being actively developed in parallel. In effect the version numbers, even though they appear to be ordered, are simply opaque identifiers. Using version numbers does not guarantee that version 1+x has any particular relationship to version 1.
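The major/minor convention can be sketched as a simple check, shown here in Python. The function can_process and the two-part "major.minor" format are assumptions made for illustration. As the RSS example shows, such a check is only as reliable as the promise that minor versions really are compatible.

```python
def can_process(supported, received):
    """Accept a document when the major versions match.

    Assumes a two-part "major.minor" string and assumes that the language's
    designers keep minor revisions compatible; neither is guaranteed.
    """
    supported_major, _ = (int(part) for part in supported.split("."))
    received_major, _ = (int(part) for part in received.split("."))
    return supported_major == received_major

print(can_process("1.0", "1.3"))  # same major version: attempt processing
print(can_process("1.0", "2.0"))  # different major version: reject
```

Used this way, a major version bump behaves like the big bang approach, while a minor bump relies on some other rule, such as ignoring unrecognized content, to actually make the newer document processable.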
The self-describing and extensible nature of XML markup, and the addition of XML Namespaces, provide a much better framework for developing languages that can evolve.
4 Designing for the Middle Ground.
Strategies on the extreme ends (none or big bang) might be appropriate for closed systems or systems that are largely autonomous or monolithic. In the modern world of distributed agents cooperating through Web service interactions, the extremes pose obvious problems when messages from a new version of a language may be sent to systems that were written for an older version (and vice versa).
Rather than having a single global version identifier, each XML element in a language should be allowed to version independently of others. We propose that this be accomplished by adopting "must ignore" and "must understand" rules.
If a language parser encounters an element or attribute that is not recognized in a "must ignore" context, then that component and all of its descendants are ignored. This approach allows individual elements to be versioned without having to resort to a big bang approach with its associated problems.
We recognize that it is not always appropriate to ignore content. If a language parser encounters an element or attribute that is not recognized in a "must understand" context, it must consider that an error and react accordingly.
This finding's focus on versioning is driven by a desire to enable loose coupling between senders and receivers of messages written using XML based languages. Loose coupling is best achieved by increasing the possibility for backwards-compatible and forwards-compatible processing to occur when the XML language evolves. A number of rules are described for versioning XML languages, making use of XML Namespaces and W3C XML Schema constructs. The finding also includes rules for working with languages such as SOAP that provide an extensible container model.
4.1 An Example.
Throughout this finding, we'll motivate our discussion of versioning with an ongoing example. Suppose that you have designed a Web service for handling purchase orders. Processing begins when a purchase order is sent to the service. The order is processed and notification of shipment is sent back.
Because it can take an arbitrary amount of time to process an order before the shipped message can be sent (consider, for example, items on back order), it's useful to make this process asynchronous. To do this, the original sender includes a "call back" address where the shipped message should be delivered when it is ready.
Initially, the purchase order processor acts as a receiver. But when the shipment has been processed, it acts as a sender, delivering a message to the callback service that was provided.
Ultimately, we'll describe the evolution of this callback service in two ways: one that is backwards compatible and one that is not.
In order to be schema agnostic, we will use the examplotron [examplotron] schema to convey the semantics of our examples. Examplotron uses examples to indicate the type and uses additional attributes to refine the content model.
Ironically, given the purpose of this finding, examplotron doesn't currently support extensibility points. We are in discussion with Eric van der Vlist about adding this functionality. For the time being, we've added the [. ] notation to indicate arbitrary extensibility of attributes and elements.
5 Identifying and Controlling Languages.
Our main goal is to allow backwards and forwards compatible processing as a language evolves. In order to accomplish those objectives, we must achieve two goals:
The processor must understand the semantics of every valid message that it receives. We must therefore define the semantics of messages that contain new elements or attributes.
We assume that each service rejects invalid messages. Therefore, it must be possible for our language to evolve without changing the schema that we've defined for it. New versions of a service might be deployed with newer schemas, but we want these new services to be able to communicate with the already deployed senders and receivers that will continue to use the old schemas. That is why forwards compatible language changes have to be possible without changing the schema.
In order for a schema to be extensible in the way described above, to allow new elements or attributes to be added without changing the schema, the schema must allow extension in any namespace. This brings us to the next rule for enabling a must ignore versioning strategy in XML languages:
Any Namespace: The language SHOULD provide for extension in any namespace.
It usually makes sense to allow extension in attributes as well.
Full Extensibility: All XML elements that can allow attributes, i.e., complex types in XML Schema, SHOULD allow any attributes and any elements in their content models.
The corollary of extensibility in any namespace, including the language's namespace, is that a namespace does not identify a single version of a language or set of names. A namespace identifies a compatible set of names.
Namespace identifies compatible names: The namespace name identifies names that are compatible within the same namespace name.
The section on extensibility will describe the conditions for when to change the namespace name as a result of change in the names identified.
Given that a namespace name is not for a single version of a language or set of names, it may be useful to identify the particular version. An example would be specifying in a policy statement the exact language supported by a software agent. This use of version identification could be considered each compatible "minor" version, with the namespace name identifying the incompatible versions.
Identify specific version with version attribute: The specific version of a set of names within a given namespace may be identified with a version attribute to differentiate between the compatible versions.
This finding does discourage the use of version attributes. Namespace names provide a solution for identification in the large majority of the identification problems. Version attributes should only be used in a small minority of the cases. There is considerable danger that using version attributes to differentiate between compatible versions in a namespace will morph into differentiating between incompatible versions of a namespace and thus replacing much of the namespace functionality.
5.1 Defining an Extensible Callback.
A number of examples illustrate the application of these and subsequent rules.
A major motivation for using XML Namespaces is to allow extensions to a language to be made by any party. The example above accomplishes that goal. An extension can be defined by any new specification simply by normative reference to the original Callback specification and a definition for the new element or attribute content. No permission should be needed from the authors of the specification to make such an extension.
6 Understanding Extensions.
The key value of the extension strategy described above is that existing XML documents can be extended without having to change existing implementations. For languages that are intended to be extensible, specifications SHOULD provide a clear processing model for extensions.
Provide Processing Model: Languages MUST provide a processing model for dealing with extensions.
Given that an existing processor cannot possibly know the intended semantics of a component that it has never seen before, only one semantic is possible: ignore that component. We propose, therefore, that processors "must ignore" elements and attributes they do not recognize.
For many applications, including most Web services, the most practical rule is: must ignore .
Must Ignore: Receivers MUST ignore any XML attributes or elements that they do not recognize in a valid XML document.
This rule does not require that the elements be physically removed; only ignored for most processing purposes. It would be reasonable, for example, if a logging application included unrecognized elements in its log. There are cases where the elements should not be physically removed. For example, an application that forwards the content to another receiver should preserve the unknown content.
There are two broad types of languages with respect to handling extensions: presentation or document oriented languages, and data oriented languages. For data oriented applications, such as Web services, the rule is:
Must Ignore All: The Must Ignore rule applies to unrecognized elements and their descendants.
Applications must deal carefully with the ignored elements, especially if any of them are counted or if the application makes use of information about their position.
Imagine that software compliant with the Example 1 Callback type received the following Callback document:
The receiver ignores the ncs:foo attribute and the ncs:conf element (and its descendants, so it never even sees ncs:lk3). With these components ignored, the request can be processed just like an older version of the callback.
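The Must Ignore All behaviour just described can be sketched in Python as follows. The document is a hypothetical stand-in built from the names mentioned in the text (ncs:foo, ncs:conf, ncs:lk3); the namespace URIs and the set of known names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

CB = "http://example.org/callback"  # hypothetical callback namespace
NCS = "http://example.org/ncs"      # hypothetical extension namespace

# Names that a version 1 receiver recognizes (assumed for this sketch).
KNOWN_ELEMENTS = {f"{{{CB}}}callback", f"{{{CB}}}callbackLocation"}
KNOWN_ATTRIBUTES = set()

DOC = f"""
<cb:callback xmlns:cb="{CB}" xmlns:ncs="{NCS}" ncs:foo="bar">
  <cb:callbackLocation>http://example.org/listener</cb:callbackLocation>
  <ncs:conf><ncs:lk3>never visited</ncs:lk3></ncs:conf>
</cb:callback>
""".strip()


def visit(element, seen):
    """Record recognized elements; skip unrecognized ones and their subtree."""
    if element.tag not in KNOWN_ELEMENTS:
        return  # Must Ignore All: do not descend into unknown content
    seen.append(element.tag.split("}")[1])
    for child in element:
        visit(child, seen)


root = ET.fromstring(DOC)
seen = []
visit(root, seen)
# Unrecognized attributes are dropped from processing in the same way.
recognized_attrs = {k: v for k, v in root.attrib.items() if k in KNOWN_ATTRIBUTES}
```

The original tree is left untouched, so a forwarding or logging application could still pass the unknown content along.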
Document oriented languages need a different rule as the application will still want to present the content of an unknown element. The rule for document oriented applications is:
Must Ignore Container: The Must Ignore rule applies only to unrecognized elements.
This retains the descendants of the ignored container element, such as for display purposes.
In order to accommodate big bang changes when they are needed, the must ignore rule is not expected to apply to the root element. If the document root is unrecognized, the entire message must be rejected.
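A rough Python sketch of Must Ignore Container together with the root element exception might look like this. The document language (doc, para, emph), the unknown x:blink container, and both namespace URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC_NS = "http://example.org/doc"  # hypothetical document language
KNOWN_ROOTS = {f"{{{DOC_NS}}}doc"}

DOC = (
    f'<doc xmlns="{DOC_NS}" xmlns:x="http://example.org/x">'
    "<para>Call <x:blink><emph>now</emph></x:blink>!</para>"
    "</doc>"
)


def render(element):
    """Render text; unknown containers are unwrapped and their content kept."""
    text = element.text or ""
    for child in element:
        text += render(child) + (child.tail or "")
    if element.tag == f"{{{DOC_NS}}}emph":
        return f"*{text}*"  # the only element given special rendering here
    return text  # doc, para, and any unrecognized container


def process(xml_text):
    root = ET.fromstring(xml_text)
    if root.tag not in KNOWN_ROOTS:
        # Must ignore does not apply to the root: reject the whole message.
        raise ValueError("unrecognized root element")
    return render(root)


rendered = process(DOC)
```

Here the unknown x:blink wrapper disappears from the output while the emph element inside it is still rendered.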
This finding outlines some general extensibility strategies. However, individual languages will choose rules in accordance with their requirements.
7 Versioning Languages.
The strategy outlined above distributes the notion of versioning down into the messages. Changes that are compatible with the extension mechanism do not require a namespace change.
Re-use Namespace Names and Element Names: If a backwards or forwards compatible change is made to an element definition by the owner of the element's namespace, then the old namespace name and element names SHOULD be used in conjunction with the extensibility model.
Imagine the callback element is extended to specify an expiration, to indicate that the callback will only be viable until a specified time. In the example below, this is done with an expires element added to the extensibility point in the original schema:
The expires element is defined in the callback namespace, but it isn't defined in the earlier callback specification. If the callback receiver is only aware of the earlier version of the Callback specification and follows the must ignore rule, it will ignore the expires element because it is not defined in the specification that the receiver understands.
Here is an example of a schema that could be used for the new language that would preserve backwards compatible behavior.
This CallbackType with an optional expires element is backwards compatible with the earlier specification.
As you can see, the namespace name is not always required to change when a specification evolves. It depends on the compatibility of the actual changes.
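To illustrate, here is a small Python sketch in which an older and a newer receiver read the same callback without any namespace change. The namespace URI, the element placement (expires as a direct child), and the timestamp are assumptions, not the finding's actual example.

```python
import xml.etree.ElementTree as ET

CB = "http://example.org/callback"  # hypothetical; the same name in both versions
NS = {"cb": CB}

DOC = (
    f'<cb:callback xmlns:cb="{CB}">'
    "<cb:callbackLocation>http://example.org/listener</cb:callbackLocation>"
    "<cb:expires>2007-07-04T00:00:00Z</cb:expires>"
    "</cb:callback>"
)


def receive_v1(xml_text):
    """Older receiver: only knows callbackLocation, ignores everything else."""
    root = ET.fromstring(xml_text)
    return {"location": root.findtext("cb:callbackLocation", namespaces=NS)}


def receive_v2(xml_text):
    """Newer receiver: also reads the optional expires element."""
    root = ET.fromstring(xml_text)
    return {
        "location": root.findtext("cb:callbackLocation", namespaces=NS),
        "expires": root.findtext("cb:expires", namespaces=NS),  # None if absent
    }


old_view = receive_v1(DOC)
new_view = receive_v2(DOC)
```

Because expires is optional, the newer receiver also accepts documents produced by older senders, which is what makes the change backwards compatible.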
7.1 Backwards Incompatible Changes.
There are a few distinct types of backwards incompatible change:
A required information item is added.
The semantics of an existing information item are changed.
The maximum number of allowable items is reduced. This change does not guarantee incompatibility. Instance documents in which the number of occurrences is still less than or equal to the new maximum will still be valid. If the maximum number of allowable items is reduced below the previous version's minimum number, then incompatibility is guaranteed.
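The arithmetic behind the third case can be sketched as two small checks. The function names and the sample bounds are illustrative only.

```python
def still_valid(occurrences, new_max):
    """An existing instance stays valid if it fits under the reduced maximum."""
    return occurrences <= new_max


def always_incompatible(old_min, new_max):
    """No instance valid under the old version can satisfy the new one."""
    return new_max < old_min


# Hypothetical element allowed 2..5 times in the old version.
fits = still_valid(occurrences=3, new_max=3)        # instance with 3 items
breaks = still_valid(occurrences=4, new_max=3)      # instance with 4 items
guaranteed = always_incompatible(old_min=2, new_max=1)
```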
Imagine that the semantics of callback location change; perhaps a conversation identifier is now required as part of the location. In this case, the must ignore rule won't help as ignoring the conversation identifier will result in invalid callbacks being dispatched.
Backwards incompatibility can easily be achieved by changing the namespace name or any of the element names. For example:
Notice that the namespace name has been changed from /callback to /conversationCallback. This new message type will be rejected as unrecognized by older systems. Deploying this callback request is an example of a "big bang" change.
7.2 Namespace content changes.
Only the owner of a namespace can change (i.e., version) the meaning of elements and attributes in that namespace.
Only Namespace Owners Change Namespace: The namespace name owner is the only entity that is allowed to change the meaning of names in a namespace.
There is a school of thought that says that every extension should be placed in a separate namespace; that after publication, no new names should be added to a namespace. If you hold that point of view then you may not feel that an extensibility element is necessary or desirable.
Another school of thought says that the maintainers of the language have a right to add new names to a namespace as they see fit. There are certain advantages associated with adding new names in the same namespace.
It reduces the number of namespaces needed to describe instances of the document. There are significant convenience advantages to using defaulted namespaces for document creation and manipulation.
It provides a clear separation between extensions by the language designers and extensions by third parties.
There may be additional benefits in code generation and reuse if a single namespace or a small set of namespaces can completely describe the language.
A namespace name owner will use the lifecycle of the namespace as one of the factors in determining whether to revise the namespace or not. Typically, the changes during development are not compatible changes. Authors of namespaces that are under development will typically follow a "big bang" approach. This helps reduce the number of potentially buggy or immature implementations that are deployed. A W3C specification is a good illustrative example. It will probably change namespace names for each working draft. The Candidate Recommendation, Proposed Recommendation and Recommendation namespace names should only be changed if compatibility is not achieved.
8 Containers.
The issues of versioning and extensibility seem particularly relevant to languages such as SOAP that are designed to be containers for transporting a message payload.
Containers are XML elements designed specifically for holding XML elements whose names and definitions are not known when the container language is designed. For example, SOAP messages are containers. They have an extensible Header element and an extensible Body element. Another example of a container is WSDL definitions [WSDL 1.1]. Containers are generally trivial and uninteresting without one or more extensions.
A container extension is typically the specification of a discrete feature. Extensions provide semantic meaning for either the container or the contained message or both. SOAP header blocks, like the callback example, SOAP bodies, policy statements, etc., are examples of extension languages. In Web services, there are a few well-established containers, so most Web service languages are container extensions.
Some standalone languages are also specifications of a discrete feature, but do not need a container. Some examples are MathML, SVG, and XHTML. A standalone language may also be a container. Further, unknown to a standalone language author, it may be used as a container extension. Thus even standalone languages should plan for extensions and for being used as an extension. An example of a standalone language that has been subsequently used as an extension is the XML Digital Signature language. It does not allow for attribute extensibility, therefore any use of XML Signature as a SOAP extension requires the creation of a wrapper element to accommodate the soap:mustUnderstand and soap:role attributes.
Given adoption of the Must Ignore practice, it is often the case that the creator of an extension wants to require that the receiver understand the extension, over-riding the must ignore Good Practice. The most common technique is to use a "must understand" option. This is typically a flag that indicates that the element must be understood.
Provide Must Understand: Container languages MUST provide a "must understand" model for dealing with optionality of extension elements that override the must ignore good practice.
The must understand and must ignore good practices work together to provide a stable processing model for containers. The [SOAP 1.2] and [WSDL 1.1] attributes and values for specifying "must understand" are, respectively, soap:mustUnderstand="1" and wsdl:required="1". SOAP is probably the most common case of a container that provides a must understand processing model. The default value for soap:mustUnderstand is 0, which is effectively the must ignore good practice.
Use Must Understand: When a language provides a must understand model, extensions MUST use it when the extension is required.
In the callback examples, the callback sender can make use of the fact that the SOAP specification provides a must understand model. A sender can mark the callback with soap:mustUnderstand="true" to indicate that a receiver must understand the callback. This allows the sender to insert extensions into the container and use the must understand attribute to override the must ignore rule. Senders can extend SOAP or their own extensions without changing namespaces, retaining compatibility with the receiver. Obviously the receiver may be updated to handle new extensions, but there is now a loose coupling between the language's processing model and the extension processing model.
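A receiver-side sketch of this interplay, in Python: understood header blocks are accepted, unknown optional blocks are ignored, and unknown blocks flagged with soap:mustUnderstand cause a fault. The SOAP 1.2 envelope namespace is the real one; the callback and ext namespaces, the ext:trace block, and the MustUnderstandFault class are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
CB = "http://example.org/callback"                # hypothetical
EXT = "http://example.org/ext"                    # hypothetical

UNDERSTOOD = {f"{{{CB}}}callback"}  # header blocks this receiver implements


class MustUnderstandFault(Exception):
    """Raised when a mandatory header block is not understood."""


def envelope(trace_attrs=""):
    """Build a sample envelope; trace_attrs is spliced onto the ext:trace block."""
    return (
        f'<env:Envelope xmlns:env="{SOAP}" xmlns:cb="{CB}" xmlns:ext="{EXT}">'
        "<env:Header>"
        '<cb:callback env:mustUnderstand="true">'
        "<cb:callbackLocation>http://example.org/listener</cb:callbackLocation>"
        "</cb:callback>"
        f"<ext:trace {trace_attrs}/>"
        "</env:Header><env:Body/></env:Envelope>"
    )


def accepted_headers(envelope_xml):
    root = ET.fromstring(envelope_xml)
    header = root.find(f"{{{SOAP}}}Header")
    accepted = []
    for block in (header if header is not None else []):
        mandatory = block.get(f"{{{SOAP}}}mustUnderstand") in ("true", "1")
        if block.tag in UNDERSTOOD:
            accepted.append(block.tag)
        elif mandatory:
            raise MustUnderstandFault(block.tag)
        # otherwise the block is unknown and optional: must ignore
    return accepted


accepted = accepted_headers(envelope())
```

Calling accepted_headers(envelope('env:mustUnderstand="true"')) raises MustUnderstandFault, because the sender has overridden the must ignore default for a block this receiver does not implement.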
When dealing with non-container XML languages, there is less obvious need for providing a must understand mechanism. If a change is made in the content of the XML language that is not backwards compatible, then the root element of the XML language can be changed either by changing the element name or the namespace. In the case of standalone languages, this will probably result in receivers of the instance document generating a fault. In extension languages, the results will be governed by whether the extension is marked with must understand or not. In the standalone case, the extensibility model described earlier means that language could become an extension.
As XML does not allow attributes on attributes, it is very difficult to mark an extension attribute with a Must Understand attribute.
Must Understand for elements: Container languages should provide a Must Understand model for elements and not attributes.
As a result of this, an extension language designer that wants a component to be understood should model the component as an element and not an attribute.
Promote Must Understand: If a must understand option is not provided inside a particular extension element, the extension that must be understood SHOULD be promoted to the first container that does provide a must understand option.
Imagine that an author is adding a conversation ID extension and the conversationID element must be understood. The conversation ID author cannot change the callback namespace and the callback doesn't provide a must understand model. The solution is that the conversation ID author promotes the conversationID element so that it is a sibling of the callback, and specifies must understand in the container. This is shown below:
This rule should also be applied when the receiver may have performed significant work prior to finding the extension that must be understood and it may have to undo the performed work if it doesn't understand the extension. Imagine if the callerID was deep inside the callback. The receiver might perform significant work in processing the callback. Upon finding a must understand extension that it doesn't understand, the receiver may have to undo all the work that it had performed. Promoting must understand elements to the highest level minimizes the work that may have to be undone.
A container may allow multiple required extensions to be targeted at an individual receiver. It is then possible that one or more of the extensions may not be understood by the receiver. The container should anticipate this possibility and require that either all the mandatory extensions are processed by the receiver, or none. An example of this is the SOAP 1.2 specification.
Multiple Extensions: Containers should specify the expected behaviour of multiple extensions.
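One way to obtain the all-or-none behaviour is to check every mandatory extension before running any handler. The Python below is a sketch under that assumption; the block names and the handler table are hypothetical.

```python
def process_blocks(blocks, handlers):
    """blocks: list of (name, mandatory) pairs; handlers: dict of name -> callable.

    Every mandatory block is checked before any handler runs, so a message
    with an unknown mandatory extension is rejected with no work to undo.
    """
    missing = [name for name, mandatory in blocks
               if mandatory and name not in handlers]
    if missing:
        raise LookupError(f"not understood: {missing}")
    # Unknown optional blocks are simply skipped (must ignore).
    return [handlers[name]() for name, _ in blocks if name in handlers]


log = []


def handle_callback():
    log.append("callback")
    return "callback handled"


handlers = {"callback": handle_callback}

# "trace" is optional and unknown, so it is ignored.
results = process_blocks([("callback", True), ("trace", False)], handlers)

try:
    process_blocks([("callback", True), ("conversationID", True)], handlers)
except LookupError:
    pass  # rejected before handle_callback could run again
```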
9 XML Schema.
Many deployed and developing XML languages today use W3C XML Schema to define their message types. Although other efforts, such as OASIS RELAX NG schemas, deserve serious consideration, this section focuses on W3C XML Schema and the challenges it introduces to the design of a versionable language. Similar challenges are faced by designers using other schema languages, and similar strategies can be employed to conquer them. The following sections assume that the reader is familiar with XML Namespaces, W3C XML Schema, and particularly W3C XML Schema's wildcard element <xs:any> . Readers are encouraged to read some of the well-written existing XML Schema extensibility and versioning best practices, such as [XFront Schema Best Practices] and [XML Schema Design Patterns].
9.1 Identifying and Controlling Languages.
Extensibility in XML Schema is commonly accomplished using the wildcard element, and the ##any namespace or a combination of the ##other namespace and the ##targetNamespace in some designated "extensibility" element.
Using the callback example, our first attempt at designing an extensible schema might look something like this:
At first, this seems to accomplish what we want: it allows a callbackLocation followed by any number of extension elements. But a problem arises when we want to extend this schema. Suppose, for example, that we want to add an optional expires element. Although we can send an expires element to receivers using the old schema, we want a new schema that allows new receivers to validate the expires element explicitly. It appears that the following would work:
But that schema fragment is invalid. There are constraints about namespace options and optional elements that are explained later in 9.3 Determinism. In order to create valid extension schemas that allow extension in the same namespace, we must create a container to hold them.
This brings us to the first rule for enabling a must ignore versioning strategy in XML languages defined by W3C XML Schema:
Any Namespace - XML Schema edition: The language MUST provide an extensibility element that allows for extension in the current namespace and a wildcard that allows for extension in another namespace.
For extensions in the same namespace, create an extension type. For example:
This example is similar to the <xs:appinfo> element in W3C XML Schema itself, which can contain extensions.
For extensions in another namespace, use <xs:any namespace="##other"> at appropriate points in your content model. W3C XML Schema determinism requirements impose restrictions on where these extensions may appear. Generally, it is easiest if the extensions are allowed at the end of the content model.
Full Extensibility - XML Schema edition: All XML Elements MUST allow any attributes and allow any elements at the end of their content models. XML Elements MAY allow for element extensibility elsewhere in their content models.
Consider the following definition for a SOAP header block for our callback example.
This design has the serendipitous advantage that it separates third party extensions (in a different namespace) from compatible changes in the same namespace. If both extensions in the same namespace and in other namespaces are provided in the same document, there is no confusion about ordering of the extensions. This strict ordering of extensions guarantees that validation can occur. In particular, a node that has the newer language can use the newer schema, and all the binding and mapping tools that are available.
In both cases, lax validation is used to enable validation to succeed with or without a schema for the extension elements.
Use of an Extension element for extensions in the same namespace and namespace="##other" for extensions in different namespaces avoids determinism issues discussed later.
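From the receiver's side, this layout makes it straightforward to tell the two kinds of extension apart. The Python sketch below assumes a hypothetical instance in which same-namespace additions sit inside a cb:Extension element and third-party additions follow in another namespace; all URIs and the v:priority element are invented for illustration.

```python
import xml.etree.ElementTree as ET

CB = "http://example.org/callback"    # hypothetical language namespace
VENDOR = "http://example.org/vendor"  # hypothetical third-party namespace

DOC = (
    f'<cb:callback xmlns:cb="{CB}" xmlns:v="{VENDOR}">'
    "<cb:callbackLocation>http://example.org/listener</cb:callbackLocation>"
    "<cb:Extension><cb:expires>2007-07-04T00:00:00Z</cb:expires></cb:Extension>"
    "<v:priority>high</v:priority>"
    "</cb:callback>"
)


def namespace_of(tag):
    return tag[1:].split("}")[0] if tag.startswith("{") else ""


def local_name(tag):
    return tag.rsplit("}", 1)[-1]


def partition(xml_text):
    """Split extensions into (owner additions, third-party additions)."""
    root = ET.fromstring(xml_text)
    owner, third_party = [], []
    for child in root:
        if child.tag == f"{{{CB}}}Extension":
            owner.extend(local_name(e.tag) for e in child)
        elif namespace_of(child.tag) != CB:
            third_party.append(local_name(child.tag))
    return owner, third_party


owner_extensions, third_party_extensions = partition(DOC)
```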
9.2 Versioning.
The callback element with an expires element, using XML Schema, is.
Here is the callback with expires example in an XML Schema.
9.3 Determinism.
XML DTDs and W3C XML Schema have a rule that requires schemas to have deterministic content models. The meaning of determinism can be seen in this example from the XML 1.0 specification:
"For example, the content model ((b, c) | (b, d)) is non-deterministic, because given an initial b the XML processor cannot know which b in the model is being matched without looking ahead to see which element follows the b."
The addition of wildcards in W3C XML Schema (not present in XML DTDs) leads to some schemas that we might like to express, but that aren't allowed.
Wildcards with ##any, where minOccurs does not equal maxOccurs, are not allowed before an element declaration.
The element before a wildcard with ##any must have a maxOccurs value that equals its minOccurs value. If these are different, then the optional occurrences could match either the element definition or the wildcard. As a result of this rule, the minOccurs must also be greater than zero.
Types in any namespace derived by extension that add element definitions after a wildcard with ##any must be avoided.
Types in a different namespace derived by extension that add element definitions after a wildcard with ##other must be avoided. This rule and the previous are roughly equivalent for explanation purposes: In these two cases, violating these constraints with an instance of the added element definition could match either the wildcard or the derived element definition. This is effectively the first bulleted rule applied to derivation. The problems of determinism make derivation and wildcarding almost mutually exclusive, the exception being extension in the same namespace and wildcards with ##other .
Be Deterministic: Use of wildcards must be deterministic. Location of wildcards, namespace of wildcard extensions, minOccurs and maxOccurs values are constrained, and type restriction is controlled.
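The two ideas in this section can be approximated with a pair of simplified checks in Python. This is only a sketch of the rules as stated in the text, not the full determinism test a schema processor performs. The function names and namespace URIs are hypothetical, and unqualified elements are left out.

```python
def ambiguous_first_symbols(alternatives):
    """For a choice between flat sequences, list names that start two branches."""
    firsts = [alt[0] for alt in alternatives if alt]
    return sorted({name for name in firsts if firsts.count(name) > 1})


def wildcard_matches(wildcard_ns, element_ns, target_ns):
    """Whether a wildcard's namespace constraint admits a qualified element."""
    if wildcard_ns == "##any":
        return True
    if wildcard_ns == "##other":
        return element_ns != target_ns
    if wildcard_ns == "##targetNamespace":
        return element_ns == target_ns
    raise ValueError(f"unsupported constraint: {wildcard_ns}")


def ambiguous_before_wildcard(element_ns, min_occurs, max_occurs,
                              wildcard_ns, target_ns):
    """An element with optional occurrences, directly followed by a wildcard
    that could also match it, leaves the parser unable to decide."""
    has_optional_occurrences = min_occurs != max_occurs
    return has_optional_occurrences and wildcard_matches(
        wildcard_ns, element_ns, target_ns)


TNS = "http://example.org/callback"  # hypothetical target namespace

# ((b, c) | (b, d)) from the XML 1.0 quotation, and the rewrite (b, (c | d)).
quoted = ambiguous_first_symbols([["b", "c"], ["b", "d"]])
rewritten = ambiguous_first_symbols([["c"], ["d"]])

# An optional expires element followed by a wildcard.
with_any = ambiguous_before_wildcard(TNS, 0, 1, "##any", TNS)
with_other = ambiguous_before_wildcard(TNS, 0, 1, "##other", TNS)
required_then_any = ambiguous_before_wildcard(TNS, 1, 1, "##any", TNS)
```

The ##other case shows why an element in the language's own namespace can safely precede that kind of wildcard: the wildcard can never match it.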
We'll look at these alternatives in more detail below.
It is worth noting that the structural limitations introduced by W3C XML Schema's handling of determinism are a consequence of W3C XML Schema's design and are not an inherent limitation of schema-based structures. In RELAX NG, for example, none of these determinism constraints apply.
9.4 Containers and XML Schema.
It should be noted that W3C XML Schema does not permit validation of containers with required or optional extensions. Effectively, W3C XML Schema cannot specify that a particular extension is required or optional in a wildcard element. It does have the processContents attribute that controls validation of content, but there is no option that says a particular extension is required. Languages such as the WSDL SOAP binding extension have been created to allow the specification of a container and required extensions in containers like SOAP. The WSDL SOAP binding extension provides a new schema that merges the SOAP schema with the SOAP extension schemas.
9.5 Alternative Extensibility and Versioning Techniques.
There are a variety of alternative extensibility and versioning techniques.
9.5.1 Using ##any.
As shown earlier, a common design pattern is to provide an extensibility point, not an element, allowing any namespace at the end of a type. This is typically done with <xs:any namespace="##any">.
Determinism makes this unworkable as a complete solution. First, the extensibility point can only occur after required elements in the original schema, limiting the scope of extensibility in the original case. Second, backwards compatible changes require that the added element is optional, which means minOccurs="0" . Determinism prevents us from placing a minOccurs="0" before an extensibility point of ##any . Thus, when adding an element at an extensibility point, the author can make the element optional and lose the extensibility point, or the author can make the element required and lose backwards compatibility.
Forwards compatibility with required elements can be retained because the sender can send the new required element and the receiver should simply ignore it. The solution provided in this finding provides a superior solution for extensibility near optional elements and backwards compatibility.
9.5.2 Using ##other.
Another common design pattern is to use an extensibility point allowing other namespaces, <xs:any namespace="##other"> , for extensibility. While this option is obviously useful for adding distributed extension and avoids determinism problems, it does not allow backwards or forward compatible changes. In order to add an extension element to an XML fragment, it must be in a new and separate namespace. The option of adding an extension that is in the same namespace is not allowed using ##other . Critically, this means that a change without changing a namespace or adding a namespace cannot be done. The owner of the namespace will typically change the namespace name or element names when making changes. That means that all receivers that are not updated will reject the messages. This effectively forces the big bang approach.
The namespace owner could use a new namespace for the compatible changes. This technique suffers from a number of problems. An extension in a different namespace means that the newer schema cannot be validated. Specifically, there is no way to take a schema with a wildcard, such as SOAP, and then create a new Schema that constrains the wildcard. For example, it is not possible to take the SOAP schema and require that foo and bar elements are required to be children of the header element. Indeed, the need for this functionality spawned the message and part constructs in WSDL. The ordering of the extensions cannot be controlled. It results in specifications and namespaces that are inappropriately factored. In the callback with expires example, a high cohesion design means that expires belongs in the callback namespace and not a new one. Further, the re-use of the same namespace has better tooling support. Many applications use the schema to create the equivalent programming constructs. These tools often work best with single namespace support for the "generated" constructs.
To a certain degree, we've combined these two designs together to produce a design that achieves extensibility and is compliant with W3C XML Schema's determinism constraint. The namespace name owner can add backwards and forwards compatible changes into the extensibility element, and other authors can add their changes at the wildcard location.
9.5.3 Using Type Extension.
Extending languages in W3C XML Schema can also be accomplished by the use of derivation. For extension, the new type must have all the existing elements and attributes and may add new elements or attributes. Consider a callback example using <xs:any namespace="##other"> that has the conversation identifier element as an option in the extension. Using extension, this might appear as follows:
An instance of the extended type would appear as:
Looking at the instance, how does a receiver "know" that the CallbackExtended element is of type CallbackExtendedType, which is an extension of CallbackType? It must examine the schema for the CallbackExtended element.
In a closed environment, the new types may be retrievable and type substitution is acceptable. An example of this could be XML Query.
In an open environment, the receiver wouldn't have the schema available since the receiver was only programmed to support the original Callback. Dynamically retrieving schemas to figure out what has happened poses many risks, including security and performance. This is the fundamental difference between using <xs:any> and W3C XML Schema extension. <xs:any> allows extensions to be added and does not require that the new schema is available to the older software (if processContents is "skip" or "lax") whereas W3C XML Schema extensibility requires the newer schema. Note also that the W3C XML Schema extensibility model does not permit the same element name and namespace to be used for the newer version. This is because W3C XML Schema rightly does not allow duplicate element name declarations.
Another important language decision is "When should wildcards be used versus type extension?" This is a very difficult judgment call. One rule of thumb is that type extension is for lower level constructs in a particular namespace and wildcard elements are for higher level constructs. Higher level constructs are those that are intended for extension by 3rd parties or those where backwards or forwards compatibility is a goal. Care must be taken when combining wildcards and allowing derivation. For example, the common pattern of using ##other in extension points precludes derivation in a different namespace, such as a newer version of a specification.
9.5.4 Substitution Groups.
Substitution groups are another technique for extensibility. This is similar to type derivation, as a new type is created that can be substituted in for an existing type. The new type is merged in to the schema at the substitution point, and the base type is left as-is. The same trade-offs for Type extension apply for substitution groups.
9.5.5 Redefine.
9.5.6 Comparison Summary.
These comparisons demonstrate that wildcard extensibility is the only W3C XML Schema mechanism that allows backwards-compatible and forwards-compatible changes without updating older software, particularly in open environments. Using ##any or ##other alone does not provide the ability to re-use namespaces and introduce compatible changes. Type derivation and substitution groups require an exchange of schemas, which may not be appropriate in an open environment.
A References.
B Acknowledgements (Non-Normative)
The author thanks the many reviewers that have contributed to the finding, particularly David Bau, Edd Dumbill, Chris Ferris, Yaron Goland, Eve Maler, Mark Nottingham, and Cliff Schmidt. This finding borrows, with permission of the authors, examples and some text from WS-Callback [6].

[Editorial Draft] Extending and Versioning Languages: XML Languages.
Draft TAG Finding 04 July 2007.
This document discusses the XML related aspects of versioning. It describes XML based terminology, technologies and versioning strategies. It provides XML Schema examples for each of the strategies and discussion about various schema design patterns. A number of XML languages, including XHTML and Atom, are used as case studies in different strategies.
Status of this Document.
This document has been developed for discussion by the W3C Technical Architecture Group. It does not yet represent the consensus opinion of the TAG.
Publication of this finding does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time.
Additional TAG findings, both approved and in draft state, may also be available. The TAG expects to incorporate this and other findings into a Web Architecture Document that will be published according to the process of the W3C Recommendation Track.
Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).
Table of Contents.
1 Introduction.
Extending and Versioning XML Languages Part 1 described extending and versioning languages. Part 2 focuses on XML and includes schema language specific aspects of extending and versioning XML. The choices, decisions, and strategies described in Part 1 are augmented with XML and XML Schema instances herein.
1.1 XML Terminology.
We will describe some key refinements to our versioning terminology for XML. An XML language has a vocabulary that may use terms from one or more XML Namespaces (or none), each of which has a namespace name. [Definition: An XML language is an where all the Texts MUST be well-formed XML.] As XML is a markup language, the most significant parts of XML Languages are the elements and attributes. [Definition: A component is an XML element or attribute.] The Name Language, consisting of name, given, and family terms, has a namespace for the terms. We use the prefix "namens" to refer to that namespace. The Name Language could consist of terms from other vocabularies, such as Dublin Core or UBL. These terms each have their own namespaces, illustrating that a language can be composed of terms from multiple namespaces. An XML Namespace is a convenient container for collecting terms that are intended to be used together within a language or across languages. It provides a mechanism for creating globally unique names.
We use the term instance when speaking of sequences of characters (aka text) in XML. [Definition: An instance is a Text of well-formed XML and the texts are usually constrained by a schema language.] The schema language may be machine processable such as DTDs, XML Schema, Relax NG, or the schema language may be human readable text. Documents are instances of a language. In XML, they must have a root element. A name text might have a name element as the root element. Alternatively, the name vocabulary may be used by another language such as purchase orders so the purchase order texts may contain texts that can be considered name texts. Thus instances of an XML language are always at least part of a text and may be the entire text. XML instances (and all other instances of markup languages) consist of markup and content. In the name example, the given and family elements including the end markers are the markup. The values between the start and end markers are the content. An instance has an information model. There are a variety of data models within and without the W3C, and the one standardized by the W3C is the XML infoset.
The XML related terms and their relationships are shown below.
Some examples of XML consumers and producers are: A stylesheet processor is a consumer of the XML text that it is processing (the producer isn't mentioned); in the Web services context the roles of producer and consumer alternate as messages are passed back and forth. Note that most Web service specifications provide definitions of inputs and outputs. By our definitions, a Web service that updates its output schema is considered a new producer. A Web service that updates its input schema is a new consumer.
1.2 Kinds of XML Languages.
Ultimately, there are different kinds of XML languages. The versioning approaches and strategies that are appropriate for one kind of language may not be appropriate for another. Among the various kinds of vocabularies, we find:
Just Names: some languages don't actually identify elements or attributes; they're just lists of names. QNames used to identify words in the WordNet database, or the names of functions and operators in XPath 2.0, are examples of "just names" languages.
Standalone: languages designed to be used more or less by themselves, for example XHTML, DocBook, or the TEI.
Containers: languages designed to be used as a wrapper or framework for some other language or payload, for example SOAP or WSDL.
Container Extensions: languages designed to extend or augment a particular class of container. Specifications that extend SOAP by defining SOAP header blocks, for example to provide security, asynchrony or reliable messaging, are examples of container extension languages.
There are two types of XML extension language: element extension, and attribute or type extension.
Element Extension: languages whose components are elements. SOAP and similar languages are element extensions.
Attribute or Type Extension: languages that define types or attributes. These languages must exist in the context of an element that is not defined by the language. They are sometimes called "parasite" languages because they require a "host" element. XLink is an example.
Mixtures: languages designed for, or often used for, encapsulating some semantics inside another language. For example, MathML might be mixed inside of another language.
This is by no means an exhaustive list. Nor are these categories completely clear cut. MathML can certainly be used standalone, for example, and languages like SVG are a combination of standalone, containers, and mixtures.
2 XML Language Requirements.
The general language requirements are described in Part 1, Requirements (../versioning#requirements). In XML, these requirements are augmented by:
Accuracy of XML Schema for the versions of the language. By accuracy, we mean the degree to which the language is described. We will see how some designs preclude full XML Schema description. Often this results in Schemas that are incomplete at the first and subsequent versions. The options are typically: Complete in all versions, complete in first version only, incomplete in all versions.
Use of generic XML and namespace only tools (precluding vocabulary specific versions). This itself is a trade-off because some generic XML tools (like XPath) are more difficult to use with multiple namespaces containing the same "thing", like XHTML's P element.
3 Version Identification technologies.
Version identification of elements and attributes is often an important part of correctly processing XML documents. There is a large variety of version identification technologies in XML. The fundamental technologies available for identifying versions in an XML document are:
Qualified Names: Namespaces + Local Name.
The decisions about which technologies to use are affected by the general language requirements and by the XML environments.
3.1 Qualified Name: Namespace + Local name.
The Namespaces specification defines a Qualified Name as the pairing of a namespace name and a local name. From a versioning perspective, we are mainly concerned with element and attribute names, and not content. A primary motivation for Namespaces is to enable decentralized extensibility and thereby prevent name collisions.
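As a minimal sketch of the namespace-plus-local-name pairing, the following Python snippet uses the standard library's ElementTree, which reports each element name as "{namespace}local". The namespace URI, the sample instance, and the helper split_qname are assumptions made for illustration; they do not come from the original examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: the namespace URI and content are illustrative only.
DOC = """<name xmlns="http://example.org/name/1">
  <given>Jane</given>
  <family>Doe</family>
</name>"""


def split_qname(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local name)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag  # the element is in no namespace


root = ET.fromstring(DOC)
# iter() yields the root first, then its descendants in document order.
qnames = [split_qname(element.tag) for element in root.iter()]
for namespace, local in qnames:
    print(namespace, local)
```

Two elements are the same component, in the sense used here, only when both parts of the pair match.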
Many systems use type information associated with the component as part of the version identification of the component. There are generally two strategies for determining the type of a component, which we will call "Top-typing" and "Bottom-typing". Top-typing is a style where the type of the component is determined by the type of the top element. Bottom-typing is a style where the type of the component is determined by the type or types of the descendants of the top element. Bottom-typing designs can be such that a single type is not possible; rather, the type is effectively the composite of all the descendant types. When top-types are extended, the type becomes more difficult to specify. It could be the top-type, or it could be the top-type plus all of the extension types. A nameType extended with a middleNameType could be considered a nameType or a nameType+middleNameType. As the number of extension types grows, specifying the actual type may become equivalent to some bottom-typing designs. The extreme of this is container languages, which have a single invariable top-type and are typically intended to have the type determined by the content at the "bottom" of the container.
In either typing case, the type(s) used for type determination is determined by the type associated with the qualified name of the component during type assignment. In XML Schema, this is during validation. It is also possible to specify the type in the instance. XML Schema provides an xsi:type attribute that specifies the type of the component. This overrides the association between the qualified name of the component and the type as specified in the schema, or it provides a type where the qualified name of the component might not be resolvable into a type.
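The override behaviour of xsi:type can be sketched as follows. This is an illustration only: the DEFAULT_TYPES table stands in for whatever a schema would associate with a qualified name, the type names nameType and extendedNameType are hypothetical, and the sketch returns the raw attribute value without resolving any prefix it may carry.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance that names its own type with xsi:type.
DOC = """<name xmlns="http://example.org/name/1"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:type="extendedNameType">
  <given>Jane</given>
  <family>Doe</family>
</name>"""

# Assumed stand-in for the schema's mapping from qualified name to type.
DEFAULT_TYPES = {"{http://example.org/name/1}name": "nameType"}


def effective_type(element):
    """Return the xsi:type value if present, else the type bound to the QName."""
    declared = element.get(f"{{{XSI}}}type")
    if declared is not None:
        return declared  # the instance overrides the schema association
    return DEFAULT_TYPES.get(element.tag)  # None when no type is known


root = ET.fromstring(DOC)
print(effective_type(root))
```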
XML Schema is designed to assign type information as part of validation. Other languages, notably RelaxNG, do not assign type information and have no notion of types. The decision to use types and re-use types across components is an important factor in component version identification because the component definition and the component's type may be versioned separately.
3.3 Version Numbers.
A significant downside with using version identifiers in XML is that software that supports both versions of the name must perform special processing on top of XML and namespaces. For example, many components “bind” XML types into particular programming language types. Custom software must process the version attribute before using any of the “binding” software. In Web services, toolkits often take SOAP body content, parse it into types and invoke methods on the types. There are rarely “hooks” for the custom code to intercept processing between the “SOAP” processing and the “name” processing. Further, if version attributes are used by any 3rd party extensions—say midns:middle has a version—then a schema cannot refer to the correct middle type.
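The extra processing step can be sketched as a small dispatcher that reads the version attribute before any binding code runs. Everything named here is an assumption for illustration: the shared namespace, the Name class, the two binder functions, and the version values "1" and "2".

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

NS = "{http://example.org/name}"  # hypothetical namespace shared by all versions


@dataclass
class Name:
    given: str
    family: str
    middle: Optional[str] = None


def bind_v1(element):
    return Name(element.findtext(NS + "given"), element.findtext(NS + "family"))


def bind_v2(element):
    # In this sketch, version 2 adds a middle name to the name.
    return Name(
        element.findtext(NS + "given"),
        element.findtext(NS + "family"),
        element.findtext(NS + "middle"),
    )


BINDERS = {"1": bind_v1, "2": bind_v2}


def bind(xml_text):
    """Inspect the version attribute first, then hand off to the matching binder."""
    element = ET.fromstring(xml_text)
    version = element.get("version")
    if version not in BINDERS:
        raise ValueError(f"unsupported name version: {version!r}")
    return BINDERS[version](element)
```

The qualified name alone does not select a binder here; the dispatcher is exactly the custom code that a generic binding toolkit would not supply.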
4 Component version identification strategies.
The strategy for identifying the version of a component is perhaps the most important decision in designing an XML Language. The use of namespace names, component names, version numbers and type information are all critical in achieving the desired versioning characteristics. The strategies range from many namespaces per version of a language to only 1 namespace for all versions of a language. A few of the most common are listed below and described in more detail later.
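#1, all components in new namespace(s) for each version: i.e., the first version consists of namespaces a + b, and the next compatible or incompatible version consists of namespaces c + d; or the first version consists of namespace a, and the next compatible or incompatible version consists of namespace b.
#2, all new components in new namespace(s) for each compatible version: i.e., the first version consists of namespace a; the next compatible version consists of namespaces a + b; the next incompatible version consists of namespaces c + d.
#2.5, all new components in new or existing namespace(s) for each compatible version: i.e., the first version consists of namespace a, and the next compatible version consists of namespaces a + b with additions to namespace a.
#3, all new components in existing namespace(s) for each compatible version: i.e., the first version consists of namespace a, the next compatible version consists of namespace a, and the next incompatible version consists of namespace b.
#4, all new components in existing or new namespace(s) for each version and a version identifier: i.e., the first version consists of namespaces a + b + version attribute "1", and the next compatible or incompatible version consists of namespaces c + d + version attribute "2".
#5, all components in existing namespace(s) for each version and a version identifier: i.e., the first version consists of namespace a + version attribute "1.0", the next compatible version consists of namespace a + version attribute "1.1", and the next incompatible version consists of namespace a + version attribute "2.0".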
Whatever the design chosen, the language designer must decide the component name, namespace name, and any version identifier for new and all existing components.
Elaborating on these designs is illustrative.
4.1 Versioning Strategy #1: all components in new namespace(s) for each version.
The following names would be valid:
The 2nd and 3rd examples show all the components in the same new namespace, with the 3rd showing a new name as well. The 4th and 5th examples show an additional middle element in 2 different namespace names. The 4th example comes from a namespace name that is in the same domain as the name element’s new namespace name. One reason for 2 namespaces is to modularize the language. The 5th example shows a namespace name from a different domain for the middle.
4.1.1 Advantages and Disadvantages.
In this strategy, forwards compatibility is not desired. Any change or extension is incompatible with existing consumers. When an older consumer receives new texts in the new namespace, most of its software will break; schema validation without the new schema, for example, will fail. Achieving forwards compatibility in parts of a system is possible, but it requires careful selection of technologies, such as XPath expressions that are namespace agnostic. For some systems that have adopted this strategy, making every change forwards incompatible is the design goal.
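A namespace-agnostic extraction of the kind mentioned above might look like the following sketch, which matches children by local name only. The two namespace URIs and the helper names are assumptions; a real system would have to decide whether ignoring the namespace is acceptable for its data.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' part of an ElementTree tag, if there is one."""
    return tag.rsplit("}", 1)[-1]


def extract_name(xml_text):
    """Pick out given and family by local name, whatever namespace they are in."""
    root = ET.fromstring(xml_text)
    found = {}
    for child in root:
        key = local_name(child.tag)
        if key in ("given", "family"):
            found[key] = child.text
    return found


# Hypothetical instances of two versions that differ only in namespace name.
V1 = ('<name xmlns="http://example.org/name/1">'
      '<given>Jane</given><family>Doe</family></name>')
V2 = ('<name xmlns="http://example.org/name/2">'
      '<given>Jane</given><family>Doe</family></name>')

print(extract_name(V1) == extract_name(V2))
```

Only this one component of the consumer survives the namespace change; schema validation and namespace-aware binding would still reject the newer instance.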
4.2 Versioning Strategy #2: all new components in new namespace(s) for each compatible version.
In this strategy, the following names would be valid:
The 2nd and 3rd examples show an additional middle element in 2 different namespace names. The first middle, in the 2nd example, comes from a namespace name that is in the same domain as the name element’s namespace name. The 3rd example shows a completely different namespace name for the middle. It is probable that the midns:middle was created by the name author, and the middiffdomain:middle was created by a 3rd party.
4.2.1 Advantages and Disadvantages.
Backwards and forwards compatibility can be supported from the first version. This design precludes the language designer from re-using a namespace name for changes, which may be desirable as introducing new namespace names can be difficult. XML Schema generally does not support more than one compatible revision of the schema in this strategy as shown in 7.2 #2: all new components in new namespace(s) for each compatible version .
4.3 Versioning Strategy #2.5: all new components in new or existing namespace(s) for each compatible version.
We have called this strategy "2.5" because it is a mixture of strategy #2 and strategy #3. In this strategy, the following names would be valid:
The 2nd example shows the use of the optional middle name in the name namespace. The 3rd and 4th examples show an additional middle element in 2 different namespace names. The first middle, in the 3rd example, comes from a namespace name that is in the same domain as the name element’s namespace name. The 4th example shows a completely different namespace name for the middle. It is probable that the midns:middle was created by the name author, and the middiffdomain:middle was created by a 3rd party.
4.3.1 Advantages and Disadvantages.
Backwards and forwards compatibility can be supported from the first version. Depending on the schema design, some new components do not require a namespace change. XML Schema generally does not support more than one compatible revision of the schema when new components are placed in new namespaces, as shown in 7.2 #2: all new components in new namespace(s) for each compatible version.
4.4 Versioning Strategy #3: all new components in existing namespace(s) for each compatible version.
In this strategy, the following names would be valid:
The 2nd example shows the use of the same namespace because the middle is optional. The 3rd example shows the use of the same namespace because the middle is optional and is embedded inside an "extension element". The 4th example shows the use of a new namespace for all the components, as would be needed for a mandatory middle name.
4.4.1 Advantages and Disadvantages.
Backwards and forwards compatibility can be supported from the first version without namespace changes. This means new components do not require new namespaces which generally means less chance of incompatible evolution. The use of either existing or new namespace gives the language designer greater choice in use of namespace names than just using a new namespace. As always, only the language designer has the ability to use and augment the namespace name of the first version. XML Schema does not support the 2nd example in many situations because of the Unique Particle Attribution constraint. XML Schema does support the 3rd example as shown in 7.4 #3: All new components in existing namespace(s) for each compatible version .
4.5 Versioning Strategy #4: all new components in existing or new namespace(s) for each version and a version identifier.
Using a version identifier, the name instances would change to show the version of the name they use, such as:
In the last two examples, the version number has been changed from 1.0 to 2.0. Incrementing the major part of a version number often indicates an incompatible change. In this case, perhaps it indicates that the middle name is now mandatory where it had previously been optional.
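The major-number convention can be sketched as a simple check. This assumes "major.minor" version strings and assumes that a consumer accepts anything sharing its major number; neither rule is mandated by the text, which only says that a major increment often signals incompatibility.

```python
def parse_version(text):
    """Split a 'major.minor' string into integers; a missing minor counts as 0."""
    major, _, minor = text.partition(".")
    return int(major), int(minor or 0)


def can_process(supported, received):
    """Assumed policy: accept any received version with the same major number."""
    return parse_version(received)[0] == parse_version(supported)[0]


print(can_process("1.0", "1.1"))  # same major number
print(can_process("1.0", "2.0"))  # major number changed
```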
4.5.1 Advantages and Disadvantages.
Backwards and forwards compatibility can be supported from the first version without namespace changes. However, the use of version numbers means that the relationship or binding between the Qualified Name of a component and a language's interpretation requires the use of the version number. That means that general binding tools, such as XML to Java mappings, often cannot be used stand-alone.
4.6 Versioning Strategy #5: all components in existing namespace(s) for each version and a version identifier.
Using a version identifier, the name instances would change to show the version of the name they use, such as:
The 2nd example shows that the middle is an optional part of the name. The last example shows that the middle is a mandatory part of the name.
4.6.1 Advantages and Disadvantages.
Backwards and forwards compatibility can be supported from the first version without namespace changes. Software that extracts the given and family name based upon the Qualified name will often not break because a new namespace name is not used. However, the use of version numbers means that the relationship or binding between the Qualified Name of a component and a language's interpretation requires the use of the version number. That means that general binding tools, such as XML to Java mappings, cannot be used stand-alone.
5 Indicating compatibility of changes or extensions.
Once a language designer has chosen a component version identification strategy, they must also choose how compatible and incompatible changes will be indicated.
5.1 Compatible.
As mentioned in the forwards compatibility section, forwards compatibility requires a substitution mechanism. Ignoring unknown content is a very popular model. It may be specified as the default for any extensions. It could also be specified in an instance where the default is for incompatible versioning. This could be a flag, such as ns:mayIgnore="true" .
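A consumer that follows the ignore-unknown-content model could be sketched as below. The namespace URIs, the KNOWN set, and the sample instance are assumptions for illustration; the sketch records what it skipped only so that the behaviour is visible.

```python
import xml.etree.ElementTree as ET

NAME_NS = "http://example.org/name"  # hypothetical namespace of the name language
KNOWN = {f"{{{NAME_NS}}}given", f"{{{NAME_NS}}}family"}

# Hypothetical newer instance carrying an extension the consumer predates.
DOC = """<name xmlns="http://example.org/name" xmlns:mid="http://example.com/middle">
  <given>Jane</given>
  <family>Doe</family>
  <mid:middle>Q</mid:middle>
</name>"""


def read_name(xml_text):
    """Process the children this consumer knows and skip everything else."""
    root = ET.fromstring(xml_text)
    understood, ignored = {}, []
    for child in root:
        if child.tag in KNOWN:
            understood[child.tag.rsplit("}", 1)[-1]] = child.text
        else:
            ignored.append(child.tag)  # unknown content is not an error
    return understood, ignored


understood, ignored = read_name(DOC)
print(understood, ignored)
```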
5.2 Incompatible.
A version author can use new namespace names, local names, or version numbers to indicate an incompatible change. An extension author may not have these mechanisms available for indicating an incompatible extension. A language designer that wants to allow extension authors to indicate that an extension is incompatible must provide a mechanism for indicating that consumers must understand the extension, and the consumer must generate an error if it does not understand the extension. If only specific consumers must understand the extension, then the language designer must also provide a mechanism for indicating which consumers. If the language designer has allowed for forwards compatibility, then the forwards compatibility rule must be over-ridden.
Languages with forwards compatibility support MAY provide an override for indicating incompatible extensions but should only do so IF the incompatible extensions can be clearly targeted or scoped.
5.2.1 Must Understand flag.
Arguably the simplest and most flexible over-ride of the Must Ignore Unknowns technique is a Must Understand flag that indicates whether the item must be understood. The SOAP, WSDL, and WS-Policy attributes and values for specifying Must Understand are respectively: soap:mustUnderstand="1", wsdl:required="1", wsp:Usage="wsp:Required". SOAP is probably the most common case of a container that provides a Must Understand model. The default value is 0, which is effectively the Must Ignore rule.
A language designer can re-use an existing Must Understand model by constraining their language to an existing Must Understand model. A number of Web services specifications have done this by specifying that the components are SOAP header blocks, which explicitly brings in the SOAP Must Understand model.
A language designer can design a Must Understand model into their language. A Must Understand flag allows the producer to insert extensions into the container and use the Must Understand attribute to over-ride the must Ignore rule. This allows producers to extend instances without changing the extension element’s parent’s namespace, retaining backwards compatibility. Obviously the consumer must be extended to handle new extensions, but there is now a loose coupling between the language’s processing model and the extension’s processing model. A Must Understand flag is provided below:
An example of an instance of a 3rd party indicating that a middle component is an incompatible change:
Specification of a Must Understand flag must be treated carefully as it can be computationally expensive. Typically a processor will either perform a scan for Must Understand components to ensure it can process the entire text, or incrementally process the instance and be prepared to roll back or undo any processing if a Must Understand component that it does not understand is found.
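The first of those two approaches, scanning before processing, can be sketched as follows. The attribute name mustUnderstand in the name namespace, its "true" value, the namespace URIs, and the NotUnderstood exception are all hypothetical; they are not taken from SOAP or from the original examples.

```python
import xml.etree.ElementTree as ET

NAME_NS = "http://example.org/name"  # hypothetical namespace
MUST_UNDERSTAND = f"{{{NAME_NS}}}mustUnderstand"  # hypothetical flag attribute
UNDERSTOOD = {f"{{{NAME_NS}}}{local}" for local in ("name", "given", "family")}


class NotUnderstood(Exception):
    """Raised when a flagged component is unknown to this consumer."""


def check_must_understand(root):
    """Scan the whole tree before any processing starts."""
    for element in root.iter():
        if element.get(MUST_UNDERSTAND) == "true" and element.tag not in UNDERSTOOD:
            raise NotUnderstood(element.tag)


def process(xml_text):
    root = ET.fromstring(xml_text)
    check_must_understand(root)  # fail early, so nothing needs to be undone
    return {c.tag.rsplit("}", 1)[-1]: c.text for c in root if c.tag in UNDERSTOOD}


FLAGGED = """<name xmlns="http://example.org/name"
      xmlns:n="http://example.org/name" xmlns:mid="http://example.com/middle">
  <given>Jane</given>
  <mid:middle n:mustUnderstand="true">Q</mid:middle>
  <family>Doe</family>
</name>"""
UNFLAGGED = FLAGGED.replace(' n:mustUnderstand="true"', "")
```

Calling process(FLAGGED) raises NotUnderstood, while process(UNFLAGGED) falls back to ignoring the middle element. The cost mentioned above shows up as the full traversal in check_must_understand.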
There are other refinements related to Must Understand. One example is providing an element that indicates which extension namespaces must be understood, which avoids the scan of the instance for Must Understand flags.
It is also possible to re-use the SOAP processing model with its mustUnderstand. Use of a SOAP header for an extension may be because the body was not designed to be extensible, or because the extension is considered semantically separate from the body and will typically be processed differently than the body.
6 XML Schema 1.0.
XML Schema provides a variety of mechanisms for extensibility and versioning: wildcards, type extension, type restriction, redefine, substitution groups, and xsi:type attributes. The wildcard construct enables authors to create schemas that are both forwards and backwards compatible. Generally, a new schema using wildcards is backwards compatible because it will validate old and new instances. The exception is instances that have content that is legal in the wildcard but not in the new content. An example might be a middle name that has structure or digits. However, that scenario means that an author created a middle name instance in the middle name namespace according to one schema AND an author defined a new middle name in the same namespace according to a different schema. Arguably there is an authority over the namespace that will prevent such clashes, and so in practice this exception won't happen. Alternatively, we can make a slightly different compatibility guarantee: the new schema is backwards compatible in that it validates old and new instances, provided the new instances do not have any extensions in the defined namespaces. The old schema is forwards compatible because it will validate old and new instances; of course, it sees these as current and future instances.
When an author creates a new version, a new schema can be created by replacing the wildcard(s) in the original with an optional-element, optional-wildcard sequence in the later schema. The new schema explicitly states the entire new content model, including everything from the original schema as well as the new explicit declaration for middle, and for that reason we call it a "Complete Respecification" of the type.
A new type declared using wildcards could be declared as an explicit <xs:restriction/> of the original type, because every document accepted by the new type is also accepted by the old. XML Schema's type <xs:restriction/> allows alteration of wildcards anywhere in the content model, like Complete Respecification, but allows the original type to be preserved. Alternatively, XML Schema's type extension mechanism, <xs:extension/>, provides a different way of specifying a modified type, in which the original content is not restated, but only the new elements are explicitly referenced. The differences are: (1) xs:extension allows new content only at the end of the model and (2) using wildcards as shown above, the original type will accept not only documents in the original language, but also documents containing the middle name, something that's not true in typical uses of xs:extension. Thus the schema author of a new version of a type has the 3 options outlined above: 1) Complete Respecification without explicit use of xs:restriction; 2) Complete Respecification with explicit use of xs:restriction; 3) xs:extension.
These mechanisms can be combined together. For example, a schema that supports new components in existing or new namespaces and supports multiple schema versions (described in ) uses wildcards, type extension, and use of Extension elements in instances.
Given an extensibility point that allows different namespaces, the language designer and 3rd parties can now use different namespaces for their versions. In general, an extension can be defined by a new specification that makes a normative reference to the earlier specification and then defines the new content. No permission should be needed from the authors of the specification to make such an extension. In fact, the major design point of XML namespaces is to allow decentralized extensions. The corollary is that permission is required for extensions in the same namespace. A namespace has an owner; non-owners changing the meaning of something can be harmful.
Attribute extensions can be in any namespace because, in XML Schema, attributes do not have the non-determinism (aka Unique Particle Attribution) constraints that elements do. In XML Schema, the attributes are always unordered and the model group for attributes uses a different mechanism for associating attributes with schema types than the model group for elements. We will discuss this important issue later in the finding.
7 Schemas for Version Identification Strategies.
7.1 #1: all components in new namespace(s) for each version.
Using XML Schema 1.0, the name owner might like to write a schema such as:
The next version of the schema, with middle name added, might look like.
This schema is perhaps not quite what is desired because there are now 2 wildcards in the content model: the original wildcard, then the new middle and the new wildcard. Type extension does not replace any existing trailing wildcard with the additive content. An alternative is to not have the wildcard in the first version, but that removes forwards compatible extensibility as both sides must have the new schema to understand the type. Because of the type extension problem, the language designer cannot re-use the existing name definition and force a single wildcard at the end. They must create a new schema without any re-use of the previous schema's type information by respecifying the type. They can simply respecify the type or they can use xsd:restriction. Using xsd:restriction has some extra value in that a Schema processor can guarantee that the content model is a true restriction, but in general, respecification with or without xsd:restriction is equivalent.
The new namespace for all components does not allow compatible evolution by the language designer, unless they choose to put new components in a new namespace, which is strategy #2. Additionally, the version 2 schema cannot re-use the existing type definition.
7.2 #2: all new components in new namespace(s) for each compatible version.
We previously saw how re-use by importing and extending schemas with wildcards is not possible. In this strategy, the schema designer attempts to insert the new extension in the existing schema definition, like:
The Unique Particle Attribution (UPA) constraint of XML Schema, described in more detail in Unique Particle Attribution, prevents this from working. The problem arises in a version when an optional element is followed by a wildcard. In this example, this occurs when an optional element is added and extensibility is still desired. This is an ungentle introduction to the difference between extensibility and versioning. An optional middle name added into a subsequent version is a good example. Consumers should be able to continue processing if they don’t understand an additional optional middle name, and we want to keep the extensibility point in the new version. We can't write a schema that contains the optional middle name and a wildcard for extensibility. The previous schema is roughly what is desired using wildcards, but it is illegal because of the UPA constraint.
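The ambiguity behind the UPA constraint can be illustrated with a toy model in Python. This is not an XML Schema processor: it models only a flat sequence, only the "##any" and "##other" wildcard forms, and only the optional-element-then-wildcard pattern discussed here. The classes, the function upa_conflicts, and the namespace URIs are all assumptions made for the sketch.

```python
from dataclasses import dataclass

NAME_NS = "http://example.org/name"  # hypothetical target namespace
MID_NS = "http://example.org/middle"  # hypothetical extension namespace


@dataclass(frozen=True)
class Element:
    namespace: str
    local: str
    min_occurs: int = 1


@dataclass(frozen=True)
class Wildcard:
    allowed: str  # "##any" or "##other" (simplified)
    target_namespace: str
    min_occurs: int = 0

    def admits(self, namespace):
        if self.allowed == "##any":
            return True
        return namespace != self.target_namespace  # "##other"


def upa_conflicts(sequence):
    """Return (element, wildcard) pairs that could both claim the same child."""
    conflicts = []
    for i, particle in enumerate(sequence):
        if not isinstance(particle, Element):
            continue
        for j in range(i + 1, len(sequence)):
            if any(p.min_occurs > 0 for p in sequence[i:j]):
                break  # a required particle separates the two, so no ambiguity
            later = sequence[j]
            if isinstance(later, Wildcard) and later.admits(particle.namespace):
                conflicts.append((particle.local, later.allowed))
    return conflicts


base = [Element(NAME_NS, "given"), Element(NAME_NS, "family")]
v1 = base + [Wildcard("##other", NAME_NS)]
v2 = base + [Element(MID_NS, "middle", min_occurs=0), Wildcard("##other", NAME_NS)]
print(upa_conflicts(v1), upa_conflicts(v2))
```

In the v2 model, a middle child in the extension namespace could be attributed either to the optional middle declaration or to the wildcard, which is the situation the constraint forbids.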
The author has 5 options for the v2 schema for name and middle, listed below and detailed subsequently:
1. the new middle is defined, extensibility is retained, and the new name type does not refer to the new middle;
2. the new middle is defined, extensibility is lost, and the new name type refers to the new middle as optional;
3. the new middle is defined, extensibility is retained, and the new name type refers to the new middle as required; the result is that compatibility is lost (essentially strategy #1);
4. the new middle is defined, extensibility is retained, and there is no new name type;
5. no update to the Schema.
If they leave the middle as optional and retain the extensibility point, the best schema that they can write is:
This is not a very helpful XML Schema change. The problem is that they cannot insert the reference to the optional midns:middle element in the name schema and retain the extensibility point because of the aforementioned Unique Particle Attribution Constraint.
The core of the problem is that there is no mechanism for constraining the content of a wildcard. For example, imagine that ns1 contains foo and bar. It is not possible to take the SOAP schema (an example of a schema with a wildcard) and require that the ns1:foo element must be a child of the header element and that ns1:bar must not be a child of the header element using just W3C XML Schema constructs. Indeed, the need for this functionality spawned some of the WSDL functionality.
They could decide to lose the extensibility point (option #2), such as.
This does lose the possibility for forwards-compatible evolution.
Option #3 is adding a required middle. They must indicate the change is incompatible. A new namespace name for the name element can be created. This is essentially strategy #1, new namespace for all components.
The downsides of the 3 options for new components in new namespace name(s) design have been described. Additionally, the design can result in specifications and namespaces that are inappropriately factored, as related constructs will be in separate namespaces.
7.2.1 Redefine.
Redefine allows compatible and incompatible changes to be made to a type. Unlike other schema extension mechanisms, which provide new names for extended or restricted types, redefine changes the definition of a type without changing its name. This means that the name alone is no longer sufficient to determine if two types are really the same. The schema author must take some caution to ensure that compatible changes are made. However, there are scenarios where redefine may be the right mechanism. In particular, an extension author may want to create a schema that is based upon a schema that they cannot change. In the previous examples, the middle author cannot change the nameType. However, they cannot use redefine to help them define a schema. Redefine using respecification, restriction, or extension does not allow a component in a new namespace to be added to the end of a sequence while retaining the extensibility model. We showed the scenarios of adding the content at the end, and the limitations of UPA hold true with and without Redefine. Redefine is usable when the extension author chooses to make an incompatible change (#3) or can accept losing the extension point (#2).
7.3 #2.5: All new components in existing or new namespace(s) for each compatible version.
It is possible to create Schemas with additional optional components. This requires re-using the namespace name for optional components where possible, and use a new namespace where re-using the namespace is not possible. The re-using namespace rule is:
Re-use namespace names Rule: If a backwards compatible change can be made to a specification, then the old namespace name SHOULD be used in conjunction with XML’s extensibility model.
Strategy #1 uses a new namespace for all existing components and any additions; strategy #2 uses a new namespace for all additions (compatible and incompatible). Strategy #3 re-uses namespaces for compatible extensions and uses a new namespace for all incompatible additions. Said slightly differently, strategies #1 and #2 use a new namespace name for any extension, and strategy #3 uses a new namespace only when an incompatible change is made.
Earlier examples showed that it is not possible to have a wildcard with ##any (or even ##targetNamespace) following optional elements in the target namespace. This strategy is a "middle-ground" strategy, where ##any is used wherever possible and ##other is used where ##any cannot be used. ##any can be used after mandatory elements or for attributes.
The addition of an optional middle can be done in the same namespace, but the wildcard must change to ##other.
7.4 #3: All new components in existing namespace(s) for each compatible version.
It is possible to create Schemas with additional optional components. This requires re-using the namespace name for optional components and special schema design techniques. The re-using namespace rule is:
Re-use namespace names Rule: If a backwards compatible change can be made to a specification, then the old namespace name SHOULD be used in conjunction with XML’s extensibility model.
Strategy #1 uses a new namespace for all existing components and any additions; strategy #2 uses a new namespace for all additions (compatible and incompatible). Strategy #3 re-uses namespaces for compatible extensions and uses a new namespace for all incompatible additions. Said slightly differently, strategies #1 and #2 use a new namespace name for any extension, and strategy #3 uses a new namespace only when an incompatible change is made.
New namespaces to break Rule: A new namespace name is used when backwards compatibility is not permitted, that is software SHOULD reject texts if it does not understand the new language components.
Earlier examples showed that it is not possible to have a wildcard with ##any (or even ##targetNamespace) following optional elements in the target namespace. The solution to this problem is to introduce an element in the schema that will always appear if the extension appears. The content model of the extensibility point is the element + the extension. There are two styles for this. The first, which we will call Extension element style, was published in an earlier version of this Finding in December 2003. It uses an Extensibility element with the extensions nested inside. The second, which we will call Sentry style, was published in July 2004, then updated on MSDN. It uses a Sentry or Marker element with extensions following it.
A name type with extension elements is.
Because each extension in the target namespace is inside an Extension element, each subsequent target namespace extension will increase nesting by another layer. While this layer of nesting per extension is not desirable, it is what can be accomplished today when applying strict XML Schema validation. It seems to at least this author that potentially having multiple nested elements is worthwhile if multiple compatible revisions can be made to a language. This technique allows validation of extensions in the target namespace while retaining validation of the target namespace itself.
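The nesting cost of the Extension element style can be seen from the consumer side in the following sketch, which walks down through nested Extension elements and flattens what it finds. The instance, the namespace URI, and the suffix element (standing in for a second compatible revision) are hypothetical and are not the original examples.

```python
import xml.etree.ElementTree as ET

NS = "{http://example.org/name}"  # hypothetical target namespace

# Hypothetical instance after two compatible revisions: each revision's
# additions sit one Extension level deeper than the previous one.
DOC = """<name xmlns="http://example.org/name">
  <given>Jane</given>
  <family>Doe</family>
  <Extension>
    <middle>Q</middle>
    <Extension>
      <suffix>Jr</suffix>
    </Extension>
  </Extension>
</name>"""


def flatten_extensions(element):
    """Collect fields level by level, descending through nested Extension elements."""
    fields = {}
    while element is not None:
        for child in element:
            if child.tag != NS + "Extension":
                fields[child.tag.rsplit("}", 1)[-1]] = child.text
        element = element.find(NS + "Extension")  # None once nesting ends
    return fields


fields = flatten_extensions(ET.fromstring(DOC))
print(fields)
```

A consumer written against the first version would stop at the outer Extension element, while a later consumer keeps descending.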
The previous schema allows the following sample names:
The namespace author can create a schema for this type.
The advantage of this design technique is that a forwards and backwards compatible Schema V2 can be written. The V2 schema can validate documents with or without the middle, and the V1 schema can validate documents with or without the middle. This is the only schema design that enables all versions of the language to have complete schemas.
Further, the re-use of the same namespace has better tooling support. Many applications use a single schema to create the equivalent programming constructs. These tools often work best with single namespace support for the “generated” constructs. The re-use of the namespace name allows at least the namespace author to make changes to the namespace and perform validation of the extensions.
An obvious downside of this approach is the complexity of the schema design. Another downside is that changes are linear, so 2 potentially parallel extensions must be nested rather than parallel.
7.4.1 Redefine.
The author could use redefine to add the middle in the same namespace. However, the first version does not allow extensions in the same namespace, so this is an incompatible change.
In the previous example, the author of the redefined schema replaced the type with an update. If the author of the nameType wanted to make the change, they could presumably just change the type without using redefine. In cases where the author of the extension is not the author of the base type, then redefine allows them to change the type. Some people may consider this an illegal redefinition of the nameType because they believe that only the namespace owner of the nameType should make changes to the type. Redefine also allows extension and restriction, subject to the limitations of them. Redefine does not help the nameType owner or an extension author create a revised type that refers to any new construct.
7.5 #4: all new components in existing or new namespace(s) for each version and a version identifier.
Using a version identifier, the name instances would change to show the version of the name they use, such as:
The last example shows a middle that is a mandatory part of the name, which is indicated by the use of a new major version number. As with Design #2, the schema for the optional middle cannot fully express the content model. A schema for the mandatory middle is.
A significant downside with using version identifiers is that software that supports both versions of the name must perform special processing on top of XML and namespaces. For example, many components “bind” XML types into particular programming language types. Custom software must process the version attribute before using any of the “binding” software. In Web services, toolkits often take SOAP body content, parse it into types and invoke methods on the types. There are rarely “hooks” for the custom code to intercept processing between the “SOAP” processing and the “name” processing. Further, if version attributes are used by any 3rd party extensions—say midns:middle has a version—then the schema cannot refer to the correct middle.
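The special processing described above might look like the following Python sketch, which inspects a version attribute before handing the element to version-specific binding code. The unqualified `version` attribute, the `major.minor` format, the namespace name and the function names are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = "http://www.example.org/name"  # hypothetical namespace name
Q = "{" + NS + "}"


def bind_v1(root):
    """Major version 1: middle is optional (None when absent)."""
    return {k: root.findtext(Q + k) for k in ("given", "middle", "family")}


def bind_v2(root):
    """Major version 2: middle is mandatory."""
    fields = bind_v1(root)
    if fields["middle"] is None:
        raise ValueError("version 2 names require a middle element")
    return fields


BINDERS = {1: bind_v1, 2: bind_v2}


def read_name(text):
    """Look at the version attribute first, then dispatch to a binder."""
    root = ET.fromstring(text)
    major = int(root.get("version", "1.0").split(".")[0])
    if major not in BINDERS:
        raise ValueError(f"unsupported major version {major}")
    return BINDERS[major](root)


v1_0 = (f'<name xmlns="{NS}" version="1.0">'
        '<given>Ann</given><family>Smith</family></name>')
v1_1 = (f'<name xmlns="{NS}" version="1.1">'
        '<given>Ann</given><middle>Lee</middle><family>Smith</family></name>')
v2_0 = (f'<name xmlns="{NS}" version="2.0">'
        '<given>Ann</given><middle>Lee</middle><family>Smith</family></name>')
```

The dispatch step in `read_name` is exactly the custom code that generic binding toolkits rarely leave room for.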
7.6 #5: all components in existing namespace(s) for each version and a version identifier.
Using a version identifier, the name instances would change to show the version of the name they use, such as:
This is not a very helpful XML Schema change. The problem is that the author cannot insert the reference to the optional midns:middle element in the name schema and retain the extensibility point because of the aforementioned Unique Particle Attribution Constraint.
The last example shows that the middle is now a mandatory part of the name. As with Design #2, the schema for the optional middle cannot fully express the content model. A schema for the mandatory middle is.
This design has the significant drawback that XML Schema cannot be used for many of the changes. Because the same namespace is used for all versions of the language, the wildcard namespace attribute must contain ##any. This means that any changes that are compatible, such as the addition of an optional middle in the 2nd example, cannot be completely modeled in XML Schema.
8 Indicating Incompatible changes.
A new qualified name can be created by specifying standalone content, respecifying existing content or by some kind of relationship with existing content. A variety of compatible extension mechanisms have been shown. There are more mechanisms for incompatible changes in Schema 1.0.
8.1 Type extension.
A common option for indicating an incompatible change is to use type extension. The language designer allows for type extension, and they must specify that type extensions must be understood. Strategy #1 (all components in new namespace) shows a type extension schema.
8.2 Substitution Groups.
Another mechanism for extending a type in XML Schema is substitution groups. Substitution groups enable an element to be declared as substitutable for another. This can only be used for incompatible extensions as the consumer must understand the new element and the schema that contains the substitution type. Substitution groups require that elements are available for substitution, so the name designer must have provided a name element in addition to the name type.
A schema for a substitution group is provided below:
Substitution groups do allow a single extension author to indicate that their changes are mandatory. The limitations are that the extension author has now taken over the type’s extensibility. A visual way of imagining this is that the type tree has now been moved from the language designer over to the extensions author. And the language designer probably does not want their type to be “hijacked”.
However, this is not substantially different than an extension being marked with a “Must Understand”. In either case—with the extensions higher up in the tree (sometimes called top-typing) or lower in the tree (bottom-typing)—a new type is effectively created.
The difference is that there can only be 1 element at the top of an element hierarchy. If multiple mandatory extensions are added, then the only way to compose them together is at the bottom of the type because that is where the extensibility is.
Substitution groups do not allow a language designer and an extension author to incompatibly change the language as they end up conflicting over what to call the name element. Thus substitution groups are a poor mechanism for allowing an extension author to indicate that their changes are incompatible. A Must Understand flag is a superior method because it allows multiple extension authors to mix their mandatory extensions with a language designer’s versioning strategy. Hence language designers should prevent substitution groups and provide a Must Understand flag or other model when they wish to allow 3rd parties to make incompatible changes.
In some cases, a language does not provide a Must Understand mechanism. In the absence of a Must Understand model, the only way to force consumers to reject a message if they don’t understand the extension namespace is to change the namespace name of the root element, but this is rarely desirable.
8.3 Must Understand.
Each of the various component identification schemes can support a mustUnderstand flag. Two schemas for a Must Understand flag are provided below:
An example of an instance of a 3rd party indicating that a middle component is an incompatible change:
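One way such an instance and the matching consumer check could look is sketched below in Python. The namespace names, the placement of a `mustUnderstand` attribute in the name namespace, and the function names are assumptions for illustration; the original schemas are not reproduced here.

```python
import xml.etree.ElementTree as ET

NAME_NS = "http://www.example.org/name/1"   # hypothetical
MID_NS = "http://www.example.org/middle/1"  # hypothetical third-party namespace
MUST_UNDERSTAND = "{" + NAME_NS + "}mustUnderstand"

# The third party marks its middle element as mandatory to understand.
INSTANCE = f"""
<n:name xmlns:n="{NAME_NS}" xmlns:m="{MID_NS}">
  <n:given>Ann</n:given>
  <n:family>Smith</n:family>
  <m:middle n:mustUnderstand="true">Lee</m:middle>
</n:name>
"""


def namespace_of(element):
    """Return the namespace name from ElementTree's '{uri}local' tag form."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def check_must_understand(text, understood):
    """Raise if a child flagged mustUnderstand is in a namespace the consumer
    does not understand; otherwise return the tags it may safely ignore."""
    root = ET.fromstring(text)
    ignored = []
    for child in root:
        if namespace_of(child) in understood:
            continue
        if child.get(MUST_UNDERSTAND) in ("true", "1"):
            raise ValueError(f"must understand {child.tag}")
        ignored.append(child.tag)
    return ignored


# The same instance without the flag: the extension becomes ignorable.
OPTIONAL_INSTANCE = INSTANCE.replace(' n:mustUnderstand="true"', "")
```

A consumer that knows only the name namespace rejects the flagged instance but can process the unflagged one by ignoring the unknown element, which is how several extension authors can each mark their own additions as mandatory.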
9 Survey of Languages Versioning Strategies.
We can examine a variety of languages for their versioning strategies.
WS-Policy 1.5 uses ##any for attributes and PolicyReference element extensibility, and ##other for other extensibility points.
SVG specifies an extension entity for extending most of the SVG elements. It also specifies that fallbacks can be provided for unknown elements and that processing can be aborted.
UBL extensions are incompatible by UBL definition.
XSLT 2.0 has very powerful versioning features. The version of the processor can be tested, fallbacks can be provided for unknown elements and processing can be aborted.
10 Unique Particle Attribution.
This Finding has spent considerable material describing content models valid under Unique Particle Attribution constraints, and so it is worth describing the W3C XML Schema Unique Particle Attribution constraint in more detail. The reader is reminded that these rules are unique to W3C XML Schema and that other XML Schema languages like RELAX NG do not use these rules and so do not suffer from the contortions one is forced through when using W3C XML Schema. XML DTDs and W3C XML Schema have a rule that requires schemas to have content models valid under the Unique Particle Attribution constraint. From the XML 1.0 specification,
“For example, the content model ((b, c) | (b, d)) is non-deterministic, because given an initial b the XML processor cannot know which b in the model is being matched without looking ahead to see which element follows the b.”
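The quoted example can be checked mechanically. The Python sketch below is a deliberately narrow test, covering only a choice whose branches are plain sequences of element names; the function name is hypothetical. It reports the names that begin more than one branch, which is the situation the quotation describes.

```python
from collections import Counter


def ambiguous_first_names(alternatives):
    """Given a choice whose branches are sequences of element names,
    return the names that begin more than one branch."""
    counts = Counter(branch[0] for branch in alternatives if branch)
    return sorted(name for name, n in counts.items() if n > 1)


# ((b, c) | (b, d)): both branches start with b, so a processor reading b
# cannot tell which branch it is in without looking ahead.
original = [["b", "c"], ["b", "d"]]

# (b, (c | d)): the same language, left-factored. After the single b,
# the inner choice has branches that start with distinct names.
factored_inner_choice = [["c"], ["d"]]

conflicts_before = ambiguous_first_names(original)
conflicts_after = ambiguous_first_names(factored_inner_choice)
```

Collapsing the two references to b into one, as in the second model, removes the conflict without changing which documents are accepted.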
The use of ##any means there are some schemas that we might like to express, but that aren’t allowed.
Wildcards with ##any, where minOccurs does not equal maxOccurs, are not allowed before an element declaration. An instance of the element would be valid for the ##any or the element. ##other could be used.
The element before a wildcard with ##any must have cardinality of maxOccurs equals its minOccurs. If these were different, say minOccurs="1" and maxOccurs="2", then the optional occurrences could match either the element definition or the ##any. As a result of this rule, the minOccurs must be greater than zero.
Derived types that add element definitions after a wildcard with ##any must be avoided. A derived type might add an element definition after the wildcard, then an instance of the added element definition could match either the wildcard or the derived element definition.
Follow Unique Particle Attribution constraint: Use of wildcards MUST follow the Unique Particle Attribution constraint. Location of wildcards, namespace of wildcard extensions, minOccurs and maxOccurs values are constrained, and type restriction is controlled.
As shown earlier, a common design pattern is to provide an extensibility point (not an element) allowing any namespace at the end of a type. This is typically done with <xs:any namespace="##any">.
Unique Particle Attribution makes this unworkable as a complete solution in many cases. Firstly, the extensibility point can only occur after required elements in the original schema, limiting the scope of extensibility in the original schema. Secondly, backwards compatible changes require that the added element is optional, which means a minOccurs="0". The Unique Particle Attribution constraint prevents us from placing a minOccurs="0" before an extensibility point of ##any. Thus, when adding an element at an extensibility point, the author can make the element optional and lose the extensibility point, or the author can make the element required and lose backwards compatibility.
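The trade-off can be seen with a small model of a sequence content model. The Python sketch below is a simplification, not a schema validator: it looks only at adjacent particles, treats ##other as "any namespace except the target namespace", and uses a hypothetical namespace name and hypothetical class and function names.

```python
from dataclasses import dataclass

TARGET_NS = "http://www.example.org/name/1"  # hypothetical
UNBOUNDED = 10**9  # stand-in for maxOccurs="unbounded"


@dataclass
class Particle:
    kind: str            # "element" or "any"
    namespace: str       # element namespace, or "##any" / "##other"
    min_occurs: int = 1
    max_occurs: int = 1

    def variable(self):
        """True when the number of occurrences is not fixed."""
        return self.min_occurs != self.max_occurs


def admits(wildcard, element):
    """Does the wildcard's namespace constraint allow the element?"""
    if wildcard.namespace == "##any":
        return True
    if wildcard.namespace == "##other":
        return element.namespace != TARGET_NS
    return False


def upa_conflicts(sequence):
    """Return adjacent index pairs where one instance element could be
    attributed to either particle (adjacent pairs only, simplified)."""
    conflicts = []
    for i in range(len(sequence) - 1):
        first, second = sequence[i], sequence[i + 1]
        if not first.variable():
            continue
        if first.kind == "element" and second.kind == "any" and admits(second, first):
            conflicts.append((i, i + 1))
        elif first.kind == "any" and second.kind == "element" and admits(first, second):
            conflicts.append((i, i + 1))
    return conflicts


given = Particle("element", TARGET_NS)
family = Particle("element", TARGET_NS)
optional_middle = Particle("element", TARGET_NS, min_occurs=0)
any_ns = Particle("any", "##any", 0, UNBOUNDED)
other_ns = Particle("any", "##other", 0, UNBOUNDED)

v1 = [given, family, any_ns]
v2_optional_middle = [given, family, optional_middle, any_ns]
v2_with_other = [given, family, optional_middle, other_ns]
```

In this model the version 1 sequence has no conflict, adding an optional middle before the ##any wildcard produces one at the last two positions, and switching the wildcard to ##other removes it at the cost of no longer admitting target-namespace extensions.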
11 Other technologies.
The W3C XML Schema Working Group has heard and taken to heart many of these concerns. They have plans to remedy some of these issues in XML Schema 1.1. A Working Draft and a guide to versioning using the new XML Schema 1.1 features are available.
A simple analysis of doing compatible extensibility and versioning using RDF and OWL is available [21]. In general, RDF and OWL offer superior mechanisms for extensibility and versioning. RDF and OWL explicitly allow extension components to be added to components. And further, the RDF and OWL model builds in the notion of “Must Ignore Unknowns” as an RDF/OWL processor will absorb the extra components but do nothing with them. An extension author can require that consumers understand the extension by changing the type using a type extension mechanism.
RELAX NG is another schema language. It explicitly allows extension components to be added to other components as it does not have the Unique Particle Attribution constraint.
12 Conclusion.
This Finding describes a number of questions, decisions and rules for using XML, W3C XML Schema, and XML Namespaces in language construction and extension. The main goal of the set of rules is to allow language designers to know their options for language design, and ideally make backwards- and forwards-compatible changes to their languages to achieve loose coupling between systems.
13 References.
14 Acknowledgements.
The author thanks the many reviewers that have contributed to the article, particularly David Bau, William Cox, Ed Dumbill, Chris Ferris, Yaron Goland, Hal Lockhart, Mark Nottingham, Jeffrey Schlimmer, Cliff Schmidt, and Norman Walsh.
