Thursday, December 3, 2009

(Code: c) urie

This code is intended to perform URI encoding.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK (1024 * 8)


int main(void)
{
    size_t n;
    char buf[BLOCK];
    unsigned char *str = NULL;
    unsigned char *tmp;
    size_t size_str = 0;
    size_t i;

    /* read all of stdin into str */
    while ((n = fread(buf, 1, BLOCK, stdin)) > 0) {
        tmp = realloc(str, size_str + n);
        if (tmp == NULL) {
            perror("realloc");
            free(str);
            return 1;
        }
        str = tmp;
        memcpy(str + size_str, buf, n);
        size_str += n;
    }
    /* fread returns a size_t and is never negative; check ferror instead */
    if (ferror(stdin)) {
        perror("fread");
        free(str);
        return 1;
    }

    /* uri encoding: print every byte as %XX (looping by length keeps NUL bytes) */
    for (i = 0; i < size_str; i++) {
        printf("%%%02X", str[i]);
    }

    free(str);
    return 0;
}
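It works well enough in practice, but it also encodes plain ASCII characters that do not need escaping. Using something like GLib's g_uri_escape_string would probably handle that properly.
I figured I ought to read the RFC, so I simply searched for "Uniform Resource Identifier". That returned 20 documents, which was too many, and I gave up.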
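
As a rough sketch of how the over-encoding of ASCII characters mentioned above could be avoided without GLib: RFC 3986 defines the "unreserved" characters as letters, digits, "-", ".", "_" and "~", and those can be passed through unchanged. The helper names is_unreserved and uri_escape_unreserved are hypothetical and not part of the original program. The sketch also assumes that every other byte should be escaped, which is stricter than some uses require.

```c
#include <stdlib.h>

/* RFC 3986 "unreserved" set: ALPHA / DIGIT / "-" / "." / "_" / "~". */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/*
 * Hypothetical helper: percent-encodes the len bytes at src, leaving
 * unreserved characters as they are. Returns a NUL-terminated string
 * allocated with malloc() that the caller must free, or NULL if the
 * allocation fails.
 */
static char *uri_escape_unreserved(const unsigned char *src, size_t len)
{
    static const char hex[] = "0123456789ABCDEF";
    char *out = malloc(len * 3 + 1);   /* worst case: every byte becomes %XX */
    size_t i, j = 0;

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        if (is_unreserved(src[i])) {
            out[j++] = (char)src[i];            /* copy as-is */
        } else {
            out[j++] = '%';
            out[j++] = hex[src[i] >> 4];        /* high nibble */
            out[j++] = hex[src[i] & 0x0F];      /* low nibble */
        }
    }
    out[j] = '\0';
    return out;
}
```

In the program above, the printing loop could then be replaced by a call to uri_escape_unreserved(str, size_str) followed by fputs of the result to stdout and a free of the returned string.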


RFCs related to URI encoding?

2079 Definition of an X.500 Attribute Type and an Object Class to Hold
Uniform Resource Identifiers (URIs). M. Smith. January 1997. (Format:
TXT=8757 bytes) (Status: PROPOSED STANDARD)

2168 Resolution of Uniform Resource Identifiers using the Domain Name
System. R. Daniel, M. Mealling. June 1997. (Format: TXT=46528 bytes)
(Obsoleted by RFC3401, RFC3402, RFC3403, RFC3404) (Updated by
RFC2915) (Status: EXPERIMENTAL)

2396 Uniform Resource Identifiers (URI): Generic Syntax. T.
Berners-Lee, R. Fielding, L. Masinter. August 1998. (Format:
TXT=83639 bytes) (Obsoleted by RFC3986) (Updates RFC1808, RFC1738)
(Updated by RFC2732) (Status: DRAFT STANDARD)

2838 Uniform Resource Identifiers for Television Broadcasts. D.
Zigmond, M. Vickers. May 2000. (Format: TXT=11405 bytes) (Status:
INFORMATIONAL)

3305 Report from the Joint W3C/IETF URI Planning Interest Group:
Uniform Resource Identifiers (URIs), URLs, and Uniform Resource Names
(URNs): Clarifications and Recommendations. M. Mealling, Ed., R.
Denenberg, Ed.. August 2002. (Format: TXT=21793 bytes) (Status:
INFORMATIONAL)

3404 Dynamic Delegation Discovery System (DDDS) Part Four: The Uniform
Resource Identifiers (URI). M. Mealling. October 2002. (Format:
TXT=40124 bytes) (Obsoletes RFC2915, RFC2168) (Status: PROPOSED
STANDARD)

3617 Uniform Resource Identifier (URI) Scheme and Applicability
Statement for the Trivial File Transfer Protocol (TFTP). E. Lear.
October 2003. (Format: TXT=11848 bytes) (Status: INFORMATIONAL)

3761 The E.164 to Uniform Resource Identifiers (URI) Dynamic
Delegation Discovery System (DDDS) Application (ENUM). P. Faltstrom,
M. Mealling. April 2004. (Format: TXT=41559 bytes) (Obsoletes
RFC2916) (Status: PROPOSED STANDARD)

3969 The Internet Assigned Number Authority (IANA) Uniform Resource
Identifier (URI) Parameter Registry for the Session Initiation
Protocol (SIP). G. Camarillo. December 2004. (Format: TXT=12119
bytes) (Updates RFC3427) (Also BCP0099) (Status: BEST CURRENT
PRACTICE)

3986 Uniform Resource Identifier (URI): Generic Syntax. T.
Berners-Lee, R. Fielding, L. Masinter. January 2005. (Format:
TXT=141811 bytes) (Obsoletes RFC2732, RFC2396, RFC1808) (Updates
RFC1738) (Also STD0066) (Status: STANDARD)

4051 Additional XML Security Uniform Resource Identifiers (URIs). D.
Eastlake 3rd. April 2005. (Format: TXT=33368 bytes) (Status: PROPOSED
STANDARD)

4088 Uniform Resource Identifier (URI) Scheme for the Simple Network
Management Protocol (SNMP). D. Black, K. McCloghrie, J.
Schoenwaelder. June 2005. (Format: TXT=43019 bytes) (Status: PROPOSED
STANDARD)

4501 Domain Name System Uniform Resource Identifiers. S. Josefsson.
May 2006. (Format: TXT=20990 bytes) (Status: PROPOSED STANDARD)

4622 Internationalized Resource Identifiers (IRIs) and Uniform
Resource Identifiers (URIs) for the Extensible Messaging and Presence
Protocol (XMPP). P. Saint-Andre. July 2006. (Format: TXT=49968 bytes)
(Obsoleted by RFC5122) (Status: PROPOSED STANDARD)

4904 Representing Trunk Groups in tel/sip Uniform Resource Identifiers
(URIs). V. Gurbani, C. Jennings. June 2007. (Format: TXT=41027 bytes)
(Status: PROPOSED STANDARD)

4967 Dial String Parameter for the Session Initiation Protocol Uniform
Resource Identifier. B. Rosen. July 2007. (Format: TXT=12659 bytes)
(Status: PROPOSED STANDARD)

5017 MIB Textual Conventions for Uniform Resource Identifiers (URIs).
D. McWalter, Ed.. September 2007. (Format: TXT=14826 bytes) (Status:
PROPOSED STANDARD)

5122 Internationalized Resource Identifiers (IRIs) and Uniform
Resource Identifiers (URIs) for the Extensible Messaging and Presence
Protocol (XMPP). P. Saint-Andre. February 2008. (Format: TXT=55566
bytes) (Obsoletes RFC4622) (Status: PROPOSED STANDARD)

5341 The Internet Assigned Number Authority (IANA) tel Uniform
Resource Identifier (URI) Parameter Registry. C. Jennings, V.
Gurbani. September 2008. (Format: TXT=13944 bytes) (Updates RFC3966)
(Status: PROPOSED STANDARD)

5527 Combined User and Infrastructure ENUM in the e164.arpa Tree. M.
Haberler, O. Lendl, R. Stastny. May 2009. (Format: TXT=20733 bytes)
(Status: INFORMATIONAL)


Environment

OS: Linux
debian-lenny